eKonometrics: Comparing mnlogit and mlogit for discrete choice models

Earlier this weekend (Dec. 7, 2013), mnlogit was released on CRAN by Wang Zhiyu and Asad Hasan (asad.hasan@sentrana.com) claiming that mnlogit uses “parallel C++ library to achieve fast computation of Hessian matrices”.

Here is a comparison of mnlogit with mlogit by Yves Croissant whose package seems to be the inspiration for mnlogit.

I will estimate the same model using the same data set and will compare the two packages for execution speed, specification flexibility, and ease of use.

Data set

I use the Fish data set to estimate mnlogit and mlogit. mnlogit defines the data set as follows:

A data frame containing :

mode - The choice set: beach, pier, boat, and charter
price - price for a mode for an individual
catch - fish catch rate for a mode for an individual
income - monthly income of the individual decision-maker
chid - decision maker ID

The authors mention that they have sourced the data from R package mlogit by Yves Croissant, which lists the source as:

Herriges, J. A. and C. L. Kling (1999) “Nonlinear Income Effects in Random Utility Models”, Review of Economics and Statistics, 81, 62-72.

Estimation with mnlogit

library(mnlogit)

## Warning: package 'mnlogit' was built under R version 3.0.2

## Package: mnlogit Version: 1.0 Multinomial Logit Choice Models. Scientific
## Computing Group, Sentrana Inc, 2013.


data(Fish, package = "mnlogit")
fm <- formula(mode ~ 1 | income | price + catch)
summary(mnlogit(fm, Fish, "alt"))

## 
## Call:
## mnlogit(formula = fm, data = Fish, choiceVar = "alt")
## 
## Frequencies of alternatives in input data:
##   beach    boat charter    pier 
##   0.113   0.354   0.382   0.151 
## 
## Number of observations in training data = 1182
## Number of alternatives = 4
## Intercept turned: ON.
## Number of parameters in model = 14
##   # individual specific variables = 2
##   # choice specific coeff variables = 2
##   # generic coeff variables = 0
## 
## Maximum likelihood estimation using Newton-Raphson iterations.
##   Number of iterations: 7
##   Number of linesearch iterations: 7
## At termination: 
##   Gradient 2-norm = 8.19688645245397e-05
##   Diff between last 2 loglik values = 4.15543581766542e-08
##   Stopping reason: Succesive loglik difference < ftol (1e-06).
## Total estimation time (sec): 0.04
## Time for Hessian calculations (sec): 0.04 using 1 processors.
## 
## Coefficients : 
##                      Estimate Std.Error t-value Pr(>|t|)    
## (Intercept):boat     8.64e-01  3.15e-01    2.74  0.00607 ** 
## (Intercept):charter  1.85e+00  3.10e-01    5.97  2.4e-09 ***
## (Intercept):pier     1.13e+00  3.05e-01    3.71  0.00021 ***
## income:boat         -1.10e-04  6.02e-05   -1.84  0.06636 .  
## income:charter      -2.78e-04  6.03e-05   -4.61  4.0e-06 ***
## income:pier         -1.28e-04  5.33e-05   -2.41  0.01605 *  
## price:beach         -3.80e-02  3.33e-03  -11.41  < 2e-16 ***
## price:boat          -2.09e-02  2.23e-03   -9.33  < 2e-16 ***
## price:charter       -1.60e-02  2.02e-03   -7.94  2.0e-15 ***
## price:pier          -3.92e-02  3.26e-03  -12.02  < 2e-16 ***
## catch:beach          4.95e+00  8.20e-01    6.04  1.5e-09 ***
## catch:boat           2.47e+00  5.19e-01    4.76  1.9e-06 ***
## catch:charter        7.61e-01  1.52e-01    4.99  6.0e-07 ***
## catch:pier           4.88e+00  8.99e-01    5.43  5.5e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Log-Likelihood: -1160, df = 14
## AIC:  13.8875709217033

system.time((mnlogit(fm, Fish, "alt")))

##    user  system elapsed 
##    0.11    0.00    0.11

Estimation with mlogit

I estimate the same model using mlogit.

library(mlogit)

## Loading required package: Formula Loading required package: statmod
## Loading required package: lmtest Loading required package: zoo
## 
## Attaching package: 'zoo'
## 
## The following object is masked from 'package:base':
## 
## as.Date, as.Date.numeric
## 
## Loading required package: maxLik Loading required package: miscTools
## Loading required package: MASS


data2 <- mlogit.data(Fish, choice = "alt", shape = "long", id.var = "chid", 
    alt.levels = c("beach", "boat", "charter", "pier"))
summary(mod1 <- mlogit(mode ~ 1 | income | price + catch, data2, reflevel = "beach"))

## 
## Call:
## mlogit(formula = mode ~ 1 | income | price + catch, data = data2, 
##     reflevel = "beach", method = "nr", print.level = 0)
## 
## Frequencies of alternatives:
##   beach    boat charter    pier 
##   0.113   0.354   0.382   0.151 
## 
## nr method
## 7 iterations, 0h:0m:0s 
## g'(-H)^-1g = 8.31E-08 
## gradient close to zero 
## 
## Coefficients :
##                      Estimate Std. Error t-value Pr(>|t|)    
## boat:(intercept)     8.64e-01   3.15e-01    2.74  0.00607 ** 
## charter:(intercept)  1.85e+00   3.10e-01    5.97  2.4e-09 ***
## pier:(intercept)     1.13e+00   3.05e-01    3.71  0.00021 ***
## boat:income         -1.10e-04   6.02e-05   -1.84  0.06636 .  
## charter:income      -2.78e-04   6.03e-05   -4.61  4.0e-06 ***
## pier:income         -1.28e-04   5.33e-05   -2.41  0.01605 *  
## beach:price         -3.80e-02   3.33e-03  -11.41  < 2e-16 ***
## boat:price          -2.09e-02   2.23e-03   -9.33  < 2e-16 ***
## charter:price       -1.60e-02   2.02e-03   -7.94  2.0e-15 ***
## pier:price          -3.92e-02   3.26e-03  -12.02  < 2e-16 ***
## beach:catch          4.95e+00   8.20e-01    6.04  1.5e-09 ***
## boat:catch           2.47e+00   5.19e-01    4.76  1.9e-06 ***
## charter:catch        7.61e-01   1.52e-01    4.99  6.0e-07 ***
## pier:catch           4.88e+00   8.99e-01    5.43  5.5e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Log-Likelihood: -1160
## McFadden R^2:  0.225 
## Likelihood ratio test : chisq = 675 (p.value = <2e-16)

system.time(mlogit(mode ~ 1 | income | price + catch, data2, reflevel = "beach"))

##    user  system elapsed 
##    0.27    0.02    0.28

Speed Tests

I conducted a simple test for execution times with the command system.time. The results are reported above after each model summary.

Findings

I obtain identical results for the models estimated with mnlogit and mlogit.

Estimation speeds appear faster for mnlogit.

caveat: The same command (system.time) when run independently of the R Markdown environment shows mlogit to be faster than mnlogit!

Additional Comments: I restarted RStudio and estimated the two models again outside of R Markdown. mnlogit took 0.12 seconds versus 0.31 seconds for mlogit.

Verdict: mnlogit reports shorter execution times than mlogit.

Also, Estimation speeds may differ with large and complex data sets.

The model specification is simpler in mnlogit.

The commands for specifying the data set and the model seem easier in mnlogit.

mlogit syntax seems relatively complex, but offers more choices in model specification.

Final comment

I am delighted to see yet another package for discrete choice models. As the field evolves and more advanced problems are subjected to discrete choice models, the need for robust and computationally efficient packages will be felt more. Both mlogit and mnlogit are indeed valuable to those interested in how humans make good and bad choices!

1 comment:

Asad HasanDecember 8, 2013 at 3:18 PM
Hi Murtaza, I just noticed your blogpost mentioning mnlogit and mlogit. Let me thank you for checking out and writing about our package mnlogit.

I'd like to make a point on a comment in your caveat (point 2 of Section Findings) that outside the R Markdown environment mlogit was faster. Our tests have shown mnlogit outperforms mlogit by a factor proportional to K (the number of choices) - while run on a Linux command line with 'R CMD BATCH --no-save'. Please check out Table 3 in the section on "performance benchmarking" in our vignette. Our speed testing process is demonstrated in Appendix C of the vignette. You can run the code, or run the Sweave file, on your local machine.

Pages

Sunday, December 8, 2013

Comparing mnlogit and mlogit for discrete choice models