Sunday, April 17, 2011

Going over the speed limit

In an earlier post [Speeding tickets for R and Stata], I reported on how R compared with Stata in executing algorithms involving maximum likelihood estimation. This post offers the following updates:

  • Stata is in fact even faster than previously reported.
  • The 64-bit version of the newly released R 2.13.0 reports faster times than the 32-bit version of R 2.12.0.
  • Limdep/NLogit, an econometrics package popular among discrete choice modellers, reported slower execution times than R 2.13 (64-bit version).
  • Advice on speed from experienced R users.

    My data set (used for the test results reported below) comprised an ordinal dependent variable [5 categories] and categorical explanatory variables, with 63,122 observations. I used a computer running Windows 7 Professional on an Intel Core 2 Quad CPU Q9300 @ 2.5 GHz with 8 GB of RAM. Further details about the tests are listed in the following table.

    Software                 Commercial license price
    Stata 11 (duo core)      US$2,495
    R 2.12.0 [32-bit]        Free
    R x64 2.13.0             Free
    Limdep/NLogit            $1,395

    Model                    Software             Routine               Time
    Multinomial logit        Stata 11             mlogit                9.06 sec (2.89 sec with the “quietly” option)
                             R 2.12.0 [32-bit]    multinom              50.59 sec + 52.29 sec
                                                  zelig (mlogit)        77.89 sec
                                                  VGLM (multinomial)    64.4 sec
                             R x64 2.13.0         multinom              32.7 sec + 49.8 sec
                                                  zelig (mlogit)        69.92 sec
                                                  VGLM (multinomial)    63.76 sec
                             Limdep/NLogit        Logit                 36.72 sec

    Proportional odds model  Stata 11             ologit                1.69 sec (0.91 sec [quietly])
                                                  oprobit               0.91 sec [quietly]
                             R 2.12.0 [32-bit]    VGLM (parallel = T)   16.26 sec
                                                  polr [o.logit]        22.62 sec
                             R x64 2.13.0         VGLM (parallel = T)   14.94 sec
                                                  polr [o.logit]        13.49 sec
                                                  polr [o.probit]       14.94 sec
                             Limdep/NLogit        Ordered [Logit]       18.50 sec
                                                  Ordered [Probit]      36.33 sec

    Generalized logit        Stata 11             gologit2              18.67 sec (15.1 sec with the “quietly” option)
                             R 2.12.0 [32-bit]    VGLM (parallel = F)   64.71 sec
                             R x64 2.13.0         VGLM (parallel = F)   64.86 sec
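    Timings of this kind can be collected in R with system.time(). The following is only a minimal sketch: the original data and scripts are not reproduced here, so the data are simulated, and every variable name, coefficient and the sample size are hypothetical stand-ins for a 5-category ordinal outcome with categorical predictors.

```r
# Simulated stand-in for the data described in the post; all names and
# coefficients below are hypothetical.
library(MASS)  # provides polr(); a recommended package shipped with R

set.seed(1)
n <- 5000  # the post used 63,122 observations; kept small here for a quick run

x1 <- factor(sample(c("a", "b", "c"), n, replace = TRUE))
x2 <- factor(sample(c("low", "high"), n, replace = TRUE))

# Build a 5-category ordered outcome by cutting a latent logistic variable
latent <- 0.8 * (x1 == "b") - 0.5 * (x1 == "c") + 0.6 * (x2 == "high") + rlogis(n)
y <- cut(latent, breaks = c(-Inf, -1, 0, 1, 2, Inf),
         labels = 1:5, ordered_result = TRUE)
dat <- data.frame(y, x1, x2)

# system.time() returns user, system and elapsed seconds for the expression
t_logit  <- system.time(fit_logit  <- polr(y ~ x1 + x2, data = dat, method = "logistic"))
t_probit <- system.time(fit_probit <- polr(y ~ x1 + x2, data = dat, method = "probit"))

print(c(ordered_logit  = t_logit[["elapsed"]],
        ordered_probit = t_probit[["elapsed"]]))
```

    Setting n to 63,122 would bring the sketch closer to the scale of the tests above, although simulated data will not reproduce the reported times.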


    Stata is even faster

    When I reran the models using the quietly option (which suppresses terminal output) in Stata, I obtained the actual algorithm convergence times. For the multinomial logit model, Stata took fewer than 3 seconds to converge, making it more than 10 times faster than R. Similar reductions in execution times for Stata were observed for other algorithms reported in the table above.

    64-bit version of R is faster, sometimes

    The 64-bit version of R (2.13.0) reported faster execution times. The same was observed for the 64-bit version of R (2.12.0). Notice in the table above the dramatic reduction in the convergence times for the multinomial logit model (using multinom). R 2.13.0 [64-bit] took 35.4% less time to converge than R 2.12.0 [32-bit]. However, the Zelig- and VGLM-based algorithms reported only very modest improvements in execution times.
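    These percentage reductions can be checked directly from the multinomial logit timings in the table. A small R calculation, using only the numbers reported above:

```r
# Multinomial logit timings in seconds, copied from the table:
# R 2.12.0 [32-bit] versus R x64 2.13.0
old <- c(multinom = 50.59, zelig = 77.89, vglm = 64.40)
new <- c(multinom = 32.70, zelig = 69.92, vglm = 63.76)

# Percentage reduction in time relative to the 32-bit run
pct_reduction <- 100 * (old - new) / old
print(round(pct_reduction, 1))
# multinom: about 35.4, zelig: about 10.2, vglm: about 1.0
```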

    The ordered logit and ordered probit models (executed using the polr algorithm) also reported significant improvements in execution times. The ordered logit model took 40.4% less time to converge in R 2.13.0 [64-bit] than in R 2.12.0 [32-bit].

    I still do not understand why summary(multinomial logit model) takes an additional 49.8 seconds, on top of the 32.7 seconds needed to fit the model, to report summary results. When I do not use summary() and instead use coef(multinomial logit model), I get instantaneous output.
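    One way to investigate is to time each step separately. A plausible explanation, offered here as an assumption rather than a verified fact, is that summary() on a multinom fit needs standard errors and so has to compute the Hessian unless the model was fitted with Hess = TRUE, whereas coef() only returns values already stored in the fitted object. The sketch below uses simulated data with hypothetical variable names and a hypothetical sample size; it only shows how to measure the steps and makes no claim about the resulting times.

```r
library(nnet)  # provides multinom(); a recommended package shipped with R

set.seed(2)
n <- 3000  # hypothetical size, kept small for a quick run
x1 <- factor(sample(c("a", "b", "c"), n, replace = TRUE))
x2 <- factor(sample(c("low", "high"), n, replace = TRUE))
# Outcome drawn at random, unrelated to the predictors: used for timing only
y <- factor(sample(1:5, n, replace = TRUE))
dat <- data.frame(y, x1, x2)

# Time the fit, the coefficient extraction and the summary separately
t_fit  <- system.time(fit <- multinom(y ~ x1 + x2, data = dat, trace = FALSE))
t_coef <- system.time(b <- coef(fit))
t_summ <- system.time(s <- summary(fit))

# Refit, asking for the Hessian to be computed and stored with the fit
t_fit_h  <- system.time(fit_h <- multinom(y ~ x1 + x2, data = dat,
                                          Hess = TRUE, trace = FALSE))
t_summ_h <- system.time(s_h <- summary(fit_h))

print(c(fit = t_fit[["elapsed"]], coef = t_coef[["elapsed"]],
        summary = t_summ[["elapsed"]],
        fit_with_hess = t_fit_h[["elapsed"]],
        summary_after_hess = t_summ_h[["elapsed"]]))
```

    If the explanation holds, Hess = TRUE would move the extra cost into the fitting step rather than remove it.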

    In summary, it appears that not all algorithms converge faster in the 64-bit version of R 2.13.0.

    R is faster than Limdep/NLogit

    In comparison, R [2.13.0] offered faster convergence times than NLogit for the multinomial logit, ordered logit, and ordered probit models. This places R between two popular econometrics packages: Stata is significantly faster than R, and R offers faster execution times than NLogit (see the difference for ordered logit in the table above).

    What R Pros are saying about my post

    If you were to scroll down to the comments section of my last post [Speeding tickets for R and Stata], you’ll notice some advice from experienced users of R. I have been advised to re-run the tests after first obtaining optimised versions of the BLAS and LAPACK libraries. I am not sure how much difference that would make. However, it would be a little difficult for ordinary users of R (such as myself) to determine which BLAS and LAPACK libraries are appropriate for their computer systems, and to install them.

    If significant speed gains could be achieved by using optimised BLAS and LAPACK libraries, the R installation routines may then be improved so that these libraries are made available to the novice end users of R.
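    A rough way to see what the linear algebra libraries contribute is to time a few matrix operations before and after switching libraries. The sketch below is an illustration only: the matrix size is arbitrary, and it measures dense matrix operations rather than the model-fitting routines tested above, so any gain it shows may not carry over to them. In more recent releases of R, sessionInfo() also reports which BLAS and LAPACK libraries are in use.

```r
set.seed(3)
n <- 1000                        # arbitrary matrix size chosen for illustration
m <- matrix(rnorm(n * n), n, n)

# crossprod(m) computes t(m) %*% m, which relies on the BLAS
t_cross <- system.time(xtx <- crossprod(m))

# solve() on a square matrix relies on LAPACK
t_solve <- system.time(inv <- solve(xtx))

print(c(crossprod = t_cross[["elapsed"]], solve = t_solve[["elapsed"]]))
```

    Running the same lines with the reference libraries and again with an optimised build gives a simple before-and-after comparison on one's own machine.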


    1. Thanks for this. I'm in the process of working with an ordinal multinomial logit model and might have to convert to STATA just for this once given your analysis. I've found STATA to be superior for ML in the past as well, but did not expect the difference to be nearly as large as you report here (unfortunately, I only have STATA on my sub-par office computer). On 8 GB RAM with Intel Core i5, I was looking at run times in the 5 to 10 minute range, and would sometimes hit the maximum RAM capabilities thanks to the large dataset saved in active memory.

      I have over 1 million observations and a number of factor variables in the model, so computing time is at a premium for this one.

      A quick question: has anyone attempted to use ReadyBoost in Windows to improve the data handling capabilities of R? Would that be a reasonable thing to try with a dataset taking up this much memory? Windows 7 allows for up to 256 GB of additional ReadyBoost memory--though I'm not sure how that would translate to improved RAM capabilities using R.

    2. Summary in R calculates a lot of additional statistics, which is why it is slow. Getting the coefficients does not involve any calculation, which is why it is instantaneous.

      Optimised multithreaded BLAS/LAPACK might get you significant savings, since Stata is probably using them too. If you use them, more cores can be used without any additional cost, which is probably not the case with Stata.

      In general speed is costly. And the dependence is exponential, so a 50% speed-up might cost twice as much. By cost I mean not only money, but also other work. My rule of thumb for optimising is that if the code runs in 1 minute, I do not do it. I see that your results fall nicely into this.

      It would have been nice if you provided the code which you used to get the simulation.