Sunday, December 8, 2013

Summarize statistics by Groups in R & R Commander

R is great at accomplishing complex tasks. Doing simple things in R, though, takes some effort. Consider the simple task of producing summary statistics for continuous variables grouped by one or more factor variables. In Stata, I’d write a brief one-liner to get the mean of one or more variables by the levels of another variable. For instance, tabstat Horsepower RPM, by(Type) in Stata produces the following:

[Image: Stata tabstat output showing the mean of Horsepower and RPM by Type]

The doBy package in R offers similar functionality and more. Of particular interest to those who teach R-based statistics courses in undergraduate programs is the doBy plugin for R Commander. The plugin was developed by Jonathan Lee, and it is a great tool for teaching and for quick data analysis. To get the same output as the one shown above, I’d click on the doBy plugin to get the following dialogue box:

[Image: the doBy plugin dialogue box in R Commander]

The dialogue box generates the following simple command:

summaryBy(Horsepower+RPM~Type, data=Cars93, FUN=c(mean))

You may first have to load the data set, which ships with the MASS package:
data(Cars93, package="MASS")

And the results are presented below:

[Image: summaryBy output showing the mean of Horsepower and RPM by Type]

Jonathan has also created GUIs for orderBy, sampleBy, and splitBy within the same plugin. It is a must-use plugin for data scientists.
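For readers who do not use R Commander, a similar grouped summary can be produced in base R. The sketch below uses aggregate() with a formula instead of summaryBy(); it is an alternative route to the same table of means, not the command the doBy plugin generates. aggregate() keeps the original column names (Horsepower, RPM) in its result.

```r
# Load Cars93 from MASS, a recommended package that ships with R
data(Cars93, package = "MASS")

# Mean Horsepower and RPM for each level of Type, using base R's
# formula interface to aggregate() instead of doBy::summaryBy()
res <- aggregate(cbind(Horsepower, RPM) ~ Type, data = Cars93, FUN = mean)
print(res)
```

Swapping FUN = mean for another function, such as median or sd, gives other grouped summaries in the same layout.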

Comparing mnlogit and mlogit for discrete choice models

Earlier this weekend (Dec. 7, 2013), mnlogit was released on CRAN by Wang Zhiyu and Asad Hasan (asad.hasan@sentrana.com), who claim that mnlogit uses a “parallel C++ library to achieve fast computation of Hessian matrices”.

Here is a comparison of mnlogit with mlogit by Yves Croissant, whose package appears to have inspired mnlogit.

I estimate the same model using the same data set and compare the two packages for execution speed, specification flexibility, and ease of use.

Data set

I use the Fish data set to estimate models with both mnlogit and mlogit. The mnlogit documentation describes the data set as follows:

A data frame containing:

  • mode - The choice set: beach, pier, boat, and charter
  • price - price for a mode for an individual
  • catch - fish catch rate for a mode for an individual
  • income - monthly income of the individual decision-maker
  • chid - decision maker ID
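As I understand the documentation of both packages, this kind of data is expected in "long" format: one row per decision maker and alternative. The toy data frame below is a hypothetical illustration of that layout; all values are made up and not taken from Fish. It assumes, as in the code that follows, that mode is a logical indicator of the chosen alternative and alt names the alternative. Note that income, an individual-specific variable, is constant within each chid, while price and catch vary across alternatives.

```r
# Two hypothetical anglers (chid 1 and 2), each facing the four alternatives;
# all values are made up for illustration
alts <- c("beach", "boat", "charter", "pier")
toy <- data.frame(
  chid   = rep(1:2, each = 4),
  alt    = rep(alts, times = 2),
  mode   = c(FALSE, FALSE, TRUE, FALSE,    # angler 1 chose charter
             FALSE, TRUE, FALSE, FALSE),   # angler 2 chose boat
  income = rep(c(4000, 2500), each = 4),   # individual-specific: constant within chid
  price  = c(120, 60, 90, 120, 40, 30, 70, 40),
  catch  = c(0.07, 0.26, 0.54, 0.05, 0.07, 0.26, 0.54, 0.05)
)

# Sanity check: exactly one chosen alternative per decision maker
print(tapply(toy$mode, toy$chid, sum))
```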

The authors mention that they sourced the data from the R package mlogit by Yves Croissant, which lists the source as:

  • Herriges, J. A. and C. L. Kling (1999) “Nonlinear Income Effects in Random Utility Models”, Review of Economics and Statistics, 81, 62-72.

Estimation with mnlogit

library(mnlogit)


## Warning: package 'mnlogit' was built under R version 3.0.2


## Package: mnlogit Version: 1.0 Multinomial Logit Choice Models. Scientific
## Computing Group, Sentrana Inc, 2013.



data(Fish, package = "mnlogit")
fm <- formula(mode ~ 1 | income | price + catch)
summary(mnlogit(fm, Fish, "alt"))


## 
## Call:
## mnlogit(formula = fm, data = Fish, choiceVar = "alt")
##
## Frequencies of alternatives in input data:
## beach boat charter pier
## 0.113 0.354 0.382 0.151
##
## Number of observations in training data = 1182
## Number of alternatives = 4
## Intercept turned: ON.
## Number of parameters in model = 14
## # individual specific variables = 2
## # choice specific coeff variables = 2
## # generic coeff variables = 0
##
## Maximum likelihood estimation using Newton-Raphson iterations.
## Number of iterations: 7
## Number of linesearch iterations: 7
## At termination:
## Gradient 2-norm = 8.19688645245397e-05
## Diff between last 2 loglik values = 4.15543581766542e-08
## Stopping reason: Succesive loglik difference < ftol (1e-06).
## Total estimation time (sec): 0.04
## Time for Hessian calculations (sec): 0.04 using 1 processors.
##
## Coefficients :
## Estimate Std.Error t-value Pr(>|t|)
## (Intercept):boat 8.64e-01 3.15e-01 2.74 0.00607 **
## (Intercept):charter 1.85e+00 3.10e-01 5.97 2.4e-09 ***
## (Intercept):pier 1.13e+00 3.05e-01 3.71 0.00021 ***
## income:boat -1.10e-04 6.02e-05 -1.84 0.06636 .
## income:charter -2.78e-04 6.03e-05 -4.61 4.0e-06 ***
## income:pier -1.28e-04 5.33e-05 -2.41 0.01605 *
## price:beach -3.80e-02 3.33e-03 -11.41 < 2e-16 ***
## price:boat -2.09e-02 2.23e-03 -9.33 < 2e-16 ***
## price:charter -1.60e-02 2.02e-03 -7.94 2.0e-15 ***
## price:pier -3.92e-02 3.26e-03 -12.02 < 2e-16 ***
## catch:beach 4.95e+00 8.20e-01 6.04 1.5e-09 ***
## catch:boat 2.47e+00 5.19e-01 4.76 1.9e-06 ***
## catch:charter 7.61e-01 1.52e-01 4.99 6.0e-07 ***
## catch:pier 4.88e+00 8.99e-01 5.43 5.5e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Log-Likelihood: -1160, df = 14
## AIC: 13.8875709217033


system.time(mnlogit(fm, Fish, "alt"))


##    user  system elapsed
##    0.11    0.00    0.11
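As I read the mnlogit documentation, the three parts of the formula mode ~ 1 | income | price + catch hold, in order, alternative-specific variables with generic coefficients (none here, hence the 1), individual-specific variables (income, alongside the alternative-specific intercepts), and alternative-specific variables with alternative-specific coefficients (price and catch). That matches the variable counts in the summary above. To make the estimates concrete, the sketch below plugs the rounded coefficients into the standard multinomial logit formula P_j = exp(V_j) / sum_k exp(V_k), with beach as the reference alternative. The income, prices, and catch rates are hypothetical values chosen only for illustration.

```r
# Coefficients copied (rounded) from the summary output above; beach is the
# reference alternative, so its intercept and income coefficient are zero
intercept <- c(beach = 0, boat = 0.864, charter = 1.85, pier = 1.13)
b_income  <- c(beach = 0, boat = -1.10e-04, charter = -2.78e-04, pier = -1.28e-04)
b_price   <- c(beach = -0.0380, boat = -0.0209, charter = -0.0160, pier = -0.0392)
b_catch   <- c(beach = 4.95, boat = 2.47, charter = 0.761, pier = 4.88)

# Hypothetical decision maker: income and alternative attributes are made up
income <- 4000
price  <- c(beach = 100, boat = 50, charter = 80, pier = 100)
catch  <- c(beach = 0.07, boat = 0.26, charter = 0.50, pier = 0.05)

# Systematic utility of each alternative
v <- intercept + b_income * income + b_price * price + b_catch * catch

# Multinomial logit choice probabilities: exp(V_j) / sum_k exp(V_k)
p <- exp(v) / sum(exp(v))
print(round(p, 3))
```

Raising the price of one alternative in this sketch lowers its probability and shifts share to the others, which is the basic mechanism behind the elasticities and market shares these models are used for.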


Estimation with mlogit



I estimate the same model using mlogit.



library(mlogit)


## Loading required package: Formula Loading required package: statmod
## Loading required package: lmtest Loading required package: zoo
##
## Attaching package: 'zoo'
##
## The following object is masked from 'package:base':
##
## as.Date, as.Date.numeric
##
## Loading required package: maxLik Loading required package: miscTools
## Loading required package: MASS



data2 <- mlogit.data(Fish, choice = "alt", shape = "long", id.var = "chid",
                     alt.levels = c("beach", "boat", "charter", "pier"))
summary(mod1 <- mlogit(mode ~ 1 | income | price + catch, data2, reflevel = "beach"))


## 
## Call:
## mlogit(formula = mode ~ 1 | income | price + catch, data = data2,
## reflevel = "beach", method = "nr", print.level = 0)
##
## Frequencies of alternatives:
## beach boat charter pier
## 0.113 0.354 0.382 0.151
##
## nr method
## 7 iterations, 0h:0m:0s
## g'(-H)^-1g = 8.31E-08
## gradient close to zero
##
## Coefficients :
## Estimate Std. Error t-value Pr(>|t|)
## boat:(intercept) 8.64e-01 3.15e-01 2.74 0.00607 **
## charter:(intercept) 1.85e+00 3.10e-01 5.97 2.4e-09 ***
## pier:(intercept) 1.13e+00 3.05e-01 3.71 0.00021 ***
## boat:income -1.10e-04 6.02e-05 -1.84 0.06636 .
## charter:income -2.78e-04 6.03e-05 -4.61 4.0e-06 ***
## pier:income -1.28e-04 5.33e-05 -2.41 0.01605 *
## beach:price -3.80e-02 3.33e-03 -11.41 < 2e-16 ***
## boat:price -2.09e-02 2.23e-03 -9.33 < 2e-16 ***
## charter:price -1.60e-02 2.02e-03 -7.94 2.0e-15 ***
## pier:price -3.92e-02 3.26e-03 -12.02 < 2e-16 ***
## beach:catch 4.95e+00 8.20e-01 6.04 1.5e-09 ***
## boat:catch 2.47e+00 5.19e-01 4.76 1.9e-06 ***
## charter:catch 7.61e-01 1.52e-01 4.99 6.0e-07 ***
## pier:catch 4.88e+00 8.99e-01 5.43 5.5e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Log-Likelihood: -1160
## McFadden R^2: 0.225
## Likelihood ratio test : chisq = 675 (p.value = <2e-16)


system.time(mlogit(mode ~ 1 | income | price + catch, data2, reflevel = "beach"))


##    user  system elapsed
##    0.27    0.02    0.28


Speed Tests



I ran a simple test of execution times with system.time(). The results are reported after each model summary above.
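A single system.time() call is noisy, which may explain why the ranking of the two packages flipped between runs inside and outside R Markdown (see the findings below). A minimal sketch of a steadier comparison repeats each timing and reports the median. The helper time_reps() is hypothetical, not part of mnlogit or mlogit, and a cheap stand-in workload is used so the block runs without either package.

```r
# Hypothetical helper: time a zero-argument function several times and
# summarize the elapsed seconds
time_reps <- function(f, reps = 10) {
  elapsed <- replicate(reps, system.time(f())[["elapsed"]])
  c(median = median(elapsed), min = min(elapsed), max = max(elapsed))
}

# Stand-in workload; with the packages loaded one could instead pass, e.g.,
# function() mnlogit(fm, Fish, "alt")
set.seed(1)
timings <- time_reps(function() sum(sort(rnorm(1e5))), reps = 5)
print(timings)
```

The median is less sensitive than a single run to one-off delays such as package loading or garbage collection. CRAN packages such as microbenchmark offer more careful timing.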



Findings



  • I obtain identical results for the models estimated with mnlogit and mlogit.


  • Estimation speeds appear faster for mnlogit.


  • Caveat: when run outside the R Markdown environment, the same command (system.time) showed mlogit to be faster than mnlogit!


  • Additional comments: I restarted RStudio and estimated the two models again outside R Markdown. mnlogit took 0.12 seconds versus 0.31 seconds for mlogit.


  • Verdict: mnlogit reports shorter execution times than mlogit.


  • Estimation speeds may also differ with larger and more complex data sets.


  • The model specification is simpler in mnlogit.


  • The commands for specifying the data set and the model seem easier in mnlogit.


  • mlogit syntax seems relatively complex, but offers more choices in model specification.

    Final comment



    I am delighted to see yet another package for discrete choice models. As the field evolves and discrete choice models are applied to more advanced problems, the need for robust and computationally efficient packages will only grow. Both mlogit and mnlogit are indeed valuable to those interested in how humans make good and bad choices!




    Wednesday, May 22, 2013

    What happened to six million voters?

    The May 11 elections in Pakistan were, by all measures, a great success. Despite threats of violence by Al-Qaeda and its local franchises against those who would vote, millions of Pakistanis stepped out to vote for an elected government. The Election Commission of Pakistan (ECP) claimed a voter turnout of 60%.

    A 60% turnout among the 84.2 million registered voters in the 262 National Assembly ridings for which the ECP reported results would imply about 50.5 million votes polled. However, the ECP’s own data reported 44.9 million votes, a gap of approximately 5.7 million votes. The actual turnout was thus close to 53%.

    [Image]
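The turnout arithmetic above can be checked in a few lines of R. This sketch uses the rounded figures quoted in the text, so small differences are expected: with these inputs the gap comes out at about 5.6 million rather than 5.7 million, presumably because the post's figure was computed from the unrounded ECP totals.

```r
# Rounded figures quoted in the post, in millions
registered      <- 84.2   # registered voters in the 262 reported ridings
claimed_turnout <- 0.60   # turnout claimed by the ECP
reported_votes  <- 44.9   # votes reported in the ECP's own data

expected_votes <- registered * claimed_turnout   # votes implied by a 60% turnout
gap            <- expected_votes - reported_votes  # shortfall in millions of votes
actual_turnout <- reported_votes / registered      # turnout implied by reported votes

print(round(c(expected = expected_votes, gap = gap, turnout = actual_turnout), 3))
```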

    I used R to scrape the data for the 262 ridings, which the ECP reported on separate web pages. The R code is presented below.

    library(XML)

    # URL prefix for the ECP results pages; the riding number is appended to it
    u1 <- "http://www.ecp.gov.pk/electionresult/Search.aspx?constituency=NA&constituencyid=NA-"

    # Loop through all 272 National Assembly ridings
    for (i in 1:272) {

      # Complete the URL for riding i
      url2 <- paste0(u1, i)

      # Read the results table (the 8th table on the page)
      ridedata <- readHTMLTable(url2, header = TRUE, which = 8, stringsAsFactors = FALSE)

      # Read the raw HTML page
      web_page <- readLines(url2)

      # Pull out the line with the riding name, identified by "Specialheading"
      ridename <- web_page[grep("Specialheading", web_page)]

      # Position of the first character of the riding name (just after "(")
      startx <- regexpr("(", ridename, fixed = TRUE)
      startx <- startx[1] + 1

      # Position of the last character of the riding name (two characters before "<span")
      endx <- regexpr("<span", ridename)
      endx <- endx[1] - 2

      # Extract the riding name
      ridename <- substr(ridename, startx, endx)

      # Store this riding's table, tagged with its number and name, as fname1, fname2, ...
      assign(paste0("fname", i), cbind(ridedata, riding = i, rname = ridename))
    }

    After first storing each riding's data separately, I used a simple rbind command to assemble the data into one large file. I did this because the server timed out several times during execution; storing the ridings separately let me restart from the riding where the system failed rather than from the beginning every time.
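The save-as-you-go and restart approach described above can be sketched as follows. This is a minimal illustration, not the original code: fetch_riding() is a hypothetical stand-in for the scraping step (in practice it would call readHTMLTable() on the ECP page for riding i), only five ridings are used, and the files go to a temporary directory.

```r
# Hypothetical stand-in for scraping one riding; returns made-up results
fetch_riding <- function(i) {
  data.frame(riding = i, candidate = paste0("Candidate ", 1:2), votes = c(100, 50) * i)
}

out_dir <- file.path(tempdir(), "ridings")
dir.create(out_dir, showWarnings = FALSE)

for (i in 1:5) {
  f <- file.path(out_dir, sprintf("riding_%03d.rds", i))
  if (file.exists(f)) next                 # already saved: skip it on a restart
  res <- tryCatch(fetch_riding(i), error = function(e) NULL)
  if (is.null(res)) {                      # e.g. a server timeout: stop, rerun later
    message("Failed at riding ", i, "; rerun the loop to resume")
    break
  }
  saveRDS(res, f)
}

# Assemble all saved ridings into one data frame with rbind
files <- list.files(out_dir, pattern = "^riding_[0-9]{3}\\.rds$", full.names = TRUE)
all_ridings <- do.call(rbind, lapply(files, readRDS))
print(dim(all_ridings))
```

Because finished ridings are skipped, rerunning the same loop after a failure picks up where it stopped.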

    Tuesday, April 9, 2013

    Second edition of Crawley's The R Book

    The second edition of Michael Crawley's The R Book is available from Wiley. According to the publisher, the new edition boasts the following features:

    • "Features full colour text and extensive graphics throughout.
    • Introduces a clear structure with numbered section headings to help readers locate information more efficiently.
    • Looks at the evolution of R over the past five years.
    • Features a new chapter on Bayesian Analysis and Meta-Analysis.
    • Presents a fully revised and updated bibliography and reference section.
    • Is supported by an accompanying website allowing examples from the text to be run by the user."

    At 1076 pages, this continues to be the most comprehensive text on R.

    Wednesday, February 27, 2013

    Workshops on Modelling Choices using R in Toronto

    Making choices is inherently human. We choose between brands of cereal or amongst candidates in an election. At times, choices may be influenced by the characteristics of the decision maker, such as age, income and sex. Choices may also be influenced by the attributes of competing alternatives, such as the cost of travelling between two cities by air or rail. At other times, choices are influenced by both.

    Analyzing choices can be tricky. Practitioners and researchers have developed numerous statistical techniques to analyze and model choices. This workshop will offer applied, hands-on training in analyzing choices.

    The workshops will be offered in two sessions. The first session will focus on binary (yes/no) choices and introduce the basic assumptions of choice analysis. It will provide hands-on training in exploratory data analysis. The second session will focus on advanced topics in choice modelling, including multiple (multinomial) choices, elasticities, and estimating market shares.

    Participants are expected to bring their own laptops. Basic concepts will be illustrated in SPSS, Stata, and R.

    Title: Workshop on Modelling Choices

    Dates and Time: Session One - Friday, March 22, 2013 (2pm-5pm)

    Session Two - Friday, March 29, 2013 (2pm-5pm)

    Instructor: Murtaza Haider, Ph.D.

    Location: Ted Rogers School of Management, Ryerson University, 55 Dundas Street West, Room 3-119, Toronto M5G 2C3 

    Registration fee: The workshop is sponsored by the Dean’s office at the Ted Rogers School of Management and is offered free-of-cost to the Ryerson community.

    Please RSVP by emailing mba@ryerson.ca

    Registration will be restricted to 25 participants.

    Monday, February 4, 2013

    Help needed with sample selection biases

    We are searching for a graduate student to assist us with a very short assignment on sample selection biases and Heckman probit models. The help is not needed for estimating the models, but for reviewing the scenarios where the use of such models is theoretically appropriate or otherwise. For instance, we are particularly interested in determining whether Heckman probit-type models could be applied in situations where the response variable had a don’t know/refused option, which has been used for the selection equation in some published research. We seek help in understanding the assumptions of the model that would permit or restrict the use of the Heckman probit model in such circumstances.

    If interested, please email Murtaza Haider at murtaza.haider@ryerson.ca.

    Monday, January 21, 2013

    We MUST, says the President

    In his inaugural address today, the American president told his fellow countrymen what they “MUST” do. The word cloud generated from the text of his speech highlights the following keywords: must, people, time, and journey. It reads like a work order for Americans: they “MUST” get to work.

    [Image: word cloud of the inaugural address]
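Word clouds typically size each word by its frequency after common "stop words" are removed. The sketch below shows that counting step in base R on one sentence taken from the address. The short stop-word list is a hypothetical placeholder (real tools use much longer lists), and this is one common approach, not necessarily the exact procedure behind the cloud above.

```r
# One sentence from the address, used as a small test input
txt <- "We must act, knowing that our work will be imperfect. We must act, knowing that today's victories will be only partial."

# A tiny, hypothetical stop-word list
stop_words <- c("we", "that", "our", "will", "be", "only", "the", "a", "and")

# Lower-case and split on anything that is not a letter or an apostrophe
words <- unlist(strsplit(tolower(txt), "[^a-z']+"))
words <- words[nzchar(words) & !(words %in% stop_words)]

# Frequency table, most common words first
freq <- sort(table(words), decreasing = TRUE)
print(head(freq, 5))
```

Applied to the full speech, the same counts would feed the word sizes in a word cloud.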

    The text of the speech is reproduced below.

    “Vice-President Biden, Mr. Chief Justice, Members of the United States Congress, distinguished guests, and fellow citizens:

    Each time we gather to inaugurate a president, we bear witness to the enduring strength of our Constitution. We affirm the promise of our democracy. We recall that what binds this nation together is not the colors of our skin or the tenets of our faith or the origins of our names. What makes us exceptional – what makes us American – is our allegiance to an idea, articulated in a declaration made more than two centuries ago:

    “We hold these truths to be self-evident, that all men are created equal, that they are endowed by their Creator with certain unalienable rights, that among these are Life, Liberty, and the pursuit of Happiness.”

    Today we continue a never-ending journey, to bridge the meaning of those words with the realities of our time. For history tells us that while these truths may be self-evident, they have never been self-executing; that while freedom is a gift from God, it must be secured by His people here on Earth. The patriots of 1776 did not fight to replace the tyranny of a king with the privileges of a few or the rule of a mob. They gave to us a Republic, a government of, and by, and for the people, entrusting each generation to keep safe our founding creed.

    For more than two hundred years, we have.

    Through blood drawn by lash and blood drawn by sword, we learned that no union founded on the principles of liberty and equality could survive half-slave and half-free. We made ourselves anew, and vowed to move forward together.

    Together, we determined that a modern economy requires railroads and highways to speed travel and commerce; schools and colleges to train our workers.

    Together, we discovered that a free market only thrives when there are rules to ensure competition and fair play.

    Together, we resolved that a great nation must care for the vulnerable, and protect its people from life’s worst hazards and misfortune.

    Through it all, we have never relinquished our skepticism of central authority, nor have we succumbed to the fiction that all society’s ills can be cured through government alone. Our celebration of initiative and enterprise; our insistence on hard work and personal responsibility, are constants in our character.

    But we have always understood that when times change, so must we; that fidelity to our founding principles requires new responses to new challenges; that preserving our individual freedoms ultimately requires collective action. For the American people can no more meet the demands of today’s world by acting alone than American soldiers could have met the forces of fascism or communism with muskets and militias. No single person can train all the math and science teachers we’ll need to equip our children for the future, or build the roads and networks and research labs that will bring new jobs and businesses to our shores. Now, more than ever, we must do these things together, as one nation, and one people.

    This generation of Americans has been tested by crises that steeled our resolve and proved our resilience. A decade of war is now ending. An economic recovery has begun. America’s possibilities are limitless, for we possess all the qualities that this world without boundaries demands: youth and drive; diversity and openness; an endless capacity for risk and a gift for reinvention. My fellow Americans, we are made for this moment, and we will seize it – so long as we seize it together.

    For we, the people, understand that our country cannot succeed when a shrinking few do very well and a growing many barely make it. We believe that America’s prosperity must rest upon the broad shoulders of a rising middle class. We know that America thrives when every person can find independence and pride in their work; when the wages of honest labor liberate families from the brink of hardship. We are true to our creed when a little girl born into the bleakest poverty knows that she has the same chance to succeed as anybody else, because she is an American, she is free, and she is equal, not just in the eyes of God but also in our own.

     
    We understand that outworn programs are inadequate to the needs of our time. We must harness new ideas and technology to remake our government, revamp our tax code, reform our schools, and empower our citizens with the skills they need to work harder, learn more, and reach higher. But while the means will change, our purpose endures: a nation that rewards the effort and determination of every single American. That is what this moment requires. That is what will give real meaning to our creed.

    We, the people, still believe that every citizen deserves a basic measure of security and dignity. We must make the hard choices to reduce the cost of health care and the size of our deficit. But we reject the belief that America must choose between caring for the generation that built this country and investing in the generation that will build its future. For we remember the lessons of our past, when twilight years were spent in poverty, and parents of a child with a disability had nowhere to turn. We do not believe that in this country, freedom is reserved for the lucky, or happiness for the few. We recognize that no matter how responsibly we live our lives, any one of us, at any time, may face a job loss, or a sudden illness, or a home swept away in a terrible storm. The commitments we make to each other – through Medicare, and Medicaid, and Social Security – these things do not sap our initiative; they strengthen us. They do not make us a nation of takers; they free us to take the risks that make this country great.

    We, the people, still believe that our obligations as Americans are not just to ourselves, but to all posterity. We will respond to the threat of climate change, knowing that the failure to do so would betray our children and future generations. Some may still deny the overwhelming judgment of science, but none can avoid the devastating impact of raging fires, and crippling drought, and more powerful storms. The path towards sustainable energy sources will be long and sometimes difficult. But America cannot resist this transition; we must lead it. We cannot cede to other nations the technology that will power new jobs and new industries – we must claim its promise. That is how we will maintain our economic vitality and our national treasure – our forests and waterways; our croplands and snowcapped peaks. That is how we will preserve our planet, commanded to our care by God. That’s what will lend meaning to the creed our fathers once declared.

    We, the people, still believe that enduring security and lasting peace do not require perpetual war. Our brave men and women in uniform, tempered by the flames of battle, are unmatched in skill and courage. Our citizens, seared by the memory of those we have lost, know too well the price that is paid for liberty. The knowledge of their sacrifice will keep us forever vigilant against those who would do us harm. But we are also heirs to those who won the peace and not just the war, who turned sworn enemies into the surest of friends, and we must carry those lessons into this time as well.

    We will defend our people and uphold our values through strength of arms and rule of law. We will show the courage to try and resolve our differences with other nations peacefully – not because we are naïve about the dangers we face, but because engagement can more durably lift suspicion and fear. America will remain the anchor of strong alliances in every corner of the globe; and we will renew those institutions that extend our capacity to manage crisis abroad, for no one has a greater stake in a peaceful world than its most powerful nation. We will support democracy from Asia to Africa; from the Americas to the Middle East, because our interests and our conscience compel us to act on behalf of those who long for freedom. And we must be a source of hope to the poor, the sick, the marginalized, the victims of prejudice – not out of mere charity, but because peace in our time requires the constant advance of those principles that our common creed describes: tolerance and opportunity; human dignity and justice.

    [Source: “The Text of Obama’s Inaugural Address”, Yahoo! News, 1/21/13, news.yahoo.com/text-obamas-inaugural-address-165950611.html]

     
    We, the people, declare today that the most evident of truths – that all of us are created equal – is the star that guides us still; just as it guided our forebears through Seneca Falls, and Selma, and Stonewall; just as it guided all those men and women, sung and unsung, who left footprints along this great Mall, to hear a preacher say that we cannot walk alone; to hear a King proclaim that our individual freedom is inextricably bound to the freedom of every soul on Earth.

    It is now our generation’s task to carry on what those pioneers began. For our journey is not complete until our wives, our mothers, and daughters can earn a living equal to their efforts. Our journey is not complete until our gay brothers and sisters are treated like anyone else under the law – for if we are truly created equal, then surely the love we commit to one another must be equal as well. Our journey is not complete until no citizen is forced to wait for hours to exercise the right to vote.

    Our journey is not complete until we find a better way to welcome the striving, hopeful immigrants who still see America as a land of opportunity; until bright young students and engineers are enlisted in our workforce rather than expelled from our country. Our journey is not complete until all our children, from the streets of Detroit to the hills of Appalachia to the quiet lanes of Newtown, know that they are cared for, and cherished, and always safe from harm.

    That is our generation’s task – to make these words, these rights, these values – of Life, and Liberty, and the Pursuit of Happiness – real for every American. Being true to our founding documents does not require us to agree on every contour of life; it does not mean we will all define liberty in exactly the same way, or follow the same precise path to happiness.

    Progress does not compel us to settle centuries-long debates about the role of government for all time – but it does require us to act in our time.

    For now decisions are upon us, and we cannot afford delay. We cannot mistake absolutism for principle, or substitute spectacle for politics, or treat name-calling as reasoned debate. We must act, knowing that our work will be imperfect. We must act, knowing that today’s victories will be only partial, and that it will be up to those who stand here in four years, and forty years, and four hundred years hence to advance the timeless spirit once conferred to us in a spare Philadelphia hall.

    My fellow Americans, the oath I have sworn before you today, like the one recited by others who serve in this Capitol, was an oath to God and country, not party or faction – and we must faithfully execute that pledge during the duration of our service. But the words I spoke today are not so different from the oath that is taken each time a soldier signs up for duty, or an immigrant realizes her dream. My oath is not so different from the pledge we all make to the flag that waves above and that fills our hearts with pride.

    They are the words of citizens, and they represent our greatest hope.

    You and I, as citizens, have the power to set this country’s course.

    You and I, as citizens, have the obligation to shape the debates of our time – not only with the votes we cast, but with the voices we lift in defense of our most ancient values and enduring ideals.

    Let each of us now embrace, with solemn duty and awesome joy, what is our lasting birthright. With common effort and common purpose, with passion and dedication, let us answer the call of history, and carry into an uncertain future that precious light of freedom.

    Thank you, God Bless you, and may He forever bless these United States of America.”