Comments on eKonometrics: R: simple for complex tasks, complex for simple tasks

2018-03-12T12:46:31.051-04:00

So there have been some comments about GUI-based options, and I'd plug one for radiant: http://vnijs.github.io/radiant/. I would also say tableone takes a shot at this as well: https://cran.r-project.org/web/packages/tableone/vignettes/introduction.html

2018-03-12T02:25:13.150-04:00

Greetings,

Thank you for your suggestions. Also, I noticed that I used with() unnecessarily in the text. The following syntax works without it.

mean.cars <- sapply(mtcars[c("mpg", "disp", "hp")], mean)
sd.cars <- sapply(mtcars[c("mpg", "disp", "hp")], sd)
n.cars <- sapply(mtcars[c("mpg", "disp", "hp")], length)

cbind(n.cars, mean.cars, sd.cars)
round(cbind(n.cars, mean.cars, sd.cars),2)

round(t(sapply(mtcars[c("mpg", "disp", "hp")],
               function(x) c(n = length(x), avg = mean(x), stdev = sd(x)))), 2)

Sincerely, Murtaza

2018-03-12T01:18:07.781-04:00

Consider the following approach, and whether you agree that it is within the ability of undergraduates to grasp when taught the way I describe below.

library(dplyr) # part of the tidyverse package/ecosystem to streamline processing
data(mtcars) # optional here: mtcars is lazy-loaded from the datasets package, but data() makes it show up in ls()
fs <- c("mean", "sd") # create a vector of functions to be applied

# select from mtcars the three columns of interest, pipe them to be summarized (collapsed into
# a single row) consisting of the mean and standard deviation of each column then
# pipe to be rounded to three places and finally add a new column for the number of observations
# ASSUMED to be equal to the number of rows in the data frame; in the real world you have to
# deal with missing data. With only 32 rows, we can confirm the lack of NAs by visual inspection

select(mtcars, mpg, disp, hp) %>% summarize_all(funs_(fs)) %>% mutate_all(round, 3) %>% mutate(n = nrow(mtcars))
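The caveat in the comments above about missing data can be made concrete. What follows is only a base R sketch, not part of the pipeline above: the data frame cars and the helper summarise_col are made-up names, and one value is deliberately set to NA to show that n then has to be counted per column rather than taken from nrow().

```r
# Hypothetical copy of the three columns, with one value knocked out on purpose
cars <- mtcars[c("mpg", "disp", "hp")]
cars$mpg[1] <- NA

# Hypothetical helper: count only the non-missing values and
# drop NAs before computing the mean and standard deviation
summarise_col <- function(x) {
  c(n = sum(!is.na(x)), mean = mean(x, na.rm = TRUE), sd = sd(x, na.rm = TRUE))
}

# sapply() returns one column per variable, so transpose to get one row each
res <- round(t(sapply(cars, summarise_col)), 3)
res  # n is 31 for mpg and 32 for disp and hp
```

Without na.rm = TRUE, mean() and sd() would return NA for mpg, which is a useful thing for students to see once.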

I appreciate that in the context of an undergraduate course in business analysis, you shouldn't have to devote a substantial amount of time teaching R. (However, a prospective statistics major who can't master R should consider an alternative career.)

What I'd suggest is to orient students to R by describing it as a tool to solve algebraic equations. They had to have grasped f(x) = y to make it to your classroom in the first place.

In this example,

1. They are going to be using functions provided by a contributed R package.
2. They need to know how to install the package if they get a not-found error
3. They need to know ls() to begin developing a notion of a namespace: if they try to do something with mtcars and get an object-not-found error, they can see for themselves that it is not there and that they need to do something about it, namely use the data function. Eventually, they will need to learn how to read in their own data.
4. They need to know that <- and = are assignment operators that bind a name to a value
5. c means combine into a vector
6. A vector is an ordered sequence of values of the same type, analogous to either a row or a column of a spreadsheet
7. The quotation marks within c are there to pass the *names* of the functions, not the functions themselves. Otherwise, R would have no way of knowing: the mean of what?
8. select is a function from dplyr; its arguments are the name of the object (the mtcars data frame) followed by the columns or variables to be pulled out.
9. %>% is a pipe that passes on the results from the left to the right
10. summarize_all takes the result, a trimmed down data frame, and applies a function
11. The function it applies is funs_, another dplyr feature that applies the functions in the vector fs to the trimmed down data frame.
12. Another %>% sends the result downstream to be rounded
13. Last is to add a column for n
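
Points 4 to 7 can be tried out in base R before dplyr enters the picture. This is only a sketch of the idea, not what funs_ does internally: x and stats are made-up names, and match.fun() is base R's way of looking up the function that a name refers to.

```r
fs <- c("mean", "sd")   # names of functions, stored as character strings
x  <- mtcars$mpg        # the "x" in f(x)

# Each name in fs is turned back into a function and applied to x;
# the names only become "the mean of something" once x is supplied
stats <- sapply(fs, function(f) match.fun(f)(x))
round(stats, 2)
```

Changing fs and re-running the last two lines gives the same kind of drill without any package installed.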

With that as an example, it's easy to drill.

Redefine fs to obtain the minimum, maximum and median of mtcars, for mpg, disp and hp

fs <- c("min", "max", "median")

Do you need to make any changes to

select(mtcars, mpg, disp, hp) %>% summarize_all(funs_(fs)) %>% mutate_all(round, 3) %>% mutate(n = nrow(mtcars))

to get the new result?

What would you do to get quantiles and IQR, the interquartile range?
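
One possible answer, sketched in base R rather than with the pipeline above. The twist is that quantile() returns several values per column while IQR() returns one, so they are computed separately and bound together. The names cols, q, iqr and res are made up for the illustration, and the three probabilities are just one reasonable choice.

```r
cols <- mtcars[c("mpg", "disp", "hp")]

# quantile() returns one value per requested probability for each column;
# transpose so that each variable becomes a row
q <- t(sapply(cols, quantile, probs = c(0.25, 0.5, 0.75)))

# IQR() returns a single number per column
iqr <- sapply(cols, IQR)

res <- cbind(q, IQR = iqr)
round(res, 3)
```

With the default settings, the IQR column equals the 75% column minus the 25% column, which students can check by hand.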

For statistics majors, get them out of cut and paste and into RMarkdown as soon as possible.

2018-03-12T01:01:11.151-04:00

library(tidyverse)
mtcars %>%
  select(mpg, disp, hp) %>%
  sapply(function(x) c(n = length(x), mu = mean(x), sigma = sd(x))) %>%
  t() %>%
  round(2)

      n     mu  sigma
mpg  32  20.09   6.03
disp 32 230.72 123.94
hp   32 146.69  68.56

#################### now without the tidyverse

mat <- sapply(subset(mtcars, select = c(mpg, disp, hp)),
              function(x) c(n = length(x), mu = mean(x), sigma = sd(x)))
mat <- round(t(mat), 2)
mat
      n     mu  sigma
mpg  32  20.09   6.03
disp 32 230.72 123.94
hp   32 146.69  68.56

############ where's the problem?

2018-03-11T16:54:58.443-04:00

You might want to take a look at jamovi (https://www.jamovi.org/).

It's point-and-click but reproducible, and based on R.

We're starting to use it for researchers who want to do some of their own analysis. I'm thinking it would work for undergrad/MBA students where your goal isn't to teach them R, but for them to understand the analysis/stats concepts.

2018-03-11T04:10:47.510-04:00

I largely agree. I think, however, that the tidyverse makes the simple things at least a little simpler. The task performed above then becomes

library(tidyverse)  # provides dplyr's verbs and tidyr's gather
mtcars %>%
  select(mpg, disp, hp) %>%
  gather("Variable", "Value", mpg, disp, hp) %>%
  group_by(Variable) %>%
  summarise_all(funs(length, mean, sd, min, max))

It still requires the student to learn quite a few things, so perhaps even more useful is to just tell the students that with R, a little bit of googling is part of the workflow. Also, tell the students that https://www.statmethods.net/index.html is a very good resource (which in this case quickly points you to Hmisc::describe, pastecs::stat.desc as well as psych::describe that you mention yourself).