Summary Improvements In R

Kevin Feasel



John Mount points out a nice quasi-bugfix in R 3.4.0:

In older versions of R (say R 3.3.1) the above code gave the following undesirable result:

# Min. 1st Qu. Median Mean 3rd Qu. Max.
# 15560 15560 15560 15560 15560 15560 

This was always very confusing and hard to explain to beginners. To justify this you had to explain that “R, by default, calculates the summary rounded to 4 significant digits, and is simultaneously configured to give absolutely no indication as to how many significant digits are in fact being displayed.” To add insult to injury, summary() picked a different number of significant figures than the default numeric presentation. One could type “median(15555)” and get the expected presentation “15555”.

I like this change.
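For anyone who wants to see the rounding for themselves, here is a minimal sketch. It assumes the call under discussion was something like summary(15555); that is my guess based on the median(15555) example in the quote, not something taken from the original code. It uses only base R functions (signif, median, summary).

```r
# Assumed input: the single value mentioned in the quote above.
x <- 15555

# Rounding to 4 significant digits reproduces the number that
# older versions of R displayed in the summary.
old_style <- signif(x, 4)
print(old_style)

# median() on its own returns the value without that rounding.
print(median(x))

# Inspect the values stored in the summary object. Per the post,
# versions before R 3.4.0 rounded here; newer versions should not.
s <- summary(x)
print(unclass(s))
```

Running the last two lines on both an old and a new R installation is probably the quickest way to confirm which behavior you have.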



