R Data Frames And stringsAsFactors

Kevin Feasel

2018-03-20

R

John Mount recommends setting stringsAsFactors = FALSE for data frames in R:

R often uses a concept of factors to re-encode strings. This can be too early and too aggressive. Sometimes a string is just a string.

Tibbles have this set by default.  For an explanation as to why it defaults to TRUE for data frames, Roger Peng has the story.

Related Posts

From Excel to R: Three Examples

Abdul Majed Raja has a few examples of things which are easy to do in Excel and how you can do them in R: Create a difference variable between the current value and the next valueThis is also known as lead and lag – especially in a time series dataset this varaible becomes very important in feature engineering. In […]

Read More

Calculating AUC in R

Andrew Treadway shows how you can calculate Area Under the Curve in R: AUC is an important metric in machine learning for classification. It is often used as a measure of a model’s performance. In effect, AUC is a measure between 0 and 1 of a model’s performance that rank-orders predictions from a model. For […]

Read More

Categories

March 2018
MTWTFSS
« Feb Apr »
 1234
567891011
12131415161718
19202122232425
262728293031