Tidying Video Game Data

Arvid Kingl has a fun article analyzing data from an open-source video game and applying tidy data principles to it:

You will learn what key principles a tidy data set adheres to, why it is useful to follow them consequently, and how to clean the data you are given. Tidying is also a great way to get to know a new data set.

Finally, in this tutorial you will learn how to write a function that makes your analysis look much cleaner and allows you to execute repetitive elements in your analysis in a very reproducible way. The function will allow you to load the latest version of the data dynamically into a flexible data scheme, which means that large parts of the code will not have to change when new data is added.

Check it out. Bonus point: tidy data is Boyce-Codd Normal Form which is (potentially) subsequently widened back out to include dimensional information.

Related Posts

From Excel to R: Three Examples

Abdul Majed Raja has a few examples of things which are easy to do in Excel and how you can do them in R: Create a difference variable between the current value and the next valueThis is also known as lead and lag – especially in a time series dataset this varaible becomes very important in feature engineering. In […]

Read More

Calculating AUC in R

Andrew Treadway shows how you can calculate Area Under the Curve in R: AUC is an important metric in machine learning for classification. It is often used as a measure of a model’s performance. In effect, AUC is a measure between 0 and 1 of a model’s performance that rank-orders predictions from a model. For […]

Read More

Categories

April 2019
MTWTFSS
« Mar May »
1234567
891011121314
15161718192021
22232425262728
2930