Relationships Between Numerical Features

Stacia Varga continues her exploratory data analysis series using hockey data:

Let’s start with something easy and understandable to analyze. Suppose I put age on the horizontal axis and weight on the vertical axis. It’s a common practice to put an explanatory variable on the horizontal axis and a response variable on the vertical axis. In other words, I’m looking to see how an increase in age (explanation) affects – or not – weight (response) for all the hockey players in the current season, regardless of team.
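
The explanatory/response framing maps directly onto a least-squares fit of weight on age. The post doesn't show any code. As a minimal sketch in Python (the function name fit_line is hypothetical, and it assumes ages and weights arrive as two parallel lists of numbers), the slope estimates how much weight changes per year of age, and Pearson's r summarizes how tightly the points follow that line:

```python
from math import sqrt

def fit_line(xs, ys):
    """Least-squares fit of ys (response) on xs (explanatory).

    Returns (slope, intercept, r), where r is Pearson's correlation coefficient.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least two points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations and of cross-products around the means
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("both variables must vary")
    slope = sxy / sxx                    # change in response per unit of explanatory variable
    intercept = mean_y - slope * mean_x  # the fitted line passes through (mean_x, mean_y)
    r = sxy / sqrt(sxx * syy)            # always between -1 and 1
    return slope, intercept, r
```

Calling fit_line(ages, weights) on the season's players would give a line to overlay on the scatter plot. An r close to zero would suggest age explains little of the variation in weight (r² is the share of weight's variance that a straight line in age accounts for). Swapping the arguments changes the slope but not r, because correlation is symmetric. Only the regression line depends on which variable is treated as explanatory.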

If I put age on the horizontal axis – does this explain weight? Sort of – the combinations of age and weight have some groupings. It almost appears that there is a greater number of younger, heavier players than older, heavier players, but it’s hard to tell here how the age/weight combinations are distributed because I can’t see all the individual points.
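
Not being able to see every point is the classic overplotting problem. If ages and weights are recorded as whole numbers, many players land on exactly the same (age, weight) coordinate, and a scatter plot draws them as a single dot. A minimal sketch, assuming the data arrive as two parallel lists (the helper names are hypothetical), counts those stacked points and applies a small random jitter. Jitter is one common workaround alongside transparency or binning. It is not necessarily what the original post does:

```python
import random
from collections import Counter

def stacked_counts(xs, ys):
    """Map each distinct (x, y) pair to how many observations share it."""
    return Counter(zip(xs, ys))

def hidden_points(xs, ys):
    """Count observations a plain scatter plot hides because another point sits on the same spot."""
    return sum(n - 1 for n in stacked_counts(xs, ys).values())

def jitter(values, width=0.3, seed=0):
    """Shift each value by a uniform random offset in [-width, width] so stacked points separate.

    A fixed seed keeps the plot reproducible. Keep width well below the spacing
    between distinct values (for example, 1 when values are whole numbers) so
    each dot can still be traced back to its original value.
    """
    rng = random.Random(seed)
    return [v + rng.uniform(-width, width) for v in values]
```

Plotting jitter(ages) against jitter(weights) instead of the raw columns spreads out stacked dots. Alternatively, the stacked_counts totals could drive marker size or color, so a crowded age/weight combination stands out rather than disappearing.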

Read the whole thing, while keeping in mind that correlation does not imply causation.


