The Data Exploration Process

Stacia Varga takes a step back from analyzing NHL data to explore it a little more:

As I mentioned in my last post, I am currently in an exploratory phase with my data analytics project. Although I would love to dive in and do some cool predictive analytics or machine learning projects, I really need to continue learning as much about my data as possible before diving into more advanced techniques.

My data exploration process has the following four steps:

  1. Assess the data that I have at a high level

  2. Determine how this data is relevant to the analytics project I want to undertake

  3. Get a general overview of the data characteristics by calculating simple statistics

  4. Understand the “middles” and the “ends” of your numeric data points

There’s some good stuff in here. I particularly appreciate Stacia’s consideration of data exploration as an iterative process.
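
To make steps 3 and 4 a bit more concrete, here is a minimal sketch in Python using only the standard library. It is not taken from Stacia's post: the `summarize` helper and the `goals` sample values are made up for illustration, and her own tooling may differ. The idea is simply to compute a few simple statistics, then look at the "middles" (mean, median) and the "ends" (minimum, maximum, outer quartiles) of a numeric column.

```python
import statistics


def summarize(values):
    """Return simple statistics for one numeric column.

    Hypothetical helper for illustration; not from the original post.
    """
    ordered = sorted(values)
    # quantiles(n=4) returns the three cut points: Q1, median, Q3
    q1, _, q3 = statistics.quantiles(ordered, n=4)
    return {
        "count": len(ordered),
        # The "ends" of the data
        "min": ordered[0],
        "max": ordered[-1],
        "q1": q1,
        "q3": q3,
        # The "middles" of the data
        "mean": statistics.mean(ordered),
        "median": statistics.median(ordered),
    }


# Made-up sample values standing in for a numeric column such as goals per game
goals = [0, 1, 1, 2, 2, 3, 3, 4, 7]
summary = summarize(goals)

for name, value in summary.items():
    print(f"{name}: {value}")
```

Comparing the mean against the median, and the maximum against the third quartile, is a quick way to spot skew or outliers worth a closer look before moving on to more advanced techniques.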


