Let’s look at a concrete example with the Click-Through Rate Prediction dataset of ad impressions and clicks from the data science website Kaggle. The goal of this workflow is to create a machine learning model that, given a new ad impression, predicts whether or not there will be a click.
To build our advanced analytics workflow, let’s focus on the three main steps:
- ETL
- Data Exploration, for example, using SQL
- Advanced Analytics / Machine Learning
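To make the three steps concrete, here is a minimal, stdlib-only Python sketch of the same workflow shape. It is not the Databricks example and does not use the Kaggle data. The column names (`click`, `banner_pos`, `device_type`) and the eight sample rows are hypothetical. `sqlite3` stands in for Spark SQL, and a hand-rolled logistic regression trained with stochastic gradient descent stands in for a Spark MLlib model.

```python
import csv
import io
import math
import sqlite3

# Hypothetical sample of ad impressions; column names and values are illustrative only.
RAW_CSV = """click,banner_pos,device_type
1,top,mobile
0,bottom,desktop
1,top,mobile
0,bottom,mobile
1,top,desktop
0,bottom,desktop
0,top,desktop
0,bottom,mobile
"""


def extract_transform(raw):
    """ETL (extract + transform): parse CSV text into typed tuples."""
    rows = []
    for rec in csv.DictReader(io.StringIO(raw)):
        rows.append((int(rec["click"]), rec["banner_pos"].strip(), rec["device_type"].strip()))
    return rows


def load(rows):
    """ETL (load): store the cleaned rows in an in-memory SQL table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE impressions (click INTEGER, banner_pos TEXT, device_type TEXT)")
    conn.executemany("INSERT INTO impressions VALUES (?, ?, ?)", rows)
    return conn


def ctr_by(conn, column):
    """Data exploration: click-through rate grouped by one categorical column."""
    # The column name is interpolated into the SQL, so pass only trusted names.
    query = f"SELECT {column}, AVG(click) FROM impressions GROUP BY {column} ORDER BY {column}"
    return dict(conn.execute(query).fetchall())


def features(banner_pos, device_type):
    """One-hot encode the two categorical fields, plus a bias term."""
    return {"bias": 1.0, "pos=" + banner_pos: 1.0, "dev=" + device_type: 1.0}


def predict_proba(weights, banner_pos, device_type):
    """Sigmoid of the weighted feature sum: estimated probability of a click."""
    x = features(banner_pos, device_type)
    z = sum(weights.get(k, 0.0) * v for k, v in x.items())
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(rows, epochs=200, lr=0.1):
    """Machine learning: logistic regression fit with plain stochastic gradient descent."""
    weights = {}
    for _ in range(epochs):
        for click, pos, dev in rows:
            p = predict_proba(weights, pos, dev)
            # Gradient step on the log-loss: w += lr * (label - prediction) * x
            for k, v in features(pos, dev).items():
                weights[k] = weights.get(k, 0.0) + lr * (click - p) * v
    return weights


rows = extract_transform(RAW_CSV)
conn = load(rows)
print(ctr_by(conn, "banner_pos"))
model = train_logistic(rows)
print(predict_proba(model, "top", "mobile"))
```

On this toy sample, the exploration query shows that top-of-page banners were clicked far more often than bottom ones, and the trained model accordingly scores a new top/mobile impression higher than a bottom/desktop one. At Kaggle scale, the same three steps would run on a distributed engine rather than in-process SQLite.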
The Databricks blog has a couple of other examples, but this was the most interesting one for me.