Machine Learning Data Preparation Tips

Jen Underwood has some good tips for preparing data for machine learning:

Data preparation for machine learning requires business domain expertise, bias awareness and an experimental thought process. Before preparing your data, you’ll first define a business problem to solve. During that exercise, you’ll select an outcome metric and brainstorm potential input variables that influence it from many varied perspectives. From there you will begin identifying, collecting, cleaning, shaping and sampling data to run through automated machine learning model processes.

Note that it is also not unusual for relevant machine learning input data to occur outside of existing transactional processes. If that is the case, you can still start creating a first-generation machine learning model with existing data and continue to build new model versions over time as supplementary data is acquired.
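As a rough illustration of the cleaning and sampling steps mentioned in the excerpt, here is a minimal Python sketch using only the standard library. It is not taken from Jen's article: the field names (`customer_id`, `spend`, `churned`), the toy rows, and the helper functions `clean` and `stratified_sample` are all hypothetical, and a real project would more likely use a data frame library.

```python
import random
from collections import defaultdict


def clean(rows, required):
    """Drop rows missing a required field, then drop exact duplicate rows."""
    seen = set()
    cleaned = []
    for row in rows:
        # Treat None and empty strings as missing values.
        if any(row.get(field) in (None, "") for field in required):
            continue
        # A row is a duplicate only if every key/value pair matches.
        key = tuple(sorted(row.items()))
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(row)
    return cleaned


def stratified_sample(rows, label, fraction, seed=0):
    """Sample the same fraction from each outcome class (at least one row each)."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for row in rows:
        groups[row[label]].append(row)
    sample = []
    for members in groups.values():
        k = min(len(members), max(1, round(len(members) * fraction)))
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical toy data: "churned" plays the role of the outcome metric.
rows = [
    {"customer_id": 1, "spend": 120.0, "churned": 0},
    {"customer_id": 2, "spend": None, "churned": 1},   # missing input value
    {"customer_id": 1, "spend": 120.0, "churned": 0},  # exact duplicate
    {"customer_id": 3, "spend": 80.0, "churned": 1},
    {"customer_id": 4, "spend": 45.5, "churned": 0},
    {"customer_id": 5, "spend": 60.0, "churned": 1},
]

cleaned = clean(rows, required=["customer_id", "spend", "churned"])
sample = stratified_sample(cleaned, label="churned", fraction=0.5)
print(len(cleaned), len(sample))
```

Sampling per outcome class is one assumed way to keep a rare outcome from disappearing in a small sample; whether it suits a given problem depends on the business question being asked.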

Click through for the ten tips.

Related Posts

XGBoost With Python

Fisseha Berhane looked at Extreme Gradient Boosting with R and now covers it in Python: In both R and Python, the default base learners are trees (gbtree) but we can also specify gblinear for linear models and dart for both classification and regression problems. In this post, I will optimize only three of the parameters […]


Calling Azure Cognitive Services From SSIS

Rolf Tesmer shows off how easy it is to call Azure Cognitive Services from SQL Server Integration Services: My SQL SSIS package leverages the Translator Text API service.  For those who want to learn the secret sauce then I suggest to check here – essentially this API is pretty simple; It accepts source text, source language and target language.  (The API can translate to/from over […]


