Training Data With Azure ML

Koos van Strien discusses training data sets and cross-validating results:

When choosing a train and testset, you’ll implicitly introduce a new bias: it could be that the model you just trained predicts well for this particular testset, when trained for this particular trainset. To reduce this bias, you could “cross-validate” your results.

Cross-validation (often abbreviated as just “cv”) splits the dataset into n folds. Each fold is used once as a testset, using all other folds together as a training set. So in our pizza example with 100 records, with 5 folds we will have 5 test runs:

None of this is specific to Azure ML, and it is good reading.
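
To make the five test runs from the excerpt concrete, here is a minimal sketch in plain Python. It assumes the 100 records are simply numbered 0 to 99 and cut into contiguous folds without shuffling. The function name `k_fold_indices` is made up for illustration and is not from the original post or from Azure ML. In practice you would usually shuffle the records first or use a library routine.

```python
def k_fold_indices(n_records, n_folds):
    """Yield (train_indices, test_indices) once per fold.

    Hypothetical helper for illustration: contiguous folds, no shuffling.
    """
    indices = list(range(n_records))
    # Spread any remainder over the first folds so fold sizes differ by at most one.
    base, extra = divmod(n_records, n_folds)
    start = 0
    for fold in range(n_folds):
        size = base + (1 if fold < extra else 0)
        test = indices[start:start + size]                 # this fold is the test set
        train = indices[:start] + indices[start + size:]   # all other folds together
        yield train, test
        start += size


# The pizza example: 100 records, 5 folds, so 5 test runs.
runs = list(k_fold_indices(100, 5))
for i, (train, test) in enumerate(runs, start=1):
    print(f"run {i}: test records {test[0]}-{test[-1]}, {len(train)} training records")
```

Each run tests on 20 records and trains on the other 80, and every record lands in a test set exactly once. Averaging the score over the five runs gives an estimate that depends less on any single train/test split.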

