Training Data With Azure ML

Koos van Strien discusses training data sets and cross-validating results:

When choosing a train and testset, you’ll implicitly introduce a new bias: it could be that the model you just trained predicts well for this particular testset, when trained for this particular trainset. To reduce this bias, you could “cross-validate” your results.

Cross-validation (often abbreviated as just “cv”) splits the dataset into n folds. Each fold is used once as a testset, using all other folds together as a training set. So in our pizza example with 100 records, with 5 folds we will have 5 test runs:

This isn’t Azure ML-specific, and is good reading.

Related Posts

Master Data In Azure

Matt How explains why Master Data Services isn’t a great cloud-based master data management solution and offers up an alternative: Excel is easy to use, but not user friendly Excel is on nearly every desktop in any Windows based organisation and with the Master Data Services Add-in, it puts the data well within the reach […]

Read More

Measuring Semantic Relatedness

Sandipan Dey re-works a university assignment on semantic relatedness in Python: Let’s define the semantic relatedness of two WordNet nouns x and y as follows: A = set of synsets in which x appears B = set of synsets in which y appears distance(x, y) = length of shortest ancestral path of subsets A and B sca(x, y) = a shortest common ancestor of subsets A and B This is the notion of […]

Read More


August 2016
« Jul Sep »