Machine Learning Skepticism

Julia Evans gives reasons to tamp down expectations for machine learning:

When explaining what machine learning is, I’m giving the example of predicting the country someone lives in from their first name. So John might be American and Johannes might be German.

In this case, it’s really easy to imagine what data you might want to do a good job at this — just get the first names and current countries of every person in the world! Then count up which countries Julias live in (Canada? The US? Germany?), pick the most likely one, and you’re done!

This is a super simple modelling process, but I think it’s a good illustration — if you don’t include any data from China when training your computer to recognize names, it’s not going to get any Chinese names right!
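The counting approach in the quote is small enough to sketch in a few lines of Python. This is only an illustration, not Julia's code: the function names `train` and `predict` and the toy records are made up for the example, and the data is not real statistics. It tallies countries per first name, picks the most common one, and returns nothing for a name it never saw in training.

```python
from collections import Counter, defaultdict


def train(records):
    """Count how often each first name appears in each country."""
    counts = defaultdict(Counter)
    for name, country in records:
        counts[name.lower()][country] += 1
    return counts


def predict(counts, name):
    """Return the most common country for a name, or None if the name was never seen."""
    seen = counts.get(name.lower())
    if not seen:
        return None
    return seen.most_common(1)[0][0]


# Hypothetical toy data, invented for illustration only.
records = [
    ("John", "US"), ("John", "US"), ("John", "Canada"),
    ("Johannes", "Germany"),
    ("Julia", "Canada"), ("Julia", "Germany"), ("Julia", "Canada"),
]

model = train(records)
print(predict(model, "Julia"))  # Canada
print(predict(model, "Wei"))    # None: the name never appeared in training
```

The last call is the point of the quote: the model has no way to guess a country for a name that was absent from its training data.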

Machine learning projects are like any other development project, just with more complex algorithms. There’s no magic, and there’s a lot of perspiration (hopefully figurative rather than literal) involved in getting a program to behave correctly.


