Random Forests In scikit-learn

Mark Needham shows how easy it is to create a random forest model in Python using scikit-learn:

As I mentioned in a blog post a couple of weeks ago, I’ve been playing around with the Kaggle House Prices competition and the most recent thing I tried was training a random forest regressor.

Unfortunately, although it gave me better results locally it got a worse score on the unseen data, which I figured meant I’d overfitted the model.

I wasn’t really sure how to work out if that theory was true or not, but by chance, I was reading Chris Albon’s blog and found a post where he explains how to inspect the importance of every feature in a random forest. Just what I needed!

There’s a nagging voice in my head saying “Principal Component Analysis” as I read this post.

Related Posts

Unsupervised Decision Trees

William Vorhies describes what unsupervised decision trees are: In anomaly detection we are attempting to identify items or events that don’t match the expected pattern in the data set and are by definition rare.  The traditional ‘signature based’ approach widely used in intrusion detection systems creates training data that can be used in normal supervised […]

Read More

Stop Using word2vec

Chris Moody wants you to stop using word2vec: When I started playing with word2vec four years ago I needed (and luckily had) tons of supercomputer time. But because of advances in our understanding of word2vec, computing word vectors now takes fifteen minutes on a single run-of-the-mill computer with standard numerical libraries. Word vectors are awesome but you […]

Read More


June 2017
« May Jul »