Random Forests In scikit-learn

Mark Needham shows how easy it is to create a random forest model in Python using scikit-learn:

As I mentioned in a blog post a couple of weeks ago, I’ve been playing around with the Kaggle House Prices competition and the most recent thing I tried was training a random forest regressor.

Unfortunately, although it gave me better results locally, it got a worse score on the unseen data, which I figured meant I’d overfitted the model.

I wasn’t really sure how to work out if that theory was true or not, but by chance, I was reading Chris Albon’s blog and found a post where he explains how to inspect the importance of every feature in a random forest. Just what I needed!
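As a rough illustration of the idea in the excerpt, here is a minimal sketch that trains a random forest regressor, compares its score on the training data with its score on held-out data, and then lists the feature importances. It is not the code from either linked post: the data is synthetic (generated with make_regression as a stand-in for the House Prices data), and the feature names and parameter values are made up for the example.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the competition data (an assumption for this sketch):
# 10 features, of which only 3 actually influence the target.
X, y = make_regression(n_samples=500, n_features=10, n_informative=3,
                       noise=10.0, random_state=0)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]  # hypothetical names

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# A large gap between these two R^2 scores is one hint of overfitting.
train_score = model.score(X_train, y_train)
test_score = model.score(X_test, y_test)
print(f"train R^2: {train_score:.3f}, held-out R^2: {test_score:.3f}")

# Impurity-based importances: one value per feature, normalized to sum to 1.
importances = model.feature_importances_
ranked = sorted(zip(feature_names, importances),
                key=lambda pair: pair[1], reverse=True)
for name, importance in ranked:
    print(f"{name}: {importance:.3f}")
```

Features with importances near zero contribute little to the forest's splits, so they are natural candidates to drop when trying to reduce overfitting. Note that impurity-based importances can be misleading when features are correlated or have very different cardinalities.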

There’s a nagging voice in my head saying “Principal Component Analysis” as I read this post.


