Mark Needham shows how easy it is to create a random forest model in Python using scikit-learn:
As I mentioned in a blog post a couple of weeks ago, I’ve been playing around with the Kaggle House Prices competition, and the most recent thing I tried was training a random forest regressor.
Unfortunately, although it gave me better results locally, it scored worse on the unseen data, which I figured meant I’d overfitted the model.
I wasn’t really sure how to work out whether that theory was true, but by chance I was reading Chris Albon’s blog and found a post where he explains how to inspect the importance of every feature in a random forest. Just what I needed!
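
For reference, here's a minimal sketch of that feature-importance check using scikit-learn's `RandomForestRegressor`. The data and column names below are made up for illustration, not the actual House Prices features:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in for the competition data; column names are hypothetical.
rng = np.random.default_rng(42)
feature_names = ["lot_area", "year_built", "overall_quality", "garage_cars"]
X = pd.DataFrame(rng.random((200, len(feature_names))), columns=feature_names)
# Make the target depend mostly on one feature so the importances are easy to read.
y = 3 * X["overall_quality"] + rng.normal(0, 0.1, size=200)

model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(X, y)

# feature_importances_ holds one score per feature; the scores sum to 1.0.
for name, importance in sorted(
    zip(feature_names, model.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
):
    print(f"{name}: {importance:.3f}")
```

A feature with an outsized importance score (here, `overall_quality`) is a good place to start when hunting for an overfit model.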
There’s a nagging voice in my head saying “Principal Component Analysis” as I read this post.
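
If you want to indulge that voice, here's roughly what it might look like in scikit-learn: standardize the features, project them onto a handful of principal components, and train on those instead. The data and pipeline settings below are assumptions for illustration, not anything from Mark's post:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data: 10 raw features, only a couple of which drive the target.
rng = np.random.default_rng(0)
X = rng.random((200, 10))
y = X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 0.1, size=200)

# PCA works best on standardized inputs; a float n_components keeps
# enough components to explain ~95% of the variance.
pipeline = make_pipeline(
    StandardScaler(),
    PCA(n_components=0.95),
    RandomForestRegressor(n_estimators=100, random_state=0),
)
pipeline.fit(X, y)
print(pipeline.named_steps["pca"].explained_variance_ratio_)
```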