Random Forests In scikit-learn

Mark Needham shows how easy it is to create a random forest model in Python using scikit-learn:

As I mentioned in a blog post a couple of weeks ago, I’ve been playing around with the Kaggle House Prices competition and the most recent thing I tried was training a random forest regressor.

Unfortunately, although it gave me better results locally, it got a worse score on the unseen data, which I figured meant I’d overfitted the model.

I wasn’t really sure how to work out if that theory was true or not, but by chance, I was reading Chris Albon’s blog and found a post where he explains how to inspect the importance of every feature in a random forest. Just what I needed!
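The two checks described above are comparing the local score with the score on unseen data, and inspecting per-feature importance. Here is a minimal sketch of both in scikit-learn. It is not Mark's or Chris Albon's code. It assumes a synthetic dataset from `make_regression` as a stand-in for the House Prices data, and the `feature_N` names are hypothetical. A large gap between training and held-out R² is one common sign of overfitting. `feature_importances_` gives the impurity-based importance of each feature, normalised to sum to 1.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a tabular dataset like House Prices (assumption)
X, y = make_regression(n_samples=500, n_features=8, n_informative=3,
                       noise=10.0, random_state=0)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]  # hypothetical names

# Hold back a quarter of the rows to play the role of "unseen data"
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

# A much higher training score than held-out score suggests overfitting
train_r2 = model.score(X_train, y_train)
test_r2 = model.score(X_test, y_test)
print(f"train R^2 = {train_r2:.3f}, held-out R^2 = {test_r2:.3f}")

# Impurity-based importances, sorted from most to least important
importances = sorted(zip(feature_names, model.feature_importances_),
                     key=lambda pair: pair[1], reverse=True)
for name, score in importances:
    print(f"{name}: {score:.3f}")
```

Features with near-zero importance are candidates to drop before retraining. Impurity-based importances can favour high-cardinality features, so they are better treated as a hint than as proof.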

There’s a nagging voice in my head saying “Principal Component Analysis” as I read this post.
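To follow up on that PCA hunch, here is a minimal sketch of one way to do it. It puts PCA in front of the random forest inside a scikit-learn `Pipeline` and scores the whole thing with 5-fold cross-validation. Cross-validation gives a steadier local estimate than a single split. The synthetic data and the 95% explained-variance threshold are assumptions for illustration only, not something from the original post.

```python
from sklearn.datasets import make_regression
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data standing in for the real feature matrix (assumption)
X, y = make_regression(n_samples=500, n_features=20, n_informative=5,
                       noise=10.0, random_state=0)

pipeline = Pipeline([
    ("scale", StandardScaler()),      # PCA is sensitive to feature scale
    ("pca", PCA(n_components=0.95)),  # keep components explaining 95% of variance
    ("rf", RandomForestRegressor(n_estimators=100, random_state=0)),
])

# Scaling and PCA are refit inside each fold, so no information leaks from the validation fold
scores = cross_val_score(pipeline, X, y, cv=5, scoring="r2")
print(f"mean CV R^2 = {scores.mean():.3f}")

pipeline.fit(X, y)
pca = pipeline.named_steps["pca"]
print(f"components kept: {pca.n_components_} of {X.shape[1]}")
```

One trade-off is that the forest then sees linear combinations of the original columns. Its `feature_importances_` therefore no longer map back to named features the way they do without PCA.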

June 2017