XGBoost In R

Fisseha Berhane explains how to implement Extreme Gradient Boosting in R:

What makes it so popular are its speed and performance. It gives among the best performances in many machine learning applications. It is optimized gradient-boosting machine learning library. The core algorithm is parallelizable and hence it can use all the processing power of your machine and the machines in your cluster. In R, according to the package documentation, since the package can automatically do parallel computation on a single machine, it could be more than 10 times faster than existing gradient boosting packages.

xgboost shines when we have lots of training data where the features are numeric or a mixture of numeric and categorical fields. It is also important to note that xgboost is not the best algorithm out there when all the features are categorical or when the number of rows is less than the number of fields (columns).

xgboost is a nice complement to neural networks, as they tend to be great at different things.

Related Posts

ggplot2 Geoms And Aesthetics

Tyler Rinker digs into ggplot2’s geoms and aesthetics: I thought it my be fun to use the geoms aesthetics to see if we could cluster aesthetically similar geoms closer together. The heatmap below uses cosine similarity and heirarchical clustering to reorder the matrix that will allow for like geoms to be found closer to one […]

Read More

Multi-Class Text Classification In Python

Susan Li has a series on multi-class text classification in Python.  First up is analysis with PySpark: Our task is to classify San Francisco Crime Description into 33 pre-defined categories. The data can be downloaded from Kaggle. Given a new crime description comes in, we want to assign it to one of 33 categories. The classifier […]

Read More

Leave a Reply

Your email address will not be published. Required fields are marked *


March 2018
« Feb