Julia Silge gives us an idea of how to tune random forest hyperparameters in R:
Our modeling goal here is to predict the legal status of the trees in San Francisco in the #TidyTuesday dataset. This isn’t this week’s dataset, but it’s one I have been wanting to return to. Because it seems almost wrong not to, we’ll be using a random forest model! 🌳
Let’s build a model to predict which trees are maintained by the San Francisco Department of Public Works and which are not. We can use parse_number() to get a rough estimate of the size of the plot from the plot_size column. Instead of trying any imputation, we will just keep observations with no NA values.
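To make that cleaning step concrete, here is a minimal sketch on a toy data frame. The rows and the name trees_toy are made up for illustration and are not taken from the real dataset; only the plot_size and legal_status column names and the use of parse_number() come from the excerpt above.

```r
library(dplyr)
library(readr)

# Hypothetical stand-in rows; the real data has many more columns and rows.
trees_toy <- tibble(
  legal_status = c("DPW Maintained", "Permitted Site", "DPW Maintained", "DPW Maintained"),
  plot_size    = c("3x3", "Width 4ft", NA, "5x5")
)

trees_clean <- trees_toy %>%
  # parse_number() keeps the first number it finds in each string,
  # so "3x3" becomes 3 and "Width 4ft" becomes 4.
  mutate(plot_size = parse_number(plot_size)) %>%
  # No imputation: drop any row that still contains an NA.
  na.omit()

trees_clean
```

Because parse_number() takes only the first number, a plot recorded as "3x3" is treated as size 3 rather than 9, which is why the excerpt calls this a rough estimate.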
Click through to some data exploration, the initial model, and a process for using grid search with the tune package.
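For a feel of what a grid search over random forest hyperparameters can look like, here is a small sketch. It assumes the tidymodels packages and the ranger engine are installed, and it uses the built-in iris data as a stand-in for the trees data; the ranges, the number of trees and folds, and the object names are illustrative choices, not the ones from the linked post.

```r
library(tidymodels)

set.seed(123)

# Stand-in data: iris replaces the San Francisco trees data here.
folds <- vfold_cv(iris, v = 5, strata = Species)

# Random forest spec with two hyperparameters marked for tuning.
rf_spec <- rand_forest(mtry = tune(), min_n = tune(), trees = 100) %>%
  set_mode("classification") %>%
  set_engine("ranger")

rf_wf <- workflow() %>%
  add_formula(Species ~ .) %>%
  add_model(rf_spec)

# Regular grid: every combination of the candidate values.
rf_grid <- grid_regular(
  mtry(range = c(1, 3)),
  min_n(range = c(2, 10)),
  levels = 3
)

# Fit each candidate on each resample and collect the metrics.
rf_res <- tune_grid(rf_wf, resamples = folds, grid = rf_grid)

# Pick the combination with the best cross-validated accuracy.
best_params <- select_best(rf_res, metric = "accuracy")
best_params
```

From here, finalize_workflow(rf_wf, best_params) would plug the chosen values back into the workflow before fitting a final model.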