Joseph Bradley shows how you can perform hyperparameter tuning of an MLlib model with MLflow:
Apache Spark MLlib users often tune hyperparameters using MLlib’s built-in tools CrossValidator and TrainValidationSplit. These use grid search to try out a user-specified set of hyperparameter values; see the Spark docs on tuning for more info.

Databricks Runtime 5.3 and 5.3 ML and above support automatic MLflow tracking for MLlib tuning in Python.
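As a rough sketch of what that grid-search setup looks like in PySpark: the estimator (LogisticRegression), the parameter values, and the column names below are illustrative, not taken from the post.

```python
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.evaluation import BinaryClassificationEvaluator
from pyspark.ml.tuning import CrossValidator, ParamGridBuilder

# Illustrative estimator; any MLlib estimator with tunable params works the same way.
lr = LogisticRegression(featuresCol="features", labelCol="label", maxIter=10)

# Grid search: every combination of these values will be tried.
param_grid = (ParamGridBuilder()
              .addGrid(lr.regParam, [0.01, 0.1, 1.0])
              .addGrid(lr.elasticNetParam, [0.0, 0.5, 1.0])
              .build())

evaluator = BinaryClassificationEvaluator(metricName="areaUnderROC")

cv = CrossValidator(estimator=lr,
                    estimatorParamMaps=param_grid,
                    evaluator=evaluator,
                    numFolds=3)
```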
With this feature, PySpark CrossValidator and TrainValidationSplit will automatically log to MLflow, organizing runs in a hierarchy and logging hyperparameters and the evaluation metric. For example, calling CrossValidator.fit() will log one parent run. Under this run, CrossValidator will log one child run for each hyperparameter setting, and each of those child runs will include the hyperparameter setting and the evaluation metric. Comparing these runs in the MLflow UI helps with visualizing the effect of tuning each hyperparameter.
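Continuing the sketch above, the fit call itself is all that is needed; train_df is an assumed training DataFrame, and the result inspection shown is standard MLlib, not part of the MLflow feature.

```python
# On Databricks Runtime 5.3 / 5.3 ML and above, no extra code is required:
# this single fit() call is tracked automatically, with one parent MLflow run
# for the CrossValidator and one child run per hyperparameter combination,
# each recording its parameter values and evaluation metric.
cv_model = cv.fit(train_df)  # train_df: assumed DataFrame with "features" and "label" columns

# The fitted model still exposes the usual MLlib results alongside the MLflow UI.
for params, metric in zip(cv.getEstimatorParamMaps(), cv_model.avgMetrics):
    print(params, metric)

best_model = cv_model.bestModel
```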
Hyperparameter tuning is critical for some of the more complex algorithms like random forests, gradient boosting, and neural networks.