On ML competition platforms like Kaggle, complex and unintuitively behaving models dominate. In this respect, reality is completely different. There, the majority of models do not serve as pure prediction machines but rather as fruitful source of information. Furthermore, even if used as prediction machine, the users of the models might expect a certain degree of consistency when “playing” with input values.
A classic example are statistical house appraisal models. An additional bathroom or an additional square foot of ground area is expected to raise the appraisal, everything else being fixed (ceteris paribus). The user might lose trust in the model if the opposite happens.
One way to enforce such consistency is to monitor the signs of coefficients of a linear regression model. Another useful strategy is to impose monotonicity constraints on selected model effects.
Certain types of regression algorithm make this easy, but random forest? Not so much. That’s where Michael steps in.