The Importance Of Model Interpretability

Ilknur Kaynar Kabul explains why it’s important that your data science models be interpretable:

Some machine learning models are simple and easy to understand. We know how changing the inputs will affect the predicted outcome and can justify each prediction. However, with the recent advances in machine learning and artificial intelligence, models have become very complex, including deep neural networks and ensembles of different models. We refer to these complex models as black box models.

Unfortunately, the complexity that gives extraordinary predictive abilities to black box models also makes them very difficult to understand and trust. The algorithms inside the black box models do not expose their secrets. They don’t, in general, provide a clear explanation of why they made a certain prediction. They just give us a probability, and they are opaque and hard to interpret. Sometimes there are thousands (even millions) of model parameters, there’s no one-to-one relationship between input features and parameters, and often combinations of multiple models using many parameters affect the prediction. Some of them are also data-hungry: they need enormous amounts of data to achieve high accuracy. It’s hard to figure out what they learned from those data sets and which of those data points have more influence on the outcome than others.
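
To make that contrast concrete, here is a minimal sketch using scikit-learn on synthetic data (the dataset and model choices are mine, not from the post): a logistic regression's coefficients tell you directly how each input moves the prediction, while a gradient boosting ensemble just hands back a probability assembled from hundreds of trees.

```python
# A minimal sketch of the contrast above, using scikit-learn on synthetic data.
# The dataset and model choices are illustrative, not taken from the post.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=1000, n_features=5, random_state=0)

# "Glass box": each coefficient says how a unit change in a feature shifts
# the log-odds of the predicted class.
glass_box = LogisticRegression().fit(X, y)
print("logistic regression coefficients:", glass_box.coef_.round(2))

# "Black box": hundreds of trees are combined into a probability, and no single
# parameter maps back to a single input feature.
black_box = GradientBoostingClassifier(n_estimators=300, random_state=0).fit(X, y)
print("ensemble output for one row (just a probability):",
      black_box.predict_proba(X[:1]).round(3))
print("trees in the ensemble:", black_box.n_estimators)
```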

This post reminds me of a story I’d heard about a financial organization using neural networks to build accurate models, but then needing to decompose those models into complex decision trees to show auditors that the process didn’t violate any laws.
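
That auditor story describes what is usually called a global surrogate model. As a rough sketch of the idea (the network, data, and tree depth below are illustrative assumptions, not the organization's actual setup), you can fit a shallow decision tree to a neural network's predictions and then read off the rules:

```python
# A rough sketch of the surrogate-model idea from that story: fit a shallow
# decision tree to a neural network's predictions so its decision logic can be
# inspected. The network, data, and tree depth are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = make_classification(n_samples=2000, n_features=6, random_state=1)

# The black box: a neural network trained on the original labels.
nn = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500, random_state=1).fit(X, y)

# The surrogate: a shallow tree trained on the network's predictions (not the true
# labels), so its rules approximate what the network actually does.
surrogate = DecisionTreeClassifier(max_depth=3, random_state=1).fit(X, nn.predict(X))

# Fidelity: how often the tree reproduces the network's decisions.
fidelity = (surrogate.predict(X) == nn.predict(X)).mean()
print(f"surrogate fidelity to the network: {fidelity:.2%}")

# Human-readable rules that could be handed to an auditor.
print(export_text(surrogate, feature_names=[f"feature_{i}" for i in range(6)]))
```

The catch, of course, is that the surrogate only explains the network to the extent that its fidelity is high; a shallow tree that disagrees with the network a quarter of the time isn’t much of an audit trail.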
