Dan Lantos, et al., walk us through one technique for model explainability:
Interpretability concerns how accurately a machine learning model can associate a cause (input) with an effect (output).
Explainability, on the other hand, is the extent to which the internal mechanics of a machine learning or deep learning system can be explained in human terms. To put it simply, explainability is the ability to explain what is happening.
Let’s consider a simple example illustrated below, where the goal of the machine learning model is to classify an animal into its respective group. We use an image of a butterfly as input to the machine learning model, which classifies it as an insect, mammal, fish, reptile, or bird. Typically, a complex machine learning model provides a classification without explaining how the individual features contributed to the result. However, using tools that help with explainability, we can overcome this limitation and understand which particular features of the butterfly contributed to it being classified as an insect. In this case, because the butterfly has six legs, it is classified as an insect.
Being able to provide a rationale for a model’s prediction gives users (and developers) confidence in the validity of the model’s decision.
Read on to see how you can use a Python library called SHAP to help with this kind of explainability.
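
To make the butterfly example concrete, here is a minimal sketch of how SHAP can surface feature contributions for a classifier. The tabular animal features, labels, and model below are hypothetical stand-ins (they are not from the original article), and the sketch assumes the `shap` and `scikit-learn` packages are installed:

```python
import pandas as pd
import shap
from sklearn.ensemble import RandomForestClassifier

# Hypothetical tabular encoding of the animal features discussed above
X = pd.DataFrame({
    "num_legs":  [6, 4, 0, 4, 2, 6],
    "has_wings": [1, 0, 0, 0, 1, 1],
    "lays_eggs": [1, 0, 1, 1, 1, 1],
})
y = [1, 0, 0, 0, 0, 1]  # 1 = insect, 0 = not an insect

model = RandomForestClassifier(random_state=0).fit(X, y)

# TreeExplainer computes SHAP values for tree-based models such as random forests
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)

# Each SHAP value measures how much a feature pushed a single prediction
# towards (positive) or away from (negative) a class; a large value for
# num_legs on a butterfly row would indicate that the leg count drove
# the "insect" classification.
print(shap_values)
```

SHAP also ships plotting helpers such as `shap.summary_plot` and `shap.force_plot`, which visualise these contributions across a whole dataset or for a single prediction.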