Quickly Computing Area Under The Curve

Jean-Francois Puget has a fast method for computing Area Under the Curve in Python:

When the target only takes two values we have a binary classification problem at hand. Examples of binary classification are very common. For instance, fraud detection, where examples are credit card transactions, features are time, location, amount, merchant id, etc., and the target is fraud or not fraud. Spam detection is also a binary classification problem, where examples are emails, features are the email content as a string of words, and the target is spam or not spam. Without loss of generality we can assume that the target values are 0 and 1; for instance, 0 means no fraud or no spam, while 1 means fraud or spam.

For binary classification, predictions are also binary. Therefore, a prediction is either equal to the target, or is off the mark. A simple way to evaluate model performance is accuracy: how many predictions are right? For instance, if our test set has 100 examples in it, how many times is the prediction correct? Accuracy seems a logical way to evaluate performance: a higher accuracy obviously means a better model. At least this is what people think when they are exposed for the first time to binary classification problems. The issue is that accuracy can be extremely misleading.
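To make the last point concrete, here is a minimal sketch of one common way accuracy misleads: a heavily imbalanced test set. The data is made up for illustration (100 transactions, exactly one of them fraud), and the helper name `accuracy` is hypothetical, not something taken from the linked article.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that exactly match the target."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Hypothetical test set: 100 transactions, only the first one is fraud (1).
y_true = [1] + [0] * 99

# A useless "model" that always predicts "not fraud" (0).
always_negative = [0] * len(y_true)

acc = accuracy(y_true, always_negative)

# Count how many actual frauds the model flagged.
frauds_caught = sum(
    1 for t, p in zip(y_true, always_negative) if t == 1 and p == 1
)

print(f"accuracy = {acc:.2f}, frauds caught = {frauds_caught}")
```

Under these assumptions the model scores 99% accuracy while catching none of the fraud, which is why a threshold-free measure such as Area Under the Curve is often preferred.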

Read Jean-Francois’ explanation and scroll down for the Python sample.
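For a rough idea of what such a computation can look like, here is a sketch in plain Python based on the standard rank-sum (Mann-Whitney U) formulation of AUC. This is an assumption about a reasonable approach, not a reproduction of the code in Jean-Francois' post, and the function name `auc_from_ranks` is made up for this example.

```python
def auc_from_ranks(y_true, scores):
    """AUC via the rank-sum identity; tied scores get their average rank."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative")

    # Indices sorted by ascending score.
    order = sorted(range(len(scores)), key=lambda i: scores[i])

    # Assign 1-based ranks, averaging over groups of tied scores.
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    # Sum of ranks of the positive examples, shifted and normalized.
    pos_rank_sum = sum(r for r, t in zip(ranks, y_true) if t == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


# Targets are 0/1 as in the excerpt; scores are predicted probabilities.
print(auc_from_ranks([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

The cost is dominated by the sort, so this runs in O(n log n) time. The result is the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as half.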

