Quickly Computing Area Under The Curve

Jean-Francois Puget has a fast method for computing Area Under the Curve in Python:

When the target only takes two values we have a binary classification problem at hand.  Example of binary classification are very common. For instance fraud detection where examples are credit card transactions, features are time, location, amount, merchant id, etc., and target is fraud or not fraud.  Spam detection is also a binary classification where examples are emails, features are the email content as a string of words, and target is spam or not spam.  Without loss of generality we can assume that the target values are 0 and 1, for instance 0 means no fraud or no spam, whiloe 1 means fraud or spam.

For binary classification, predictions are also binary.  Therefore, a prediction is either equal to the target, or is off the mark.  A simple way to evaluate model performance is accuracy: how many predictions are right? For instance, if our test set has 100 examples in it, how many times is the prediction correct?  Accuracy seems a logical way to evaluate performance: a higher accuracy obviously means a better model.  At least this is what people think when they are exposed to the first time to binary classification problems.  Issue is that accuracy can be extremely misleading.

Read Jean-Francois’ explanation and scroll down for the Python sample.

Related Posts

Housing Prices In Ames, Iowa: A Kaggle Competition

Kathryn Bryant and M. Aaron Owen share their Kaggle experiences.  First, Kathryn, et al: The lifecycle of our project was a typical one. We started with data cleaning and basic exploratory data analysis, then proceeded to feature engineering, individual model training, and ensembling/stacking. Of course, the process in practice was not quite so linear and […]

Read More

Picking A Python IDE

Kevin Jacobs reviews a few Python IDEs from the perspective of a data scientist: Ladies and gentlemens, this is one of the most perfect IDEs for editing your Python code! At least in my opinion. Jupyter notebook is a web based code editor and can quickly generate visualizations. You can mix up code and text […]

Read More

Leave a Reply

Your email address will not be published. Required fields are marked *

Categories

November 2017
MTWTFSS
« Oct  
 12345
6789101112
13141516171819
20212223242526
27282930