When the target takes only two values, we have a binary classification problem. Examples of binary classification are very common. Fraud detection is one: the examples are credit card transactions, the features are time, location, amount, merchant id, etc., and the target is fraud or not fraud. Spam detection is another: the examples are emails, the features are the email content as a string of words, and the target is spam or not spam. Without loss of generality, we can assume that the target values are 0 and 1, where 0 means no fraud or no spam and 1 means fraud or spam.
For binary classification, predictions are also binary. A prediction is therefore either equal to the target or off the mark. A simple way to evaluate model performance is accuracy: what fraction of predictions are right? For instance, if our test set has 100 examples, how many times is the prediction correct? Accuracy seems a logical way to evaluate performance: a higher accuracy obviously means a better model. At least, this is what people think when they are exposed to binary classification problems for the first time. The issue is that accuracy can be extremely misleading.
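To make the accuracy definition concrete, here is a minimal sketch in plain Python. It is not the author's sample. The data is made up for illustration: a hypothetical test set of 100 transactions in which only 2 are fraud, scored against a hypothetical "model" that always predicts 0. Class imbalance is assumed here as one common way accuracy can mislead.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that equal the target."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Hypothetical test set (illustrative numbers only):
# 100 transactions, 2 fraud (1) and 98 not fraud (0).
y_true = [1] * 2 + [0] * 98

# Hypothetical baseline that never flags anything as fraud.
y_always_zero = [0] * 100

baseline_accuracy = accuracy(y_true, y_always_zero)

# Count how many actual fraud cases the baseline caught.
frauds_caught = sum(
    1 for t, p in zip(y_true, y_always_zero) if t == 1 and p == 1
)

print(f"accuracy: {baseline_accuracy:.2f}, frauds caught: {frauds_caught}")
```

Under these assumptions the baseline is right on 98 of 100 examples, yet it catches none of the fraud cases, which is the one thing a fraud detector is supposed to do.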
Read Jean-Francois’ explanation and scroll down for the Python sample.