The Bayesian Trap

David Smith links to a video describing an application of Bayes’s Theorem and gives the example of medical tests:

If you get a blood test to diagnose a rare disease, and the test (which is very accurate) comes back positive, what’s the chance you have the disease? Well if “rare” means only 1 in a thousand people have the disease, and “very accurate” means the test returns the correct result 99% of the time, the answer is … just 9%. There’s less than a 1 in 10 chance you actually have the disease (which is why doctor will likely have you tested a second time).

Now that result might seem surprising, but it makes sense if you apply Bayes Theorem. (A simple way to think of it is that in a population of 1000 people, 10 people will have a positive test result, plus the one who actually has the disease. One in eleven of the positive results, or 9%, actually detect a true disease.)

This goes to sensitivity/recall (in the medical field, they call it sensitivity; in the documents world and in the Microsoft ML space, they call it recall):  True positives / (True positives + False negatives).  Supposing a million people, 1000 will have the disease.  Of those 1000, we expect the test to find 990 (99%).  Of the 999,000 people who don’t have the disease, we expect the test to produce 9990 false negatives (1%).  990 / (990 + 9990) = 9%.

Related Posts

Multi-Class Text Classification In Python

Susan Li has a series on multi-class text classification in Python.  First up is analysis with PySpark: Our task is to classify San Francisco Crime Description into 33 pre-defined categories. The data can be downloaded from Kaggle. Given a new crime description comes in, we want to assign it to one of 33 categories. The classifier […]

Read More

The Microsoft Team Data Science Process Lifecycle Versus CRISP-DM

Melody Zacharias compares Microsoft’s Team Data Science Process lifecycle with the CRISP-DM process: As I pointed out in my previous blog, the TDSP lifecycle is made up of five iterative stages: Business Understanding Data Acquisition and Understanding Modeling Deployment Customer Acceptance This is not very different from the six major phases used by the Cross […]

Read More