If you get a blood test to diagnose a rare disease, and the test (which is very accurate) comes back positive, what’s the chance you have the disease? Well if “rare” means only 1 in a thousand people have the disease, and “very accurate” means the test returns the correct result 99% of the time, the answer is … just 9%. There’s less than a 1 in 10 chance you actually have the disease (which is why doctor will likely have you tested a second time).

Now that result might seem surprising, but it makes sense if you apply Bayes Theorem. (A simple way to think of it is that in a population of 1000 people, 10 people will have a positive test result, plus the one who actually has the disease. One in eleven of the positive results, or 9%, actually detect a true disease.)

This goes to sensitivity/recall (in the medical field, they call it sensitivity; in the documents world and in the Microsoft ML space, they call it recall): True positives / (True positives + False negatives). Supposing a million people, 1000 will have the disease. Of those 1000, we expect the test to find 990 (99%). Of the 999,000 people who don’t have the disease, we expect the test to produce 9990 false negatives (1%). 990 / (990 + 9990) = 9%.

Kevin Feasel

2017-05-08

Data Science