The Bayesian Trap

David Smith links to a video describing an application of Bayes’s Theorem and gives the example of medical tests:

If you get a blood test to diagnose a rare disease, and the test (which is very accurate) comes back positive, what’s the chance you have the disease? Well if “rare” means only 1 in a thousand people have the disease, and “very accurate” means the test returns the correct result 99% of the time, the answer is … just 9%. There’s less than a 1 in 10 chance you actually have the disease (which is why doctor will likely have you tested a second time).

Now that result might seem surprising, but it makes sense if you apply Bayes Theorem. (A simple way to think of it is that in a population of 1000 people, 10 people will have a positive test result, plus the one who actually has the disease. One in eleven of the positive results, or 9%, actually detect a true disease.)

This goes to sensitivity/recall (in the medical field, they call it sensitivity; in the documents world and in the Microsoft ML space, they call it recall):  True positives / (True positives + False negatives).  Supposing a million people, 1000 will have the disease.  Of those 1000, we expect the test to find 990 (99%).  Of the 999,000 people who don’t have the disease, we expect the test to produce 9990 false negatives (1%).  990 / (990 + 9990) = 9%.

Related Posts

Testing Spatial Equilibrium Concepts With tidycensus

Ignacio Sarmiento Barbieri walks us through the concept of spatial equilibrium and tests using data from the tidycensus package: Let’s take the model to the data and reproduce figures 2.1. and 2.2 of “Cities, Agglomeration, and Spatial Equilibrium”. The focus are two cities, Chicago and Boston. These cities are chosen because both differ in how easy […]

Read More

Interacting With SQL Server From Pandas

Tomaz Kastrun shows how to use pyodbc to interact with a SQL Server database from Pandas: In the SQL Server Management Studio (SSMS), the ease of using external procedure sp_execute_external_script has been (and still will be) discussed many times. But the reason for this short blog post is the fact that, changing Python environments using Conda package/module management within Microsoft […]

Read More