# Naive Bayes Against Large Data Sets

2018-09-12

The more data we use to train the classifier, the more accurate it will generally become. So if we continue to train it with actual results in 2017, then what it predicts in 2018 will be more accurate. Also, when Bayes gives a prediction, it attaches a probability. So it may answer the above question as follows: “Based on past data, I predict with 60% confidence that it will rain today.”

So the classifier is either in training mode or predicting mode. It is in training mode when we are teaching it: we feed it the features along with the known outcome (the category). It is in predicting mode when we give it only the features and ask it what the most likely outcome will be.
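To make the two modes concrete, here is a minimal sketch of a categorical naive Bayes classifier in plain Python. It is not the code from the linked post: the class name `TinyNaiveBayes`, the weather features, and the add-one smoothing are all assumptions chosen for illustration.

```python
from collections import Counter, defaultdict


class TinyNaiveBayes:
    """Hypothetical toy classifier: categorical naive Bayes with add-one smoothing."""

    def __init__(self):
        self.class_counts = Counter()               # how often each outcome was seen
        self.feature_counts = defaultdict(Counter)  # (outcome, feature) -> value counts
        self.feature_values = defaultdict(set)      # feature -> distinct values seen

    def train(self, features, outcome):
        """Training mode: we supply the features AND the known outcome."""
        self.class_counts[outcome] += 1
        for name, value in features.items():
            self.feature_counts[(outcome, name)][value] += 1
            self.feature_values[name].add(value)

    def predict(self, features):
        """Predicting mode: features only. Returns (likeliest outcome, probability)."""
        total = sum(self.class_counts.values())
        scores = {}
        for outcome, count in self.class_counts.items():
            score = count / total  # prior P(outcome)
            for name, value in features.items():
                seen = self.feature_counts[(outcome, name)][value]
                distinct = len(self.feature_values[name])
                score *= (seen + 1) / (count + distinct)  # smoothed P(value | outcome)
            scores[outcome] = score
        best = max(scores, key=scores.get)
        # Normalize so the scores of all outcomes sum to 1.
        return best, scores[best] / sum(scores.values())


# Made-up training data: each call pairs observed features with the actual result.
model = TinyNaiveBayes()
model.train({"sky": "cloudy", "wind": "strong"}, "rain")
model.train({"sky": "cloudy", "wind": "calm"}, "rain")
model.train({"sky": "clear", "wind": "calm"}, "dry")
model.train({"sky": "clear", "wind": "strong"}, "dry")
model.train({"sky": "cloudy", "wind": "calm"}, "dry")

outcome, prob = model.predict({"sky": "cloudy", "wind": "strong"})
print(f"I predict {outcome} with {prob:.0%} confidence")
```

With this toy data the prediction is rain at roughly 61%. Each additional `train` call shifts the counts, which is how more training data changes later predictions. Note that `predict` will fail if it is called before any training.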

My contribution is a joke that I heard last night: a Bayesian statistician hears hooves clomping the ground. He turns around and sees a tiger. Therefore, he decides that it must be a zebra. First time I’d heard that joke, and as a Bayesian zebra-spotter, I enjoyed it.

## Interpreting The Area Under The Receiver Operating Characteristic Curve

2018-09-19

Roos Colman explains what a Receiver Operating Characteristic (ROC) curve is and how we interpret the Area Under the Curve (AUC): The AUC can be defined as “The probability that a randomly selected case will have a higher test result than a randomly selected control”. Let’s use this definition to calculate and visualize the estimated […]
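The quoted definition can be checked directly by comparing every case with every control. The following is a rough sketch, not the calculation from the linked post: the function name `auc_by_pairs` and the scores are made up, and counting ties as half a win is an assumption (it is the usual convention for the Mann-Whitney statistic).

```python
from itertools import product


def auc_by_pairs(case_scores, control_scores):
    """Share of (case, control) pairs in which the case has the higher test result.

    Ties count as half a win (an assumed convention).
    """
    wins = 0.0
    for case, control in product(case_scores, control_scores):
        if case > control:
            wins += 1.0
        elif case == control:
            wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical test results: cases have the condition, controls do not.
cases = [0.9, 0.8, 0.4]
controls = [0.7, 0.3, 0.2, 0.4]

auc = auc_by_pairs(cases, controls)
print(auc)  # 10.5 winning pairs out of 12 -> 0.875
```

An AUC of 0.5 would mean the test ranks a random case above a random control no more often than a coin flip would, while 1.0 would mean every case outscores every control.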

## Disambiguating The Confusion Matrix

2018-09-11

John Cook walks through a set of valuable terms derived from the core components of the confusion matrix: How many terms are possible? There are four basic ingredients: TP, FP, TN, and FN. So if each term may or may not be included in a sum in the numerator and denominator, that’s 16 possible numerators […]
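As a small illustration of the counting argument and of a few of the better-known terms, here is a sketch that is not taken from the linked post. The function name `confusion_rates` and the counts are hypothetical, and the five rates shown are only a sample of the named terms.

```python
from itertools import combinations

INGREDIENTS = ("TP", "FP", "TN", "FN")

# Each ingredient is either in the sum or not, so there are 2**4 = 16 subsets
# (including the empty one) that could serve as a numerator.
numerators = [c for r in range(len(INGREDIENTS) + 1)
              for c in combinations(INGREDIENTS, r)]
print(len(numerators))  # 16


def confusion_rates(tp, fp, tn, fn):
    """A few common ratios built from the four basic counts."""
    return {
        "sensitivity": tp / (tp + fn),                # true positive rate (recall)
        "specificity": tn / (tn + fp),                # true negative rate
        "precision": tp / (tp + fp),                  # positive predictive value
        "negative_predictive_value": tn / (tn + fn),
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Made-up counts for illustration only.
rates = confusion_rates(tp=40, fp=10, tn=45, fn=5)
for name, value in rates.items():
    print(f"{name}: {value:.3f}")
```

Every rate here is a sum of some of the four counts divided by a sum of some of the four counts, which is the pattern the counting argument enumerates.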
