Naive Bayes Against Large Data Sets

Catherine Bernadorne walks us through using Naive Bayes for sentiment analysis:

The more data that is used to train the classifier, the more accurate it will become over time. So if we continue to train it with actual results in 2017, then what it predicts in 2018 will be more accurate. Also, when Bayes gives a prediction, it will attach a probability. So it may answer the above question as follows: “Based on past data, I predict with 60% confidence that it will rain today.”

So the classifier is either in training mode or predicting mode. It is in training mode when we are teaching it. In this case, we are feeding it the outcome (the category). It is in predicting mode when we are giving it the features, but asking it what the most likely outcome will be.

My contribution is a joke that I heard last night:  a Bayesian statistician hears hooves clomping the ground.  He turns around and sees a tiger.  Therefore, he decides that it must be a zebra.  First time I’d heard that joke, and as a Bayesian zebra-spotter, I enjoyed it.

Related Posts

Road Construction Incentive Contracts And R

Sebastian Kranz promotes an interesting RTutor project: Patrick Bajari and Gregory Lewis have collected a detailed sample of 466 road construction projects in Minnesota to study this question in their very interesting article Moral Hazard, Incentive Contracts and Risk: Evidence from Procurement in the Review of Economic Studies, 2014.They estimate a structural econometric model and find that […]

Read More

Analyzing Customer Churn With Keras And H2O

Shirin Glander has released code pertaining to a forthcoming book chapter: This is code that accompanies a book chapter on customer churn that I have written for the German dpunkt Verlag. The book is in German and will probably appear in February: code you find below can be used to recreate all figures and analyses from this […]

Read More


September 2018
« Aug Oct »