Naive Bayes In Python

Kislay Keshari explains the Naive Bayes algorithm and shows an implementation in Python:

Naive Bayes in the Industry

Now that you have an idea of what exactly Naive Bayes is and how it works, let’s see where it is used in the industry.

RSS Feeds

Our first industrial use case is News Categorization, or we can use the term ‘text classification’ to broaden the spectrum of this algorithm. News on the web is rapidly growing where each news site has its own different layout and categorization for grouping news. Companies use a web crawler to extract useful text from HTML pages of news articles to construct a Full Text RSS. The contents of each news article is tokenized (categorized). In order to achieve better classification results, we remove the less significant words, i.e. stop, from the document. We apply the naive Bayes classifier for classification of news content based on news code.

It’s a good overview of the topic and a particular implementation in Python.  Naive Bayes is a technique which you want in the bag:  there are a lot of techniques which tend to be better in specific domains, but Naive Bayes is easy to implement and usually provides acceptable performance.

Related Posts

Defining TF-IDF

Bruno Stecanella explains the concept behind TF-IDF: TF-IDF was invented for document search and information retrieval. It works by increasing proportionally to the number of times a word appears in a document, but is offset by the number of documents that contain the word. So, words that are common in every document, such as this, what, and if, rank […]

Read More

Sentiment Analysis with Python

Bruno Stecanella shows us how to use MonkeyLearn to perform sentiment analysis in Python: Sentiment analysis is a set of Natural Language Processing (NLP) techniques that takes a text (in more academic circles, a document) written in natural language and extracts the opinions present in the text. In a more practical sense, our objective here is to take a text […]

Read More


August 2018
« Jul Sep »