Sentiment analysis is a natural language processing method that classifies the words in a document as positive or negative, or by their association with a set of basic human emotions; the exact results depend on the sentiment lexicon you choose. The tidytext R package supports four sentiment lexicons:
- “AFINN” for Finn Årup Nielsen – which scores words from -5 (most negative) to +5 (most positive)
- “bing” for Bing Liu and colleagues – which classifies words as either positive or negative
- “loughran” for Loughran-McDonald – mostly for financial and nonfiction works, which classifies words as positive or negative, as well as by the categories uncertainty, litigious, constraining, and superfluous
- “nrc” for the NRC lexicon – which classifies words into eight basic emotions (anger, fear, anticipation, trust, surprise, sadness, joy, and disgust) as well as positive or negative sentiment
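As a minimal sketch of how the lexicons listed above are loaded, assuming tidytext is installed: each one is retrieved by name with get_sentiments(). Only “bing” ships with tidytext itself; the other three are downloaded through the textdata package, which asks you to accept the lexicon's license the first time, so those calls are left commented out here.

```r
library(tidytext)

# "bing" is bundled with tidytext, so this works without a download
bing <- get_sentiments("bing")
head(bing)  # columns: word, sentiment

# The remaining lexicons need the textdata package and an interactive
# license prompt on first use, so they are shown but not run here
# afinn    <- get_sentiments("afinn")     # columns: word, value
# loughran <- get_sentiments("loughran")  # columns: word, sentiment
# nrc      <- get_sentiments("nrc")       # columns: word, sentiment
```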
Sentiment analysis works on unigrams – single words – but you can aggregate across multiple words to look at sentiment across a text.
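Here is a rough sketch of that aggregation, assuming the dplyr and tidytext packages are installed. The two-line text is made up purely for illustration, and the “bing” lexicon is used only because it needs no download; the column names line, positive, negative, and net are my own choices.

```r
library(dplyr)
library(tidytext)

# Hypothetical two-line text, invented for this example
lines <- tibble(
  line = 1:2,
  text = c("What a wonderful, happy day",
           "This is a terrible and sad mistake")
)

scores <- lines %>%
  unnest_tokens(word, text) %>%                        # one row per unigram
  inner_join(get_sentiments("bing"), by = "word") %>%  # keep only words found in the lexicon
  group_by(line) %>%
  summarise(
    positive = sum(sentiment == "positive"),           # count of positive words per line
    negative = sum(sentiment == "negative"),           # count of negative words per line
    net = positive - negative                          # simple net sentiment per line
  )

scores
```

Words that do not appear in the lexicon are dropped by the join, so a line with no matches will not show up in the result at all.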
To demonstrate sentiment analysis, I’ll use one of my favorite songs: “Hotel California” by the Eagles.
Read the whole thing, though you can’t check out afterward.