Training A Text Classifier Against Books

Julia Silge builds a text classifier to differentiate Pride and Prejudice from War of the Worlds:

Now it’s time to train our classification model! Let’s use the glmnet package to fit a logistic regression model with LASSO regularization. It’s a great fit for text classification because the variable selection that LASSO regularization performs can tell you which words are important for your prediction problem. The glmnet package also supports parallel processing with very little hassle, so we can train on multiple cores with cross-validation on the training set using cv.glmnet().

Hot take: Jane Austen was the best English-language novelist of the 19th century. I’d say “all-time” but the world isn’t ready for a take that hot.

Related Posts

Python versus R (Again)

Alex Woodie looks at whether Python is dominating R in the data science space: There is some evidence that Python’s popularity is hurting R usage. According to the TIOBE Index, Python is currently the third most popular language in the world, behind perennial heavyweights Java and C. From August 2018 to August 2019, Python usage surged […]

Read More

Z-Tests vs T-Tests

Stephanie Glen has a picture which explains the difference between a Z-test and a T-test: The following picture shows the differences between the Z Test and T Test. Not sure which one to use? Find out more here: T-Score vs. Z-Score. Click through for the picture.

Read More

Categories

December 2018
MTWTFSS
« Nov Jan »
 12
3456789
10111213141516
17181920212223
24252627282930
31