Downsides Of Logistic Regression

Vincent Granville points out a few flaws in logistic regression:

I recently read a very popular article entitled 5 Reasons “Logistic Regression” should be the first thing you learn when becoming a Data Scientist. Here I provide my opinion on why this should not be the case.

It is nice to have logistic regression on your resume, as many jobs request it, especially in some fields such as biostatistics. And if you learned the details during your college classes, good for you. However, for a beginner, this is not the first thing you should learn. In my career, being an isolated statistician (working with marketing guys, sales people, or engineers) in many of my roles, I had the flexibility to choose which tools and methodology to use. Many practitioners today are in a similar environment. If you are a beginner, chances are that you would use logistic regression as a black-box tool with little understanding about how it works: a recipe for disaster.

Read on for his reasons. I’m not totally convinced, but he does lay out his argument clearly.

