The Data Science Delusion

Anand Ramanathan has a strong critique of “data science” as it stands today:

Illustration: Consider the sentiment-tagging task again. A Q1 resource uses an off-the-shelf model for movie reviews, and applies it to a new task (say, tweets about a customer service organization). Business is so blinded by spectacular charts [14] and anecdotal correlations (“Look at that spiteful tweet from a celebrity … so that’s why the sentiment is negative!”), that even questions about predictive accuracy are rarely asked until a few months down the road when the model is obviously floundering. Then too, there is rarely anyone to challenge the assumptions, biases and confidence intervals (Does the language in the tweets match the movie reviews? Do we have enough training data? Does the importance of tweets change over time?).

Overheard“Survival analysis? Never heard of it … Wait … There is an R package for that!”

This is a really interesting article and I recommend reading it.

Related Posts

Python and R Data Reshaping

John Mount takes us through a couple of data shaping packages: The advantages of data_algebra and cdata are: – The user specifies their desired transform declaratively by example and in data. What one does is: work an example, and then write down what you want (we have a tutorial on this here).– The transform systems can print what a transform is going to […]

Read More

When to Use Different ML Algorithms

Stefan Franczuk explains the different categories of machine learning algorithms available in Talend: Clustering is the task of grouping together a set of objects in such a way, that objects in the same group are more similar to each other than to those in other groups. Clustering is really useful for identify separate groups and […]

Read More


November 2016
« Oct Dec »