Wasting Money With Data Science

Giovanni Lanzani has a post with the controversial title above:

Some data is gathered, given to data scientists, and — after two weeks — the first demo takes place. The results are promising, but they need a bit more time.

Fine. After all, the data was messy: they had to clean it up and go back to the source a couple of times.

Two weeks pass and the new results are even nicer. With 70% accuracy, they can predict if a patient will go home after their visit to the emergency room.

This is much better than random (50%)! A full-fledged pilot starts.

They are faced with a couple of challenges to go from model to data product:

  • How to send the source data to the model is unclear;

  • Where the model should run;

  • The hospital operations need to change, as the intake happens with pen and paper;

  • They realize that without knowing to which department the patient will go, they won’t add any value;

  • To predict the department, the model need the diagnosis. But once the diagnosis gets typed in the computer, the patient has reached their destination: the model is useless!

I think it’s a fair point:  it’s easy from the standpoint of internal researchers to look for things which they can do, but which don’t have much business value.  The risk on the other side is that you’ll start diving into a high-potential-value problem and then realize that the data isn’t there to draw conclusions or that the relationships you expected simply aren’t there.

Related Posts

Exploratory Data Analysis with inspectdf

Laura Ellis continues a dive into Exploratory Data Analysis, this time using the inspectdf package: I like this package because it’s got a lot of functionality and it’s incredibly straightforward to use. In short, it allows you to understand and visualize column types, sizes, values, value imbalance & distributions as well as correlations. Better yet, […]

Read More

Non-Linear Classifiers with Support Vector Machines

Rahul Khanna continues a series on support vector machines: In this blog post, we will look at a detailed explanation of how to use SVM for complex decision boundaries and build Non-Linear Classifiers using SVM. The primary method for doing this is by using Kernels. In linear SVM we find margin maximizing hyperplane with features […]

Read More


September 2018
« Aug Oct »