Giovanni Lanzani has a post with the controversial title above:
Some data is gathered, given to data scientists, and — after two weeks — the first demo takes place. The results are promising, but they need a bit more time.
Fine. After all, the data was messy: they had to clean it up and go back to the source a couple of times.
Two weeks pass and the new results are even nicer. With 70% accuracy, they can predict if a patient will go home after their visit to the emergency room.
This is much better than random (50%)! A full-fledged pilot starts.
They are faced with a couple of challenges to go from model to data product:
- It's unclear how to send the source data to the model;
- It's unclear where the model should run;
- The hospital's operations need to change, as intake happens with pen and paper;
- They realize that without knowing which department the patient will go to, they won't add any value;
- To predict the department, the model needs the diagnosis. But once the diagnosis gets typed into the computer, the patient has already reached their destination: the model is useless!
I think it’s a fair point: from the standpoint of internal researchers, it’s easy to look for things they can do but that don’t have much business value. The risk on the other side is that you dive into a high-potential-value problem and then realize that the data isn’t there to draw conclusions, or that the relationships you expected simply aren’t there.
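As an aside, the accuracy comparison in the quoted scenario (70% versus a 50% coin flip) is the kind of check that is cheap to sketch. The snippet below is purely illustrative and not from Lanzani's post: it assumes a scikit-learn workflow, swaps in synthetic data for the hospital's records, and compares a simple logistic regression against a chance-level baseline.

```python
# Illustrative only: synthetic data stands in for the hospital's intake records,
# and the model choice (logistic regression) is an assumption, not Lanzani's.
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Two balanced classes: "goes home" vs. "admitted".
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Chance-level baseline: uniform random guesses, ~50% accuracy on balanced classes.
baseline = DummyClassifier(strategy="uniform", random_state=0).fit(X_train, y_train)

# A first-pass model of the kind the data scientists would iterate on.
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

print("baseline accuracy:", accuracy_score(y_test, baseline.predict(X_test)))
print("model accuracy:   ", accuracy_score(y_test, model.predict(X_test)))
```

Of course, a result like that is exactly what looks finished in a demo while leaving every deployment question in the list above unanswered.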