Scientific Debt

David Robinson gives us the data science analog to technical debt:

In my new job as Chief Data Scientist at DataCamp, I’ve been thinking about the role of data science within a business, and discussing this with other professionals in the field. On a panel earlier this year, I realized that data scientists have a rough equivalent to this concept: “scientific debt.”

Scientific debt is when a team takes shortcuts in data analysis, experimental practices, and monitoring that could have long-term negative consequences. When you hear a statement like:

  • “We don’t have enough time to run a randomized test, let’s launch it”
  • “To a first approximation this effect is probably linear”
  • “This could be a confounding factor, but we’ll look into that later”
  • “It’s directionally accurate at least”

you’re hearing a little scientific debt being “borrowed”.

Read the whole thing.  I strongly agree with the premise.

Related Posts

Building an Image Classifier with PyTorch

Rogier van der Geer shows how you can use PyTorch to build out a Convolutional Neural Network for image classification: The tool that we are going to use to make a classifier is called a convolutional neural network, or CNN. You can find a great explanation of what these are right here on wikipedia. But we […]

Read More

xgboost and Small Numbers of Subtrees

John Mount covers an interesting issue you can run into when using xgboost: While reading Dr. Nina Zumel’s excellent note on bias in common ensemble methods, I ran the examples to see the effects she described (and I think it is very important that she is establishing the issue, prior to discussing mitigation).In doing that I ran into one more […]

Read More


May 2018
« Apr Jun »