Press "Enter" to skip to content

Enumerating Causes of Dirty and Incomplete Data

Joe Celko builds a list and checks it twice:

Many years ago, my wife and I wrote an article for Datamation, a major trade publication at the time, under the title, “Don’t Warehouse Dirty Data!” It’s been referenced quite a few times over the decades but is nowhere to be found using Google these days. The point is, if you have written a report using data, you have no doubt felt the pain of dirty data and it is nothing new.

However, what we never got around to defining was exactly how data gets dirty. Let’s look at some of the ways data get messed up.

I am very slowly working up the nerve to build a longer talk (and YouTube series) on data engineering, and part of that involves understanding why our data tends to be such a mess. Joe has several examples and stories, and I’m sure we could come up with dozens of other reasons.