Everyone’s Data Is Dirty

Kevin Feasel



Chirag Shivalker hits the highlights on dirty data:

It might sound a bit abrupt, but clean data is a myth. If your data is dirty, so is everyone else’s. Enterprises are more than dependent on data these days, and it is going to stay the same in coming years. They need to collect data in order to analyze it, which necessarily will not be 100% clean, pristine, or perfect in nature.

Nearly all companies face the challenge of dirty data in the form of a lot of duplicates, incorrect fields, and missing values. This happens due to omnichannel data influx, followed by hundreds, if not thousands, of employees wrestling and torturing that data to derive professional outcomes and insights. Don’t forget that even the best of the data has that tendency to decay in few weeks.

The saying goes that any analytics project is about 80% data cleansing and feature extraction.  I’d say that number’s probably closer to 90-95%, and dirty data is a big part of that.

Related Posts

Execution Plans And GDPR

Grant Fritchey isn’t crazy when it comes to execution plans: Now, when you save an execution plan out to a file, you’re potentially transmitting PI data. It goes further. When you hard code values, PI is not just in the query. Those PI values can also be stored throughout the plan in various properties. So […]

Read More

The GDPR And You

Kevin Feasel



William Brewer has some Q&A regarding the General Data Protection Regulation: The General Data Protection Regulation (GDPR) will affect organisations in countries around the world, not just those in Europe. The GDPR regulates how personal data is stored, moved, handled, and destroyed. Not following the regulation will lead to dire consequences for your organisation. As […]

Read More


November 2017
« Oct Dec »