Data Cleanup Using Drools

Kevin Feasel

2017-07-24

Data

Rathnadevi Manivannan gives an example of using Drools to create rule-based data cleansing processes:

The oil well drilling datasets contain raw information about wells and their formation details, drill types, and production dates. The Arkansas dataset has 6,040 records and the Oklahoma dataset has 2,559 records.

The raw data contains invalid values such as null, invalid date, invalid drill type, and duplicate well and invalid well information with modified dates.

This raw data from the source is transformed to MS SQL for further filtering and normalization. To download raw data, look at the Reference section.

This is an example of applying several constraints and rules to a single data set.  Each individual rule would probably be easier to do in T-SQL, but the whole bunch becomes easier to understand with a procedural language.

Related Posts

Everyone’s Data Is Dirty

Kevin Feasel

2017-11-16

Data

Chirag Shivalker hits the highlights on dirty data: It might sound a bit abrupt, but clean data is a myth. If your data is dirty, so is everyone else’s. Enterprises are more than dependent on data these days, and it is going to stay the same in coming years. They need to collect data in order […]

Read More

The GDPR And You

Kevin Feasel

2017-11-15

Data

William Brewer has some Q&A regarding the General Data Protection Regulation: The General Data Protection Regulation (GDPR) will affect organisations in countries around the world, not just those in Europe. The GDPR regulates how personal data is stored, moved, handled, and destroyed. Not following the regulation will lead to dire consequences for your organisation. As […]

Read More

Categories

July 2017
MTWTFSS
« Jun Aug »
 12
3456789
10111213141516
17181920212223
24252627282930
31