Data Cleanup Using Drools

Kevin Feasel

2017-07-24

Data

Rathnadevi Manivannan gives an example of using Drools to create rule-based data cleansing processes:

The oil well drilling datasets contain raw information about wells and their formation details, drill types, and production dates. The Arkansas dataset has 6,040 records and the Oklahoma dataset has 2,559 records.

The raw data contains invalid values such as nulls, invalid dates, and invalid drill types, as well as duplicate wells and invalid well information with modified dates.

This raw data from the source is transformed to MS SQL for further filtering and normalization. To download raw data, look at the Reference section.

This is an example of applying several constraints and rules to a single data set. Each individual rule would probably be easier to write in T-SQL, but the full set becomes easier to understand in a procedural language.
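To make the idea of a rule set concrete, here is a minimal sketch in plain Python (not Drools, and not the code from the linked article) of the kinds of checks the excerpt mentions: nulls, invalid dates, invalid drill types, and duplicate wells. The field names (`well_id`, `drill_type`, `production_date`), the date format, and the list of accepted drill types are all assumptions made for illustration; the source does not specify them.

```python
from datetime import datetime

# Hypothetical schema and lookup values; the source does not list these.
REQUIRED_FIELDS = ("well_id", "drill_type", "production_date")
VALID_DRILL_TYPES = {"Vertical", "Horizontal", "Directional"}


def has_nulls(record):
    """Flag a record if any required field is missing or empty."""
    if any(record.get(field) in (None, "") for field in REQUIRED_FIELDS):
        return "null value"
    return None


def has_invalid_date(record):
    """Flag a record whose production date is not a real YYYY-MM-DD date."""
    value = record.get("production_date")
    if not value:
        return None  # missing values are reported by has_nulls
    try:
        datetime.strptime(value, "%Y-%m-%d")
    except (ValueError, TypeError):
        return "invalid date"
    return None


def has_invalid_drill_type(record):
    """Flag a record whose drill type is not in the accepted set."""
    value = record.get("drill_type")
    if value and value not in VALID_DRILL_TYPES:
        return "invalid drill type"
    return None


# Each rule is independent, so adding or removing one does not touch the others.
RULES = [has_nulls, has_invalid_date, has_invalid_drill_type]


def clean(records):
    """Split records into valid rows and rejected rows with their reasons."""
    valid, rejected, seen_wells = [], [], set()
    for record in records:
        reasons = [reason for reason in (rule(record) for rule in RULES) if reason]
        # Duplicates are only checked for records that pass every other rule.
        if not reasons and record.get("well_id") in seen_wells:
            reasons.append("duplicate well")
        if reasons:
            rejected.append((record, reasons))
        else:
            seen_wells.add(record.get("well_id"))
            valid.append(record)
    return valid, rejected
```

Keeping each check as its own small function is what makes the whole set readable: the rejected list records which rules fired for each row, which is harder to express when the same logic is spread across several T-SQL statements.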
