Data Cleanup Using Drools

Kevin Feasel

2017-07-24

Data

Rathnadevi Manivannan gives an example of using Drools to create rule-based data cleansing processes:

The oil well drilling datasets contain raw information about wells and their formation details, drill types, and production dates. The Arkansas dataset has 6,040 records and the Oklahoma dataset has 2,559 records.

The raw data contains invalid values such as nulls, invalid dates, invalid drill types, and duplicate or invalid well records with modified dates.

This raw data from the source is transformed to MS SQL for further filtering and normalization. To download raw data, look at the Reference section.

This is an example of applying several constraints and rules to a single data set. Each individual rule would probably be easier to write in T-SQL, but the full set becomes easier to understand in a procedural language.
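To make the idea of a readable rule set concrete, here is a minimal sketch in plain Python. It is not Drools and not the code from the linked article: the `WellRecord` fields, the list of accepted drill types, and the `YYYY-MM-DD` date format are all assumptions made up for illustration. It only shows the shape of the approach, where each check is a small named rule and every record is run through the whole list.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# Hypothetical list of accepted drill types; the source does not enumerate them.
VALID_DRILL_TYPES = {"HORIZONTAL", "VERTICAL", "DIRECTIONAL"}


@dataclass(frozen=True)
class WellRecord:
    # Field names are assumptions, not the schema from the linked article.
    well_id: Optional[str]
    drill_type: Optional[str]
    production_date: Optional[str]  # assumed to be a YYYY-MM-DD string


def has_well_id(record: WellRecord) -> bool:
    return bool(record.well_id)


def has_valid_drill_type(record: WellRecord) -> bool:
    return (record.drill_type is not None
            and record.drill_type.upper() in VALID_DRILL_TYPES)


def has_valid_date(record: WellRecord) -> bool:
    if record.production_date is None:
        return False
    try:
        datetime.strptime(record.production_date, "%Y-%m-%d")
    except ValueError:
        return False
    return True


# Each entry pairs the failure label with the check that must pass.
RULES = [
    ("missing well id", has_well_id),
    ("invalid drill type", has_valid_drill_type),
    ("invalid date", has_valid_date),
]


def clean(records):
    """Split records into (valid, rejected); rejected items carry their failure labels."""
    seen_ids = set()
    valid, rejected = [], []
    for record in records:
        failures = [label for label, rule in RULES if not rule(record)]
        # Duplicate detection needs state across records, so it sits outside RULES.
        if not failures and record.well_id in seen_ids:
            failures.append("duplicate well")
        if failures:
            rejected.append((record, failures))
        else:
            seen_ids.add(record.well_id)
            valid.append(record)
    return valid, rejected
```

Calling `clean(records)` returns the surviving records plus a list of rejected records, each with the reasons it failed, which makes it easy to audit why a row was dropped. A rules engine like Drools adds much more on top of this, but the list-of-named-rules structure is the part that keeps a large set of checks legible.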

