Diagnosing Duplicate Records

Kevin Feasel

2016-08-22

ETL

Jesse Seymour walks through his process of finding and fixing unexpected duplicate key violations:

In this case, the error message is quite clear.  There is more than one row in the source (staging) that matches a single row in the target (data warehouse).  When we are warehousing data, we setup key fields that allow us to match up a record in staging to a record in the data warehouse.  In most systems, you can use the source system’s primary key to accomplish this.  After all, most systems use a RDBMS of some sort to store data.  However, in this case the source data is from a SharePoint list, and the only source key available is a list item ID.

So why are we not using that?  There is a very simple answer and that is because end users delete old data from the list, which can lead to a recycling of ID values from SharePoint.  If an ID gets recycled, then the data warehouse will improperly overwrite data in the fact table or discard the new row as a duplicate depending on how we configure the extract routine.

Figuring out the cause of the problem is a multi-step process, as Jesse shows.

Related Posts

Passing Messages to Azure Service Bus via Data Factory

Rayis Imayev shows how we can use Logic Apps to let Azure Data Factory send messages to Azure Service Bus: Summary:1) Azure Data Factory and Service Bus can find common grounds to communicate with each other, and Azure Logic Apps could serve as a good mediator to establish this type of messaging communication.2) As soon as messages land […]

Read More

Deleting in Azure Data Factory

Meagan Longoria is happy that Azure Data Factory v2 now has a Delete activity: It is a common practice to load data to blob storage or data lake storage before loading to a database, especially if your data is coming from outside of Azure. We often create a staging area in our data lakes to hold data until […]

Read More

Categories

August 2016
MTWTFSS
« Jul Sep »
1234567
891011121314
15161718192021
22232425262728
293031