Traversing Foreign Keys Using Biml

Kevin Feasel

2016-09-28

Biml, ETL

Ben Weissman has a two-part series on loading a set of tables based on foreign key constraints.  Part 1 is linear loads:

All our previous posts were running data loads in parallel, ignoring potential foreign key constraints. But in real life scenarios, your datawarehouse may actually have tables refering to each other using such, meaning that it is crucial to create and populate them in the right order.

In this blog post, we’ll actually solve 2 issues at once: We’ll provide a list of tables, will then identify any tables that our listed tables rely on (recursively) and will then create and load them in the right order.

In this sample, we’ll use AdventureWorksDW2014 as our source and transfer the FactInternetSales-table as well as all tables it is connected to through foreign key constraints. Eventually, we will create all these tables including the constraints in a new database AdventureWorksDW2014_SalesOnly (sorting them so we get no foreign key violations) and eventually populate them with data.

Part 2 is parallel loads:

After the first excitment about how easy it actually was to take care of that topology, you might ask yourself: Why does it have to run linear? That takes way too long. And you’re right – and it doesn’t have to.

All we need to do is:

– Create a list of all the tables that we’ve already loaded (which will be empty at that point)
– Identify all tables that do not reference any other tables
– Load these tables, each followed by all tables that only reference this single table – recursively and add them to list of loaded tables
– Once that is done, load all tables that are referencing multiple tables where all required tables have been loaded before – and again, add them to the list
– Repeat this until no table is left to load (or for a maximum of 10 times in this example)
– If, for whichever reason, any tables are left, load them sequentially using the TopoSort function:

This is a very interesting way of using Biml to traverse the foreign key tree.  I’ve normally used recursive CTEs in T-SQL to do the same, but I’ll have to play around with this method.

Related Posts

Azure Data Factory and Schema Drift

Mark Kromer walks us through two techniques we can use in Azure Data Factory to deal with schema drift: Azure Data Factory’s Mapping Data Flows have built-in capabilities to handle complex ETL scenarios that include the ability to handle flexible schemas and changing source data. We call this capability “schema drift“. When you build transformations […]

Read More

Troubleshooting AWS Database Migration Service Errors

Samir Behara takes us through troubleshooting AWS Database Migration Service issues: For troubleshooting any issues with AWS DMS, it is necessary to have logs enabled. The DMS logs would typically give a better picture and helps find errors or warnings that would indicate the root cause of the failure. If the logs are not available […]

Read More

Categories

September 2016
MTWTFSS
« Aug Oct »
 1234
567891011
12131415161718
19202122232425
2627282930