Traversing Foreign Keys Using Biml

Kevin Feasel

2016-09-28

Biml, ETL

Ben Weissman has a two-part series on loading a set of tables based on foreign key constraints.  Part 1 is linear loads:

All our previous posts were running data loads in parallel, ignoring potential foreign key constraints. But in real life scenarios, your datawarehouse may actually have tables refering to each other using such, meaning that it is crucial to create and populate them in the right order.

In this blog post, we’ll actually solve 2 issues at once: We’ll provide a list of tables, will then identify any tables that our listed tables rely on (recursively) and will then create and load them in the right order.

In this sample, we’ll use AdventureWorksDW2014 as our source and transfer the FactInternetSales-table as well as all tables it is connected to through foreign key constraints. Eventually, we will create all these tables including the constraints in a new database AdventureWorksDW2014_SalesOnly (sorting them so we get no foreign key violations) and eventually populate them with data.

Part 2 is parallel loads:

After the first excitment about how easy it actually was to take care of that topology, you might ask yourself: Why does it have to run linear? That takes way too long. And you’re right – and it doesn’t have to.

All we need to do is:

– Create a list of all the tables that we’ve already loaded (which will be empty at that point)
– Identify all tables that do not reference any other tables
– Load these tables, each followed by all tables that only reference this single table – recursively and add them to list of loaded tables
– Once that is done, load all tables that are referencing multiple tables where all required tables have been loaded before – and again, add them to the list
– Repeat this until no table is left to load (or for a maximum of 10 times in this example)
– If, for whichever reason, any tables are left, load them sequentially using the TopoSort function:

This is a very interesting way of using Biml to traverse the foreign key tree.  I’ve normally used recursive CTEs in T-SQL to do the same, but I’ll have to play around with this method.

Related Posts

When Data Factory Flows Don’t

Emma Stewart points out an issue that might vex newcomers to Azure Data Factory: The data within the Data Lake store was organised into a Year and Month hierarchy for the folders, and each days transactions were stored in a file which was named after the day within the relevant month folder. The task then […]

Read More

Generating Task Factory Dynamics CRM Loads With Biml

Meagan Longoria shows how to use Biml to generate SSIS packages which use the Task Factory Dynamics CRM source: I recently worked on a project where a client wanted to use Biml to create SSIS packages to stage data from Dynamics 365 CRM. My first attempt using a script component had an error, which I […]

Read More

Categories

September 2016
MTWTFSS
« Aug Oct »
 1234
567891011
12131415161718
19202122232425
2627282930