Traversing Foreign Keys Using Biml

Kevin Feasel

2016-09-28

Biml, ETL

Ben Weissman has a two-part series on loading a set of tables based on foreign key constraints.  Part 1 is linear loads:

All our previous posts were running data loads in parallel, ignoring potential foreign key constraints. But in real life scenarios, your datawarehouse may actually have tables refering to each other using such, meaning that it is crucial to create and populate them in the right order.

In this blog post, we’ll actually solve 2 issues at once: We’ll provide a list of tables, will then identify any tables that our listed tables rely on (recursively) and will then create and load them in the right order.

In this sample, we’ll use AdventureWorksDW2014 as our source and transfer the FactInternetSales-table as well as all tables it is connected to through foreign key constraints. Eventually, we will create all these tables including the constraints in a new database AdventureWorksDW2014_SalesOnly (sorting them so we get no foreign key violations) and eventually populate them with data.

Part 2 is parallel loads:

After the first excitment about how easy it actually was to take care of that topology, you might ask yourself: Why does it have to run linear? That takes way too long. And you’re right – and it doesn’t have to.

All we need to do is:

– Create a list of all the tables that we’ve already loaded (which will be empty at that point)
– Identify all tables that do not reference any other tables
– Load these tables, each followed by all tables that only reference this single table – recursively and add them to list of loaded tables
– Once that is done, load all tables that are referencing multiple tables where all required tables have been loaded before – and again, add them to the list
– Repeat this until no table is left to load (or for a maximum of 10 times in this example)
– If, for whichever reason, any tables are left, load them sequentially using the TopoSort function:

This is a very interesting way of using Biml to traverse the foreign key tree.  I’ve normally used recursive CTEs in T-SQL to do the same, but I’ll have to play around with this method.

Related Posts

Importing Biml Metadata from Excel

David Stein wraps up a series on using Biml to load flat files: Make sure you read and understand the concepts in each of the following articles before tackling this one. – Import Biml Metadata with GetQuerySchema– BimlScript Code Nuggets and Mad Libs– Import Biml Metadata Directly from Excel One of the primary issues I’ve […]

Read More

Copying Cassandra Data to HDFS

Landon Robinson shows how you can use Spark to extract data from Cassandra and move it into HDFS: Cassandra is a great open-source solution for accessing data at web scale, thanks in no small part to its low-latency performance. And if you’re a power user of Cassandra, there’s a high probability you’ll want to analyze the data […]

Read More

Categories

September 2016
MTWTFSS
« Aug Oct »
 1234
567891011
12131415161718
19202122232425
2627282930