Wow, that’s a lot! Let’s break it down.
Our Connections collection has a single entity in it, an OLE DB Connection named Adventureworks (remember, all of this is case sensitive, so this Adventureworks is a different beast from AdventureWorks, ADVENTUREWORKS, etc.). This provides enough information to make a database connection. Of note, we have the server and catalog/database name defined in there. The type of connection used determines the specific property names, e.g. Initial Catalog & Data Source; Server & Database, etc. Look at ConnectionStrings.com if you really want to see how rich (horrible) this becomes.
There’s a lot of XML to describe a single table, but a key benefit to Biml is that you write templates and scripts to generate this stuff rather than typing it out.
How often do you need to play audio while you’re compiling your Biml packages? Never? Really? Huh, just me then. Very well, chalk this blog post up as one to show you that you really can do *anything* in Biml that you can do in C#.
When I first learned how to play audio in .NET, I would hook the Windows Media Player dll and use that. The first thing I then did was create an SSIS package that had a script task which played the A-Team theme song while it ran. That was useless but a fun demo. Fast forward to using Biml and I could not for the life of me get the Windows Media Player to correctly embed in a Biml Script Task. I suspect it’s something to do with the COM bindings that Biml doesn’t yet support. Does this mean you shouldn’t use Biml? Hell no. It just means I’ve wandered far into a corner case that doesn’t yet have support.
Read on because it will make you a better person.
This post uses objects and annotations from our previous post “Export to Flatfiles with Biml”. Please use the code from that post as a prerequisite.
In the previous post, we’ve exported the whole database to flatfiles with one file per table. But what if we want to split large tables into multiple files? One easy way to do that would be to retrieve the data using OFFSET-FETCH NEXT from SQL Server.
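To make the OFFSET-FETCH idea concrete, here is a minimal sketch in Python (not Biml) of how the per-file queries could be generated. The function name, the table name, the ordering column and the row counts are all assumptions for illustration; the original post builds these statements inside BimlScript instead.

```python
def paged_queries(table, order_by, total_rows, rows_per_file):
    """Return one SELECT per output file, paging with OFFSET ... FETCH NEXT.

    All arguments are hypothetical inputs; the row count would normally
    come from the source database.
    """
    if rows_per_file <= 0:
        raise ValueError("rows_per_file must be positive")
    queries = []
    # One query per page; OFFSET requires an ORDER BY in T-SQL.
    for offset in range(0, total_rows, rows_per_file):
        queries.append(
            f"SELECT * FROM {table} ORDER BY {order_by} "
            f"OFFSET {offset} ROWS FETCH NEXT {rows_per_file} ROWS ONLY"
        )
    return queries


if __name__ == "__main__":
    for query in paged_queries("dbo.Orders", "OrderID", 25, 10):
        print(query)
```

The ordering column should be unique and stable (a primary key is the usual choice), otherwise rows may be skipped or repeated between pages.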
Read on for more.
In our next step, we loop through all tables in that database (feel free to limit the results by playing with GetDatabaseSchema) and create a FlatFileFormat for each of them. We will include all columns except those with datatype Binary or Object. As flatfiles don’t really care about actual data formats, we will just define every column as a string with maximum length. We will also add an annotation with the table’s original name, the list of columns as well as a list of primary keys (we’ll need the latter for a later step :)):
Like most Biml-related things, it’s not that many lines of code, so check it out.
For each member of that collection, we follow some simple rules:
– Our table’s original name is the name of the table in the staging area without our connectionname prefix
– If our tablename still includes an underscore, we will split the name and assign the table- and schemaname respectively. Otherwise, our schema will be DBO.
– Create a DELETE statement towards our metadata store
– Create an INSERT statement towards our metadata store
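The rules above can be sketched roughly as follows, in Python rather than Biml. Several things here are assumptions: the staging name is taken to be connection name, underscore, then the rest; the schema is assumed to come before the table name; and the metadata table meta.Tables with its three columns is purely hypothetical.

```python
def split_staging_name(staging_name, connection_name):
    """Strip the connection-name prefix, then split schema and table."""
    prefix = connection_name + "_"
    if staging_name.startswith(prefix):
        original = staging_name[len(prefix):]
    else:
        original = staging_name
    if "_" in original:
        # Assumption: the part before the first underscore is the schema.
        schema, table = original.split("_", 1)
    else:
        schema, table = "dbo", original
    return schema, table


def quote(value):
    """Escape single quotes for use inside a T-SQL string literal."""
    return value.replace("'", "''")


def metadata_statements(staging_name, connection_name):
    """Build the DELETE and INSERT for a hypothetical meta.Tables store."""
    schema, table = split_staging_name(staging_name, connection_name)
    c, s, t = quote(connection_name), quote(schema), quote(table)
    delete = (
        "DELETE FROM meta.Tables "
        f"WHERE ConnectionName = '{c}' AND SchemaName = '{s}' AND TableName = '{t}'"
    )
    insert = (
        "INSERT INTO meta.Tables (ConnectionName, SchemaName, TableName) "
        f"VALUES ('{c}', '{s}', '{t}')"
    )
    return delete, insert
```

Running the DELETE before the INSERT keeps the step repeatable, which matters once the “one-time process” gets run a second time.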
Admittedly, I would have seen this as a one-time process and would have just written some scripts against sys.tables and sys.columns to generate this metadata, but “one-time processes” tend to happen over and over.
This little piece of Biml will check all your tables for indices sharing the same columns.
It does not generate any SSIS tasks etc., but might be a good starting point to build your own Index-Monitoring or Index-Clinic – because Biml is NOT just for SSIS.
Depending upon your definition of a duplicate index, this might generate false positives. Regardless, it’s a nice way of showing that Biml is about more than SSIS.
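As a rough illustration of the idea (in Python, not the Biml from the linked post), one way to flag candidates is to group indexes by table and column list. The input shape and the sample names are assumptions, and “duplicate” here means the same columns in the same order, compared case-insensitively.

```python
from collections import defaultdict


def find_duplicate_indexes(indexes):
    """Group index names that cover the same ordered columns on a table.

    `indexes` is an iterable of (table, index_name, columns) tuples,
    a hypothetical shape chosen for this sketch.
    """
    groups = defaultdict(list)
    for table, name, columns in indexes:
        key = (table, tuple(column.lower() for column in columns))
        groups[key].append(name)
    # Keep only the column lists claimed by more than one index.
    return {key: names for key, names in groups.items() if len(names) > 1}


if __name__ == "__main__":
    sample = [
        ("dbo.T", "IX1", ["A", "B"]),
        ("dbo.T", "IX2", ["a", "b"]),
        ("dbo.T", "IX3", ["B", "A"]),
    ]
    print(find_duplicate_indexes(sample))
```

Switching the key to a frozenset of columns would also flag IX3, which shows how the definition of a duplicate changes the result. Included columns, filters and uniqueness are ignored in this sketch.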
Using tooling is always a trade-off between time/frustration and monetary cost. BIDS Helper/BimlExpress are free so you’re prioritizing cost over all others. And that’s ok, there’s no judgement here. I know what it’s like to be in places where you can’t buy the tools you really need. One of the hard parts about debugging the expanded Biml from BimlScript is you can’t see the intermediate or flat Biml. You’ve got your Metadata, Biml and BimlScript and a lot of imagination to think through how the code is being generated and where it might be going wrong. That’s tough. Even at this point where I’ve been working with it for four years, I can still spend hours trying to track down just where the heck things went wrong. SPOILER ALERT: it’s the metadata, it’s always the metadata (except when it’s not). I end up with NULLs where I don’t expect them or some goofball put the wrong values in a field. But how can you get to a place where you can see the result? That’s what this post is about.
It’s a trivial bit of code but it’s important. You need to add a single Biml file to your project and whenever you want to see the expanded Biml, prior to it being translated into SSIS packages, right click on the file and you’ll get all that Biml dumped to a file. This recipe calls for N steps.
This is a good tip and has helped me a few times in the past.
As I recently got asked for it in a talk, this piece of code gives you all the Views in a database that are currently broken.
This could be useful for “what if”-scenarios when playing with your metadata.
Click through for the code. This is another in Ben’s enjoyable ongoing series of non-ETL things you can do with Biml.
There is no attribute in the Connections collection to assign a guid. It’s simply not there. If you want to associate an Id with an instance of a Connection your choices are the Project node and the Package node. Since we’re dealing with project level connection managers, we best cover both bases to ensure Ids synchronize across our project. If you wish, you could have embedded this Projects node in with the Connections but then you’d have to statically set these Ids. I feel like showing off so we’ll go dynamic.
To start, I define a list of static GUID values in the beginning of my file. Realistically, we have these values in a table and we didn’t go with “known” values. The important thing is that we will always map a guid to a named connection manager. If you change a connection manager’s definition from being project level to non, or vice versa, this will result in the IDs shifting and you’ll see the same symptoms as above.
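The core of that approach is a fixed name-to-GUID lookup that every project and package reference goes through. A minimal sketch in Python (the post itself does this in BimlScript); the GUID values and the second connection name are placeholders invented for illustration.

```python
import uuid

# Placeholder GUIDs; a real project would keep its own values,
# possibly in a table as the post mentions.
CONNECTION_IDS = {
    "Adventureworks": uuid.UUID("11111111-1111-1111-1111-111111111111"),
    "Staging": uuid.UUID("22222222-2222-2222-2222-222222222222"),
}


def connection_id(name):
    """Return the braced GUID for a connection manager name.

    Lookups are case sensitive and fail loudly, so a typo cannot
    silently produce a fresh Id.
    """
    try:
        return "{" + str(CONNECTION_IDS[name]).upper() + "}"
    except KeyError:
        raise KeyError(f"No GUID registered for connection manager {name!r}") from None
```

Because the same name always yields the same Id, regenerating the project does not shift the Ids.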
There’s plenty of code over on Bill’s site to help you as well.
Ben Weissman has a two-part series on loading a set of tables based on foreign key constraints. Part 1 is linear loads:
All our previous posts were running data loads in parallel, ignoring potential foreign key constraints. But in real life scenarios, your data warehouse may actually have tables referring to each other using such constraints, meaning that it is crucial to create and populate them in the right order.
In this blog post, we’ll actually solve 2 issues at once: We’ll provide a list of tables, will then identify any tables that our listed tables rely on (recursively) and will then create and load them in the right order.
In this sample, we’ll use AdventureWorksDW2014 as our source and transfer the FactInternetSales-table as well as all tables it is connected to through foreign key constraints. We will create all these tables including the constraints in a new database AdventureWorksDW2014_SalesOnly (sorting them so we get no foreign key violations) and then populate them with data.
After the first excitement about how easy it actually was to take care of that topology, you might ask yourself: Why does it have to run linearly? That takes way too long. And you’re right – and it doesn’t have to.
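The two steps described here, collecting every table the listed tables depend on and then ordering them, can be sketched in Python as below. This is not the Biml from the post; the function names and the shape of the references mapping (table name to the tables it references) are assumptions, and the mapping would have to be read from the source database.

```python
def required_tables(selected, references):
    """Return the selected tables plus everything they reference, recursively."""
    needed = set()
    stack = list(selected)
    while stack:
        table = stack.pop()
        if table not in needed:
            needed.add(table)
            stack.extend(references.get(table, ()))
    return needed


def load_order(selected, references):
    """Order the required tables so referenced tables always come first."""
    ordered, visiting, done = [], set(), set()

    def visit(table):
        if table in done:
            return
        if table in visiting:
            raise ValueError(f"circular foreign key chain at {table}")
        visiting.add(table)
        for parent in sorted(references.get(table, ())):
            if parent != table:  # a self-reference does not affect ordering
                visit(parent)
        visiting.discard(table)
        done.add(table)
        ordered.append(table)

    for table in sorted(required_tables(selected, references)):
        visit(table)
    return ordered
```

This is a plain depth-first topological sort; creating and loading the tables in the returned order avoids foreign key violations as long as there are no circular references between different tables.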
All we need to do is:
– Create a list of all the tables that we’ve already loaded (which will be empty at that point)
– Identify all tables that do not reference any other tables
– Load these tables, each followed by all tables that only reference this single table – recursively and add them to the list of loaded tables
– Once that is done, load all tables that are referencing multiple tables where all required tables have been loaded before – and again, add them to the list
– Repeat this until no table is left to load (or for a maximum of 10 times in this example)
– If, for whichever reason, any tables are left, load them sequentially using the TopoSort function:
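A simplified way to picture the steps above is to plan the load in waves, where every table in a wave only references tables from earlier waves and can therefore run in parallel with its wave-mates. The Python sketch below is a level-by-level simplification, not the post’s Biml: it does not model the “each followed by its single-parent children” chaining, and the function name and input shape are assumptions. Every table is expected to appear as a key in the mapping.

```python
def plan_waves(references, max_rounds=10):
    """Split tables into parallel load waves.

    `references` maps each table to the tables it references.
    Returns (waves, leftovers); leftovers are tables that could not be
    placed within max_rounds and would need a sequential fallback.
    """
    loaded = set()
    waves = []
    remaining = set(references)
    for _ in range(max_rounds):
        # A table is ready once everything it references is loaded
        # (a reference to itself is ignored).
        ready = sorted(
            table for table in remaining
            if all(parent == table or parent in loaded
                   for parent in references[table])
        )
        if not ready:
            break
        waves.append(ready)
        loaded.update(ready)
        remaining.difference_update(ready)
    return waves, sorted(remaining)


if __name__ == "__main__":
    sample = {"A": [], "B": [], "C": ["A"], "D": ["A", "B"], "E": ["D"]}
    print(plan_waves(sample))
```

Anything returned as a leftover corresponds to the last bullet: those tables would be handed to the sequential, topologically sorted load.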
This is a very interesting way of using Biml to traverse the foreign key tree. I’ve normally used recursive CTEs in T-SQL to do the same, but I’ll have to play around with this method.