Sit a spell and let Grandpa Andy tell yall a story about some data integratin’.
Suppose for a minute that you’ve read and taken my advice about writing small, unit-of-work SSIS packages. I wouldn’t blame you for taking this advice. It’s not only online, it’s written in a couple books (I know, I wrote that part of those books). One reason for building small, function-y SSIS packages is that it promotes code re-use. For example, SSIS packages that perform daily incremental loads can be re-used to perform monthly incremental loads to a database that serves as a data mart by simply changing a few parameters.
Change the parameter values and the monthly incremental load can load both quarterly and yearly data marts.
You want better performance out of the daily process, so you read and implement the parallel execution advice* you’ve found online. For our purposes let’s assume you’ve designed a star schema instead of one of those pesky data vaults (with their inherent many-to-many relationships and the ability to withstand isolated and independent loads and refreshes…).
You have dependencies. The dimensions must be loaded before the facts. You decide to manage parallelism by examining historical execution times. Since you load data in chronological order and use a brute-force change detection pattern, the daily dimension loads always complete before the fact loads reach the latest data. You decide to fire all packages at the same time and your daily execution time drops by half, monthly executions time drops to 40% of its former execution time, and everyone is ecstatic…
…until the quarterly loads.
This is a great post.
The main part of the UI is a graphical map of the objects in a package. I have spent quite a lot of time trying to think of a better way of showing the elements that make up a package, and it is hard to argue with the workflow approach, all the ETL tools I found do the same thing and visualizing packages as a series of interconnected tasks is useful but maybe not everything and the only way.
Because of all these issues I have really tried to stay away from SSIS development and looked for other solutions such as using biml to write packages or writing code in other languages but I keep coming back to the fact that the SSIS engine is great at what it does and if you have a lot of data or a lot of files/tables then writing a similar framework would be a lot of wasted effort. So, how can we use the SSIS engine but make better, more efficient use of our development time?
Click through to see a couple of screenshots as well as a link to download K-SSIS.
During the December 2017 delivery of Expert SSIS, I was asked if there is an SSIS Catalog Logging Mode that will display the value of variables. I responded that I wasn’t aware of a logging level that accomplishes this, but then – as sometimes happens – I could not let it go. It is an excellent question. So I dug in…
I second his notion that you should learn how to use script tasks and script components. They’re not very difficult to pick up and even a basic knowledge will benefit you greatly.
Looking at the times of the upgrade (it takes milliseconds) we can live with automatic version migrations during thousands of executions. So – is there any gain if we retain it?
Let’s take a closer look at the SSIS Toolbox. We are migrating to SSIS 2017 from the lower version, let’s say the source is SSIS 2012. Open SQL Server Data Tools (for Visual Studio 2015 or 2017, does not matter for now) and load your project. I will use SSDT for VS 2017 with sample project created for SSIS 2012. See the SSIS Toolbox for the project in version SSIS 2012? There is a Script Task following an FTPTask.
I will upgrade the SSIS project to the latest version (and write more about it in few lines) and take a look at the SSIS Toolbox now.
Now we can see additional tasks for Hadoop. Upgrading the project does at least two things that are interesting to us: it uses the latest versions of the tasks and components, but also introduces the new elements to use.
There are some benefits, but those come with a little bit of risk.
I recently worked on a project where a client wanted to use Biml to create SSIS packages to stage data from Dynamics 365 CRM. My first attempt using a script component had an error, which I think is related to a bug in the Biml engine with how it currently generates script components, so I had to find a different way to accomplish my goal. (If you have run into this issue with Biml, please comment so I know it’s not just me! I have yet to get Varigence to confirm it.) This client owned the Pragmatic Works Task Factory, so we used the Dynamics CRM source to retrieve data.
Meagan has the code as well as some important notes, so read the whole thing.
The general steps are
Set up your source query.
Pass the data through a Lookup for your Dimension with the missing results routed to a “No Match” output.
Insert those “No Match” rows into your Dimension using a SQL task – checking to make sure that this particular row hasn’t already been inserted (this is important).
Do another lookup using a “Partial Cache” to catch these newly-inserted members.
Use a UNION ALL transform to bring the existing and late-arriving members together.
Click through for more information and a helpful package diagram.
Take note here that I am only querying the msdb database. There is nothing exceedingly top secret here – yet. Most DBAs should be extremely familiar with these tables and functions that I am using here.
What does this show me though? If I have a package that is being run via Agent Job in msdb, then the sensitive information needs to be decrypted somehow. So, in order to do that decryption the password needs to be passed to the package. As it turns out, the password will be stored in the msdb database following the “DECRYPT” switch for the dtutil utility. Since I happen to have a few of these packages already available, when I run this particular query, I will see something like the following in my results.
That’s a clever solution. I get the feeling that I should be a bit perturbed by how simple this is, but I don’t; the real sensitive data is still secure.
Wolfgang Strasser has started a series on the new scale-out functionality in SQL Server Integration Services 2017. First, his introduction:
In the past, SSIS package executions were only able to run on the server that hosted the Integration Services server itself. With the rising number and requirements of more and more package executions sometimes the resources on the server ran short. Addressing this resource shortage custom scale out functionality was implemented that allowed package executions to be transfered to other “worker” machines in order to distribute execution load. With SQL Server 2017, this functionality is built into an shipped with SSIS 2017.
Before I am diving deeper into SSIS Scale Out I would like to discuss some basic vocabulary in the field of scalability.
The master is managing the available workers and all the work that is requested for execution in the scale out topoloy.
The master manages a list of (active) workers
The master gets the instructions from clients
The master knows the current state of work (queued jobs, running jobs, finished jobs, ..)
If you’re familiar with other distributed computing systems, this follows a similar path.
Ben Weissman has a pair of posts regarding metadata models in Biml. First up, he gives us the high-roller solution:
If you’re lucky enough to be a BimlStudio user, you have access to the Biml Metadata feature! This feature allows you to build a Metadata model that fits your exacts needs which can then be browsed and used through a Metadata Instance using a dynamic object model.
As you probably still want to maintain your metadata outside of BimlStudio, we’ve build this little piece of code. It will ready your meta-Schema from a given SQL Database and build a Biml Metadata-Model from it. In a second step, it will also import the contents of your model into an instance:
If your company doesn’t want to shell out the cash to buy a license for BimlStudio, Ben also has a version for people using the free BimlExpress tool:
So maybe you’ve noticed our blog post on deriving metadata from SQL Server into BimlStudio, but you are using BimlExpress and therefore don’t have access to the feature described in there? While it’s true, that BimlExpress doesn’t support the Metadata features of the Biml language, there are similar ways of achieving a flexible metadata model in Biml.
This post shows you, how you can build a model in SQL Server, import it to Biml including derived relationships etc. and use it in a very flexible way.
To get started, we need to set up a sample model in SQL Server first. You can either download the scripts from https://solisyon.de/files/SetupDerivedMetadata.sql or scroll to the very end of that page. Although your individual model can and will differ from this one, we suggest you follow along using our example and start tweaking it afterwards!
Once you really get how Biml converts metadata to packages, life gets so much easier.
The first file that we’re going to look at is the [Content_Types].xml file, and this is the file that confirms that the ZipPackage class is used. There’s an article here that is ten years old but is still valid (scroll down to the System.IO.Packaging INcludes Zip Support to read up on this.) This is because we know that the content_types file is part of the output when using the ZipPackage class to zip up a bunch of files into a .zip. The content_file contains both the extension and content type of the three other files that are included in the ispac:
Note that the content_types file does not specify the files, either in quantity or in content, other than the fact that they will contain xml.
Read on for a good amount of detail on what’s included in an Integration Services package.