Press "Enter" to skip to content

Category: Integration Services

Purging the SSIS Catalog

Peter Schott extends Tim Mitchell’s work:

I wrote about cleaning up the SSIS Catalog a while back, but recently needed to revisit this for a different use and wanted something that could run in an ongoing manner. My earlier post still works, but I recently adapted some code from Tim Mitchell to create a stored procedure that can do that cleanup. Tim wrote the majority of this. I adapted it to wrap it in a stored procedure to handle varying batch sizes for the deletes, as well as to ensure all of the tables are deleted in smaller sets of rows.

Here is the code to create the stored procedure. If your SSIS Catalog is not named SSISDB, adjust accordingly.

Click through for the script. It’s interesting to note how frequently the cleanup processes for functionality in SQL Server are inadequate for the task at scale. I’ve regularly seen people write these sorts of things for SSISDB, Query Store, replication, ML Services (though that, at least, was changed), etc.
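If you just want the flavor of the approach before clicking through: the core pattern is a loop of TOP (N) deletes against the catalog’s child log tables, oldest operations first, so no single transaction has to delete (and log) millions of rows at once. Here is a minimal sketch of that pattern against one internal table; the procedure name and defaults are made up, and the real script covers all of the related tables:

```sql
-- Minimal sketch of the batched-delete pattern (illustrative only;
-- the real cleanup script handles every internal log table in SSISDB).
CREATE OR ALTER PROCEDURE dbo.PurgeCatalogDemo
    @RetainDays int = 30,     -- keep this many days of history
    @BatchSize  int = 5000    -- rows deleted per transaction
AS
BEGIN
    SET NOCOUNT ON;

    DECLARE @Cutoff datetimeoffset = DATEADD(DAY, -@RetainDays, SYSDATETIMEOFFSET());
    DECLARE @Rows int = 1;

    WHILE @Rows > 0
    BEGIN
        -- Small, repeated deletes keep the transaction log manageable.
        DELETE TOP (@BatchSize) em
        FROM SSISDB.internal.event_messages AS em
        INNER JOIN SSISDB.internal.operations AS o
            ON o.operation_id = em.operation_id
        WHERE o.created_time < @Cutoff;

        SET @Rows = @@ROWCOUNT;
    END;
END;
```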


SSIS Code Promotion

Andy Leonard takes us through the process of migrating code from development through to production:

– Developers need to be able to develop software that will execute enterprise operations.
– Production is solely managed by operations personnel. Allow very little, if any, developer access to Production.
– Before deploying to Production, operations personnel need a Pre-Production environment they can use to test the deployment and performance after the deployment. No one wants operations personnel – or anyone, really – deploying a process to Production without a practice run.
– Similarly, developers need to move their code from the Development tier (aka the “works on my machine” tier) to another tier – such as Test – so they can identify hard-coded defaults that should be parameters.

Read on for some tips from Andy, including where the SSIS Catalog Compare product can fit into this.
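As a concrete illustration of promoting a project between tiers, one option is to deploy the built .ispac file to the target catalog with the documented catalog.deploy_project procedure. The folder, project, and file names below are hypothetical; the same call works whether the target tier is Test, Pre-Production, or Production:

```sql
-- Deploy a project file (.ispac) into a catalog folder via
-- catalog.deploy_project. Names and paths are illustrative.
DECLARE @ProjectBinary varbinary(max) =
    (SELECT BulkColumn
     FROM OPENROWSET(BULK N'C:\Deploy\LoadWarehouse.ispac', SINGLE_BLOB) AS ispac);

DECLARE @OperationId bigint;

EXEC SSISDB.catalog.deploy_project
    @folder_name    = N'PreProduction',
    @project_name   = N'LoadWarehouse',
    @project_stream = @ProjectBinary,
    @operation_id   = @OperationId OUTPUT;
```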


Exporting Environment Variables from the SSIS Catalog

Koen Verbeeck shows how we can pull environment variables out from the SSIS catalog:

Sometimes when you’ve created an SSIS catalog, along with folders, environments, and environment variables, you want this content on another server. Unfortunately, you can only script out these objects in SSMS at the moment you’re creating them (thus right before you click ‘OK’ to confirm). Once the objects are created, you can no longer script them out using the GUI.

Luckily, there’s a whole slew of stored procedures and views in the catalog that allow us to extract information and use that information to re-create the objects. Here are two scripts that I found useful.

Click through for the scripts.
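To give a taste of what the catalog views expose, a query along these lines lists every environment variable with its folder and environment. Note that the value column comes back empty for sensitive variables; those live encrypted in internal tables:

```sql
-- List every environment variable in the catalog, with its folder
-- and environment. Sensitive values are not exposed by this view.
SELECT f.name  AS folder_name,
       e.name  AS environment_name,
       ev.name AS variable_name,
       ev.type,
       ev.sensitive,
       ev.value
FROM SSISDB.catalog.environment_variables AS ev
INNER JOIN SSISDB.catalog.environments AS e
    ON e.environment_id = ev.environment_id
INNER JOIN SSISDB.catalog.folders AS f
    ON f.folder_id = e.folder_id
ORDER BY folder_name, environment_name, variable_name;
```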


Automating Collection of SSIS Failure Data

Rob Sewell has a love for automation:

I have tried my best at all times to follow this advice in the last decade, and I am pretty happy that I have managed it.

– I use PowerShell (a lot!) to automate all sorts of routine tasks including migrating this blog
– I use Jupyter Notebooks to enable myself and others to automate Run Books, Training, Documentation, Demonstrations, Incident Response. You can find my notebooks here
– I use Azure DevOps to automate infrastructure creation and changes with Terraform, and delivery of changes to code, as well as unit testing.
– I use GitHub Actions to create this blog and publish the ADSNotebook module
– I use Chocolatey to install and update software
– I have used Desired State Configuration to ensure that infrastructure is as it is expected to be

At every point I am looking for a means to automate the thing I am doing, because it is almost guaranteed that, at some point after you have done a thing, you will need to do it again or to do it slightly differently.

Click through to see what Rob has for us in the way of collecting SSIS log data after job failures.
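Rob’s approach uses PowerShell and notebooks; if all you need is the raw error text after a failure, the catalog itself will also surrender it to a query along these lines (a simple sketch, not Rob’s script):

```sql
-- Pull the most recent error messages logged by the SSIS catalog.
SELECT TOP (50)
       em.operation_id,
       em.message_time,
       em.package_name,
       em.message_source_name,
       em.message
FROM SSISDB.catalog.event_messages AS em
WHERE em.event_name = N'OnError'
ORDER BY em.message_time DESC;
```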


Data Lineage and SSIS

Aveek Das has a two-parter. First up is a discussion of data lineage:

In this article, I am going to explain what Data Lineage in ETL is and how to implement it. In this modern world, where companies deal with humongous amounts of data every day, there is also a challenge to efficiently manage and monitor that data. There are systems that generate data every second, and that data is processed and sent to a final reporting or monitoring tool for analysis. In order to process this data, we use a variety of ETL tools, which in turn make the data transformation possible in a managed way.

While transforming the data in the ETL pipeline, it has to go through multiple steps of transformation in order to achieve the final result. For example, when the ETL receives the raw data from the source, there may be operations applied to it like filtering, sorting, merging, or splitting two columns, etc. There can also be aggregations or other calculations made on this raw data before it finally moves into a data warehouse or is prepared for reporting. In order to be able to detect what the source of a particular record is, we need to implement something known as Data Lineage. It is simple metadata that helps us detect gaps in the data processing pipeline and enables us to fix issues later.

Part two covers data lineage with SQL Server Integration Services:

In this article, I am going to discuss SSIS data lineage concepts, which are often used while designing ETL workloads on a data warehouse. Although this article focuses on implementing data lineage using SSIS, the concept is not confined to SSIS; it applies to any ETL tool on the market that moves data from a source to a destination. In my previous article, Understanding Data Lineage in ETL, I have already discussed the general importance of data lineage concepts for any ETL tool. I would definitely suggest you have a look at it if you want to understand, in general, how data lineage helps to track the source of a single record in the warehouse.

If you’re fairly new to this world, it’s a good introduction to an important topic.
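The mechanical side of lineage is usually unglamorous: add a few metadata columns to your staging and warehouse tables, then have the ETL populate them on every load. A hypothetical sketch of what those columns might look like (table and column names are illustrative):

```sql
-- Hypothetical staging table carrying lineage metadata alongside the
-- business columns, so each row can be traced back to its load.
CREATE TABLE dbo.StageCustomer
(
    CustomerKey      int           NOT NULL,
    CustomerName     nvarchar(100) NOT NULL,
    -- Lineage columns, populated by the ETL process:
    LineageLoadId    bigint        NOT NULL, -- e.g. the SSIS ServerExecutionID
    LineageSource    nvarchar(128) NOT NULL, -- originating system or file
    LineageLoadTime  datetime2(0)  NOT NULL
        CONSTRAINT DF_StageCustomer_LoadTime DEFAULT (SYSDATETIME())
);
```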


Skipping SSIS Package Validation

Andy Leonard takes us through a new feature in Visual Studio:

A new feature in version 3.9 of the Visual Studio 2019 SSIS extension is “skip package validation.” I’ve tested the functionality and it works well. Here’s how you set it up:

Click through to see how easy it is to enable. I’d imagine that this would work really well when dealing with dynamic packages where connection details aren’t available until runtime. Waiting for validation to fail to connect to a remote data source (on the UI thread!) was always annoying.


Performance Tuning SSIS Data Flows

Mark Broadbent reviews a SQLBits talk:

Yes, before you say it, I know SQL Server Integration Services is “old technology”, but a lot of people are still using it, and in many cases they are either still developing against it or looking to integrate/migrate with other burgeoning technologies such as Azure Data Factory. In other words, if you are not currently using SSIS then this post is probably not for you; otherwise, read on.

If you are one of the lucky ones still using SSIS, I thought it would be worth publishing these comprehensive notes taken from a session titled “SSIS Data Flow Performance Tuning”, delivered at SQLBits 8 (Brighton) by the then “SSIS guru” Jamie Thomson. The notes have timings (in minutes and seconds) against them, which correlate directly with the presentation times. The video is still available and can be downloaded from the SQLBits website, so you can watch it (if required) and use the timings to follow along.

It’s an asynchronous watch party with Mark.


Reusing a Recordset Stored as an SSIS Object Variable

Tim Mitchell appeases the masses:

A few years back, I wrote a blog post about using an SSIS object variable as a data flow source. In that post, I described how you could load a set of query results into an object-typed variable in SQL Server Integration Services and then use that in-memory data as a source within a data flow. In the comments and the feedback I got on that post, the same question kept coming up: what is the process for reusing a recordset in an SSIS object variable in the same package?

In this post, I’ll show how you can modify the scripts within your SSIS package to allow reprocessing of the same set of results in an object variable.

Tim has a nice workaround for the problem, so check it out.


Comparing Integration Services and Azure Data Factory

Tim Mitchell compares SQL Server Integration Services to Azure Data Factory:

For the better part of 15 years, SQL Server Integration Services (SSIS) has been the go-to enterprise extract-transform-load (ETL) tool for shops running on Microsoft SQL Server. More recently, Microsoft added Azure Data Factory (ADF) to its stable of enterprise ETL tools. In this post, I’ll be comparing SSIS and Azure Data Factory to share how they are alike and how they differ. I’ll also review the strengths and shortcomings of each, including the architectures in which each of these is likely to do well.

Read on for Tim’s thoughts on the subject. Tim lays out his biases up-front but also gives you a good feel for where both products are in their lifecycles.
