Press "Enter" to skip to content

Category: Integration Services

Migrating SSIS to Azure Data Factory

Koen Verbeeck has some articles for us:

For quite some time now, it has been possible to lift and shift your on-premises SSIS projects to Azure Data Factory. There, they run in an Integration Runtime, a cluster of virtual machines that executes your SSIS packages. In the beginning, your only option was the project deployment model, hosting your SSIS catalog in either an Azure SQL Database or a SQL Server Managed Instance.

But over time, features were added, and the package deployment model has now been supported for quite some time as well. What’s more, the “legacy SSIS package store” is also supported. For those who still remember it, that’s the SSIS service you can log into with SSMS to see which packages are stored in the service (either on the file system or in the msdb database) and which are currently running.

Read on for much more detail on the topic.
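
To make the project deployment model concrete: however SSISDB is hosted, packages deployed that way are run through the catalog’s stored procedures, and an Azure-SSIS Integration Runtime executes them on its virtual machines. A minimal T-SQL sketch (the folder, project, and package names here are hypothetical):

    DECLARE @execution_id bigint;

    -- Create an execution for a package deployed under the project model.
    EXEC SSISDB.catalog.create_execution
        @folder_name  = N'ETL',
        @project_name = N'Sales',
        @package_name = N'LoadFact.dtsx',
        @execution_id = @execution_id OUTPUT;

    -- Start it; on an Azure-SSIS Integration Runtime, the package runs
    -- on the runtime's cluster of virtual machines.
    EXEC SSISDB.catalog.start_execution @execution_id;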

Avoid Backup-and-Restore of SSISDB for Deployment

Andy Leonard recommends against using backup-and-restore as a way of moving SSIS packages around:

First, please do not misunderstand. You should back up SSISDB just like you back up all other databases – especially in Production. You should also conduct Disaster Recovery exercises in which you restore SSISDB from the latest backup, or avail yourself of Always On availability groups and/or Windows Server Failover Clustering.

With that caveat in mind, read on to see why.
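
For reference, backing up SSISDB itself is an ordinary database backup (the path below is hypothetical). The complication Andy hints at lies in everything around the backup – for instance, restoring SSISDB to another server also requires the database master key password:

    BACKUP DATABASE SSISDB
        TO DISK = N'\\backupserver\sql\SSISDB.bak'
        WITH COMPRESSION, CHECKSUM, INIT;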

Purging the SSIS Catalog

Peter Schott extends Tim Mitchell’s work:

I wrote about cleaning up the SSIS Catalog a while back, but I had to revisit it recently for a different use and needed something that can run in an ongoing manner. My earlier post still works, but I recently adapted some code from Tim Mitchell to create a stored procedure that does that cleanup. Tim wrote the majority of this; I wrapped it in a stored procedure to handle varying batch sizes for the deletes and to ensure rows are deleted from all of the tables in smaller sets.

Here is the code to create the stored procedure. If your SSIS Catalog is not named SSISDB, adjust accordingly.

Click through for the script. It’s interesting to note how frequently the built-in cleanup processes for SQL Server functionality are inadequate for the task at scale. I’ve regularly seen people write these sorts of things for SSISDB, Query Store, replication, ML Services (though that, at least, was changed), etc.
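
To give a flavor of the batching approach – this is a simplified sketch with assumed names and sizes, not Tim’s or Peter’s script – the idea is to delete child rows in small chunks so each transaction stays manageable:

    USE SSISDB;  -- adjust if your catalog database is named differently

    DECLARE @cutoff    datetimeoffset = DATEADD(DAY, -30, SYSDATETIMEOFFSET());
    DECLARE @batchsize int = 5000;
    DECLARE @rows      int = 1;

    WHILE @rows > 0
    BEGIN
        -- Delete event messages tied to old operations, a batch at a time,
        -- to keep transaction log growth and lock duration under control.
        DELETE TOP (@batchsize) em
        FROM internal.event_messages AS em
        INNER JOIN internal.operations AS op
            ON op.operation_id = em.operation_id
        WHERE op.created_time < @cutoff;

        SET @rows = @@ROWCOUNT;
    END;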

SSIS Code Promotion

Andy Leonard takes us through the process of migrating code from development through to production:

– Developers need to be able to develop software that will execute enterprise operations.
– Production is solely managed by operations personnel. Allow very little, if any, developer access to Production.
– Before deploying to Production, operations personnel need a Pre-Production environment they can use to test the deployment and post-deployment performance. No one wants operations personnel – or anyone, really – deploying a process to Production without a practice run.
– Similarly, developers need to move their code from the Development tier (aka the “works on my machine” tier) to another tier – such as Test – so they can identify hard-coded defaults that should be parameters.

Read on for some tips from Andy, including where the SSIS Catalog Compare product can fit into this.
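
As a rough illustration of that last point, catalog environments are one common way to carry tier-specific values so that packages reference variables instead of hard-coded defaults. A sketch with hypothetical names:

    -- One environment per tier, each holding the same variable names
    -- with tier-appropriate values.
    EXEC SSISDB.catalog.create_environment
        @folder_name      = N'ETL',
        @environment_name = N'Test';

    EXEC SSISDB.catalog.create_environment_variable
        @folder_name      = N'ETL',
        @environment_name = N'Test',
        @variable_name    = N'SourceServer',
        @data_type        = N'String',
        @sensitive        = 0,
        @value            = N'test-sql-01',
        @description      = N'Source server name for the Test tier';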

Exporting Environment Variables from the SSIS Catalog

Koen Verbeeck shows how we can pull environment variables out of the SSIS catalog:

Sometimes, when you’ve created an SSIS catalog, along with folders, environments, and environment variables, you want this content on another server. Unfortunately, you can only script out these objects in SSMS at the moment you’re creating them (that is, right before you click ‘OK’ to confirm). Once the objects are created, you can no longer script them out using the GUI.

Luckily, there’s a whole slew of stored procedures and views in the catalog that allow us to extract information and use that information to re-create the objects. Here are two scripts that I found useful.
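
As a taste of what is available, the variables themselves sit in the catalog.environment_variables view; a minimal query (not necessarily Koen’s script) looks like this:

    USE SSISDB;

    -- Every environment variable with its folder and environment – the
    -- raw material for scripting the objects back out on another server.
    SELECT f.name AS folder_name,
           e.name AS environment_name,
           v.name AS variable_name,
           v.type,
           v.sensitive,
           v.value,  -- NULL here for sensitive variables
           v.description
    FROM catalog.environment_variables AS v
    INNER JOIN catalog.environments AS e
        ON e.environment_id = v.environment_id
    INNER JOIN catalog.folders AS f
        ON f.folder_id = e.folder_id
    ORDER BY f.name, e.name, v.name;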

Click through for the scripts.

Automating Collection of SSIS Failure Data

Rob Sewell has a love for automation:

I have tried my best at all times to follow this advice in the last decade, and I am pretty happy that I have managed it.

– I use PowerShell (a lot!) to automate all sorts of routine tasks, including migrating this blog
– I use Jupyter Notebooks to enable myself and others to automate Run Books, Training, Documentation, Demonstrations, and Incident Response. You can find my notebooks here
– I use Azure DevOps to automate infrastructure creation and changes with Terraform, as well as delivery of changes to code and unit testing
– I use GitHub Actions to create this blog and publish the ADSNotebook module
– I use Chocolatey to install and update software
– I have used Desired State Configuration to ensure that infrastructure is as it is expected to be

At every point I am looking for a way to automate whatever I am doing, because it is almost guaranteed that at some time after you have done a thing, you will need to do it again or do it slightly differently.

Click through to see what Rob has for us in the way of collecting SSIS log data after job failures.
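
Rob’s solution is PowerShell-based, but the underlying failure data lives in SSISDB’s catalog views either way; a minimal sketch of the kind of query involved:

    -- Error messages for catalog executions that failed in the last day.
    SELECT ex.execution_id,
           ex.folder_name,
           ex.project_name,
           ex.package_name,
           ex.start_time,
           em.message_time,
           em.message
    FROM SSISDB.catalog.executions AS ex
    INNER JOIN SSISDB.catalog.event_messages AS em
        ON em.operation_id = ex.execution_id
    WHERE ex.status = 4           -- 4 = failed
      AND em.message_type = 120   -- 120 = error
      AND ex.start_time > DATEADD(DAY, -1, SYSDATETIMEOFFSET())
    ORDER BY em.message_time;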

Data Lineage and SSIS

Aveek Das has a two-parter. First up is a discussion of data lineage:

In this article, I am going to explain what Data Lineage in ETL is and how to implement it. In this modern world, where companies deal with humongous amounts of data every day, there is also the challenge of efficiently managing and monitoring that data. There are systems that generate data every second, and that data is processed and passed along to a final reporting or monitoring tool for analysis. In order to process this data, we use a variety of ETL tools, which in turn make the data transformation possible in a managed way.

As the data moves through the ETL pipeline, it has to go through multiple transformation steps in order to achieve the final result. For example, when the ETL receives the raw data from the source, operations may be applied to it like filtering, sorting, merging, or splitting columns. There can also be aggregations or other calculations made on this raw data before it finally moves into a data warehouse or is prepared for reporting. In order to be able to detect the source of a particular record, we need to implement something known as Data Lineage: simple metadata that helps us detect gaps in the data processing pipeline and enables us to fix issues later.
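
To make that concrete, one simple implementation is to carry lineage columns alongside the data as it lands, so any row can be traced back to a specific load and source system. A minimal sketch with hypothetical table and column names:

    CREATE TABLE dbo.StageCustomer
    (
        CustomerKey   int           NOT NULL,
        CustomerName  nvarchar(200) NOT NULL,
        -- Lineage metadata: which ETL run produced the row, from which
        -- source system, and when it arrived.
        LineageLoadId int           NOT NULL,
        SourceSystem  nvarchar(50)  NOT NULL,
        LoadedAt      datetime2     NOT NULL DEFAULT SYSDATETIME()
    );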

Part two covers data lineage with SQL Server Integration Services:

In this article, I am going to discuss SSIS data lineage concepts, which are often used while designing ETL workloads on a data warehouse. Although this article is focused on implementing data lineage using SSIS, it is not confined to SSIS but applies to any ETL tool on the market that moves data from a source to a destination. In my previous article, Understanding Data Lineage in ETL, I discussed the general importance of data lineage concepts for any ETL tool. I would definitely suggest you have a look at it if you want to understand, in general, how data lineage helps to track the source of a single record in the warehouse.

If you’re fairly new to this world, it’s a good introduction to an important topic.

Skipping SSIS Package Validation

Andy Leonard takes us through a new feature in Visual Studio:

A new feature in version 3.9 of the Visual Studio 2019 SSIS extension is “skip package validation.” I’ve tested the functionality and it works well. Here’s how you set it up:

Click through to see how easy it is to enable. I’d imagine that this would work really well when dealing with dynamic packages where connection details aren’t available until runtime. Waiting for validation to fail to connect to a remote data source (on the UI thread!) was always annoying.

Performance Tuning SSIS Data Flows

Mark Broadbent reviews a SQLBits talk:

Yes, before you say it, I know SQL Server Integration Services is “old technology,” but a lot of people are still using it, and in many cases they are either still developing against it or looking to integrate or migrate with other burgeoning technologies such as Azure Data Factory. In other words, if you are not currently using SSIS then this post is probably not for you – otherwise, read on.

If you are one of the lucky ones still using SSIS, I thought it would be worth publishing these comprehensive notes, taken from a session titled “SSIS Data Flow Performance Tuning” delivered at SQLBits 8 (Brighton) by the then “SSIS guru” Jamie Thomson. The notes have timings (in minutes and seconds) against them, which correlate directly with the presentation times. The video is still available and can be downloaded from the SQLBits website, so you can watch it (if required) and use the timings to follow along.

It’s an asynchronous watch party with Mark.
