
Category: Integration Services

Data Lineage and SSIS

Aveek Das has a two-parter. First up is a discussion of data lineage:

In this article, I am going to explain what data lineage in ETL is and how to implement it. In the modern world, where companies deal with humongous amounts of data every day, there is also the challenge of efficiently managing and monitoring that data. There are systems that generate data every second, and that data is processed and passed along to a final reporting or monitoring tool for analysis. To process this data, we use a variety of ETL tools, which in turn make the data transformation possible in a managed way.

While being transformed in the ETL pipeline, the data has to go through multiple transformation steps to achieve the final result. For example, when the ETL receives raw data from the source, operations may be applied to it such as filtering, sorting, merging, or splitting columns. There can also be aggregations or other calculations made on this raw data before it finally moves into a data warehouse or is prepared for reporting. To be able to detect the source of a particular record, we need to implement something known as data lineage: a simple piece of metadata that helps us detect gaps in the data processing pipeline and enables us to fix issues later.
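
To make the idea concrete, here is a minimal sketch of what lineage metadata might look like on a staging table. The table and column names are hypothetical illustrations, not something from Aveek’s articles:

-- Hypothetical staging table that carries lineage metadata alongside the data.
CREATE TABLE dbo.StgSales
(
    SaleId         INT              NOT NULL,
    Amount         DECIMAL(18, 2)   NOT NULL,
    -- Lineage columns: where each row came from and which load produced it.
    SourceSystem   NVARCHAR(50)     NOT NULL,
    SourceFileName NVARCHAR(260)    NULL,
    LoadId         UNIQUEIDENTIFIER NOT NULL,
    LoadDateTime   DATETIME2(3)     NOT NULL DEFAULT SYSUTCDATETIME()
);

-- Each load stamps its rows with a single LoadId, so any record can be traced
-- back to the execution (and, here, the file) that produced it.
DECLARE @LoadId UNIQUEIDENTIFIER = NEWID();

INSERT INTO dbo.StgSales (SaleId, Amount, SourceSystem, SourceFileName, LoadId)
SELECT r.SaleId, r.Amount, N'WebOrders', N'orders_20200101.csv', @LoadId
FROM dbo.RawSales AS r;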

Part two covers data lineage with SQL Server Integration Services:

In this article, I am going to discuss SSIS data lineage concepts, which are often used while designing ETL workloads on a data warehouse. Although this article is focused on implementing data lineage using SSIS, it is not confined to SSIS; it applies to any ETL tool on the market that moves data from a source to a destination. In my previous article, Understanding Data Lineage in ETL, I discussed the general importance of data lineage concepts for any ETL tool. I would definitely suggest having a look at it if you want to understand, in general, how data lineage helps track the source of a single record in the warehouse.
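
Within SSIS specifically, one common implementation (a sketch under my own assumptions, not necessarily Aveek’s exact approach) is an audit table keyed by the package execution and populated from SSIS system variables such as System::ExecutionInstanceGUID and System::PackageName:

-- Hypothetical audit table: one row per package execution.
CREATE TABLE dbo.EtlAudit
(
    ExecutionGuid UNIQUEIDENTIFIER NOT NULL PRIMARY KEY, -- System::ExecutionInstanceGUID
    PackageName   NVARCHAR(200)    NOT NULL,             -- System::PackageName
    StartTime     DATETIME2(3)     NOT NULL              -- System::StartTime
);

-- If fact rows carry the same GUID (stamped in the data flow, for example with
-- a Derived Column), any record joins back to the run that loaded it.
SELECT f.SaleId, a.PackageName, a.StartTime
FROM dbo.FactSales AS f
    INNER JOIN dbo.EtlAudit AS a
        ON a.ExecutionGuid = f.ExecutionGuid;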

If you’re fairly new to this world, it’s a good introduction to an important topic.


Skipping SSIS Package Validation

Andy Leonard takes us through a new feature in Visual Studio:

A new feature in version 3.9 of the Visual Studio 2019 SSIS extension is “skip package validation.” I’ve tested the functionality and it works well. Here’s how you set it up:

Click through to see how easy it is to enable. I’d imagine that this would work really well when dealing with dynamic packages where connection details aren’t available until runtime. Waiting for validation to try, and fail, to connect to a remote data source (on the UI thread!) was always annoying. Before this feature, the closest workaround was setting DelayValidation to True on the affected tasks and connection managers, which defers validation rather than skipping it.


Performance Tuning SSIS Data Flows

Mark Broadbent reviews a SQLBits talk:

Yes, before you say it, I know SQL Server Integration Services is “old technology,” but a lot of people are still using it, and in many cases they are either still developing against it or looking to integrate/migrate with other burgeoning technologies such as Azure Data Factory. In other words, if you are not currently using SSIS, then this post is probably not for you; otherwise, read on.

If you are one of the lucky ones still using SSIS, I thought it would be worth publishing these comprehensive notes, taken from a session titled “SSIS Data Flow Performance Tuning” delivered at SQLBits 8 (Brighton) by the then “SSIS guru” Jamie Thomson. The notes have timings (in minutes and seconds) against them, which correlate directly with the presentation. The video is still available and can be downloaded from the SQLBits website, so you can watch it (if needed) and use the timings to follow along.

It’s an asynchronous watch party with Mark.
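
As a taste of the sort of advice data flow tuning sessions cover, one classic technique is to push work to the source: select only the columns you need (narrower rows mean more rows per buffer) and let the database engine do the sorting instead of the fully blocking Sort transformation. A hedged sketch, with illustrative table and column names rather than anything from the session itself:

-- Narrow, pre-sorted source query: less data per buffer row, no Sort transform.
SELECT s.SaleId,
       s.CustomerId,
       s.Amount
FROM dbo.Sales AS s
ORDER BY s.SaleId;

-- Then mark the source output as sorted in the advanced editor
-- (IsSorted = True, SortKeyPosition = 1 on SaleId).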


Reusing a Recordset Stored as an SSIS Object Variable

Tim Mitchell appeases the masses:

A few years back, I wrote a blog post about using an SSIS object variable as a data flow source. In that post, I described how you could load a set of query results into an object-typed variable in SQL Server Integration Services and then use that in-memory data as a source within a data flow. In the comments and the feedback I got on that post, the same question kept coming up: what is the process for reusing a recordset in an SSIS object variable in the same package?

In this post, I’ll show how you can modify the scripts within your SSIS package to allow reprocessing of the same set of results in an object variable.

Tim has a nice workaround for the problem, so check it out. (The underlying issue: shredding the ADO recordset leaves its cursor at end-of-file, so a second data flow that reads the same variable comes up empty.)


Comparing Integration Services and Azure Data Factory

Tim Mitchell compares SQL Server Integration Services to Azure Data Factory:

For the better part of 15 years, SQL Server Integration Services (SSIS) has been the go-to enterprise extract-transform-load (ETL) tool for shops running on Microsoft SQL Server. More recently, Microsoft added Azure Data Factory (ADF) to its stable of enterprise ETL tools. In this post, I’ll be comparing SSIS and Azure Data Factory to share how they are alike and how they differ. I’ll also review the strengths and shortcomings of each, including the architectures in which each of these is likely to do well.

Read on for Tim’s thoughts on the subject. Tim lays out his biases up-front but also gives you a good feel for where both products are in their lifecycles.


Stopping an ADF SSIS Integration Runtime

Andy Leonard shows how to shut off the Azure Data Factory SSIS integration runtime when you don’t need to use it:

I have blogged about stopping an Azure-SSIS IR in the past (Start Azure-SSIS, Execute Package, Stop Azure-SSIS). The pattern described in the Start Azure-SSIS, Execute Package, Stop Azure-SSIS post is sound and awesome. My only issue is the pipeline will report failure if the Azure-SSIS IR is not running when the Web Activity executes the Stop command. Mind you, the operation succeeds. But it reports failure.

Reporting failure when the operation succeeds seriously messes with my CDO (that’s OCD with the letters in the proper order).

Read on to see the best way to do this.
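
For reference, the stop command in this pattern is a Web Activity calling the Azure Data Factory REST API; the endpoint looks like the following, where every name in braces is a placeholder:

POST https://management.azure.com/subscriptions/{subscriptionId}/resourceGroups/{resourceGroup}/providers/Microsoft.DataFactory/factories/{factoryName}/integrationRuntimes/{integrationRuntimeName}/stop?api-version=2018-06-01

One natural guard, though not necessarily Andy’s exact fix, is to GET the integration runtime’s status first and only issue the stop when it reports Started.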


Using the SSIS Hadoop Components

Hadi Fadlallah walks us through the HDFS file source and destination components:

To test these components, we will create an SSIS package and add three connection managers:

1. Hadoop Connection Manager: to connect to the Hadoop cluster (check the previous article)
2. OLE DB Connection Manager: to connect to the SQL Server instance where the AdventureWorks2017 database is stored
3. Flat File Connection Manager: to export data from the HDFS Source

I wonder if they ever fixed the 4K screen resolution problem (kind of tells you how often I use SSIS anymore…). That was one of the things which made these components unusable for me on any modern screen.


Deleting Packages from the SSIS Catalog

Mala Mahadevan performs important cleanup work:

I will be blogging about a few things I learn on my journey to SSIS expertise. This is the first one. It came about as a result of wanting to delete a few packages from the SSIS catalog. We do not use these packages anymore, and I wanted to clean them out of the project, which resides on a few servers. I looked into a few ways of doing it.

Click through for three methods, including an in-depth discussion of the third (and least obstructive).
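
If you want to survey what is deployed before cleaning anything up, the SSISDB catalog views are a good starting point. A minimal sketch follows; the folder and project names are placeholders, and note that catalog.delete_project drops an entire project, a much blunter instrument than the package-level methods Mala covers:

-- List deployed packages by folder and project.
SELECT f.name  AS folder_name,
       pr.name AS project_name,
       pk.name AS package_name
FROM SSISDB.catalog.packages AS pk
    INNER JOIN SSISDB.catalog.projects AS pr
        ON pr.project_id = pk.project_id
    INNER JOIN SSISDB.catalog.folders AS f
        ON f.folder_id = pr.folder_id;

-- Dropping a whole project; in the project deployment model, removing
-- individual packages generally means redeploying the project without them.
EXEC SSISDB.catalog.delete_project
    @folder_name  = N'MyFolder',
    @project_name = N'MyProject';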


Scheduling SSIS Packages in Azure

Magi Naumova takes us through the process of running SSIS in Azure Data Factory, including the scheduling of jobs to run our SSIS packages:

The main purpose of these tools is to support the lift-and-shift approach of migrating and running existing SSIS packages in Azure. I wouldn’t say that this is the most effective approach to transferring ETL to Azure, but it could be a good start on the road to a modern Azure data warehouse architecture. If you have already deployed SSIS packages to the Azure SSIS catalog, then SSMS 18 helps you put them on a schedule very quickly.

Running SSIS packages in Azure requires provisioning the SSIS runtime engine, an Azure Data Factory instance, and a SQL database which hosts the SSIS catalog. Scheduling SSIS packages in Azure requires creating a pipeline in ADF which has a trigger defined for scheduled execution. While describing all of those concepts is far beyond the scope of this chapter, a short description would be useful.

Read on for a good amount of detail and a demo which walks through the process.


Formatting SSIS Packages

Tim Mitchell shows us different options available when formatting SSIS packages:

Most folks I know go to one extreme or the other when it comes to the visual layout of SQL Server Integration Services packages: either they don’t care one bit about the appearance, or they insist on an easy-to-read layout before a project is considered complete. I am definitely in the second group and will almost always spend the time to make sure my packages are properly laid out.

Regardless of which of these groups you are in, it is very easy to apply some simple visual formatting to your packages. In this quick tip, I’ll show you how to use the visual formatting options in SSIS to automate the layout.

I consider it quite important: it gives a future viewer an idea of your vision as the developer. If I can’t read what you’re doing, that does not bode well.
