Press "Enter" to skip to content

Category: Integration Services

Validating SSIS Packages Using T-SQL

Annie Xu shows us how to validate SSIS packages in the SSISDB catalog using T-SQL:

Recently, I need to do a data warehouse migration for a client. Since there might be some difference between the Dev environment source databases and Prod environment source databases. The migrated SSIS packages for building data warehouse might have some failures because of the changes. So the challenge is how can I validate all my DW packages (100 +) all at once.

Click through for the script.

Comments closed

Contrasting Integration Services And Pentaho Data Integration

Koen Verbeeck contrasts SQL Server Integration Services with Pentaho Data Integration:

For generating SSIS packages, you need to rely on Biml (much about that can be found on this blog or on the net), or older frameworks such as ezApi. Or you need 3rd party tools such as BimlStudio or TimeXtender. Using Biml means writing XML and .NET. Don’t get me wrong, I love Biml and I use it a lot in my SSIS projects.

But generating transformations in PDI is so much easier. First, you create a template (you create a transformation, but you leave certain fields empty, such as the source SQL statement and the destination table). Then you have another transformation reading metadata. This metadata is pushed to the template using the Metadata Injection Transformation. In this transformation, you point to the template and you map those empty fields to your metadata fields.

It’s interesting to see where each product stands out or falls flat compared to the other, and Koen’s comparison is definitely not a one-sided bout.

Comments closed

Using Biml To Read Excel Files Without Excel

Bill Fellows follows up on his prior post and shows how you can write BimlScript to parse an Excel file without having Microsoft Office installed:

My resources are quite simple: Excel Spreadsheet containing meta data, a driver program and a package template.

The template is your standard truncate and reload pattern with the target table being specified by a parameter. The client validates data by running processes in parallel so the existing mainframe process delivers data to the Billing table while ours delivers to a Billing_NEW table. Once they accept the new process, the target table becomes Billing and the NEW table is dropped. I decided the most native SSIS route would be use specify the target table in as a parameter. We originally have a boolean parameter indicating whether we were loading the new table or the production one but that was more logic and overhead that just specifying which table to load. I force their queries to be dirty reads as some of these queries can be rather messy.

Click through for the script and explanation.

Comments closed

Azure Data Factory Or Integration Services?

Teo Lachev contrasts use cases for Integration Services vesus Azure Data Factory V2:

So, ADF was incorrectly positioned as “SSIS for the Cloud” and unfortunately once that message made it out there was a messaging problem that Microsoft has been fighting ever since. Like Azure ML, on the glory road to the cloud things that were difficult with SSIS (installation, projects, deployment) became simple, and things that were simple became difficult. Naturally, Microsoft took a lot of criticism from the customers and community, including from your humble correspondent. ADF, or course, has nothing to do with SSIS, thus leaving many data integration practitioners with a difficult choice: should you take the risk and take the road less traveled with ADF, or continue with the tried-and-true SSIS for data integration on Azure?

To Microsoft’s credits, ADF v2 has made significant enhancements in features, usability, and maintainability. There is an also a “lift and shift” option to run SSIS inside ADF but since this architecture requires a VM, I consider it a narrow case scenario, such as when you need to extend ADF with SSIS features that it doesn’t have. Otherwise, why would you start new development with SSIS hosted under ADF, if you could provision and license the VM yourself and have full control over it?

All in all, Teo is not the biggest fan of ADF at this point and leans heavily toward SSIS; read on for the reasoning.

Comments closed

Azure Data Factory V2 Dependencies

Meagan Longoria has important notes on how Azure Data Factory V2 Dependencies differ from SQL Server Integration Services precedent constraints:

This sounds similar to SSIS precedence constraints, but there are a couple of big differences.

  1. SSIS allows us to define expressions to be evaluated to determine if the next task should be executed.
  2. SSIS allows us to choose whether we handle multiple constraints as a logical AND or a logical OR. In other words, do we need all constraints to be true or just one.

ADF V2 activity dependencies are always a logical AND. While we can design control flows in ADF similar to how we might design control flows in SSIS, this is one of several differences. Let’s look at an example.

Meagan gives us three methods of replicating SSIS functionality using ADF V2, so check it out.

Comments closed

An Update To ssisUnit

Bartosz Ratajczyk has added some functionality to ssisUnit:

Second – you can get and set the properties of the project and its elements. Like – overwriting project connection managers (I designed it with this particular need on my mind). You can now set the connection string the different server (or database) – in the PropertyPath of the PropertyCommand use \Project\ConnectionManagers, write the name of the connection manager with the extension, and use one of the Properties. You can do it during the Test setup (or all tests setup), but not during the test suite setup, as ssisUnit is not aware of the project until it loads it into the memory.

Good on Bartosz for resurrecting a stable but moribund project and adding some enhancements.

Comments closed

Azure Data Factory V2 Or SSIS?

Merrill Aldrich explains the differences between Azure Data Factory V2 and SQL Server Integration Services:

The next consideration is a bit more involved if you are new to data integration. Both of these tools excel at transporting data from place to place, but they have important differences in terms of what you can do to modify the data in transit. As a matter of emphasis, ADF has more features geared toward moving the data than performing any complex transformation along the way. SSIS, on the other hand, was built with a large library of transformations that you can chain together to make elaborate data flows including lookups, matching, splitting data, and more.

The tools also overlap quite a lot. In projects this seems to lead to the question of whether you’ll transform the data “in flight” using Extract Transform Load (ETL), or instead move the data to a destination where it’ll be transformed using Extract Load Transform (ELT).

These are not “pretty much the same thing” and Merrill does a good job of explaining what those differences in design mean for the products.

Comments closed

Getting Anatomical With SSIS Catalog Compare

Andy Leonard shows off parts of SSIS Catalog Compare.  First up is the catalog reference script:

As you may glean from this analysis of one script generated for SSIS Catalog References management, the Transact-SQL for scripting SSIS Catalog artifacts in SSIS Catalog Compare is rigorous and includes several existence and error-condition checks prior to performing any updates. The script is designed to be idempotent, as well, meaning the script will succeed and the results will be repeatable and predictable each time the script is executed – and that the script itself is re-executable.

Then an environment script:

The final portion of the script checks for the existence of the Environment Variables and responds accordingly. This is a three-step process:

  1. Drop the Environment Variable if it exists.
  2. Create the Environment Variable.
  3. Set the Environment Variable value.

If the Environment Variable exists the script drops it. Why? SSIS Catalog Compare wants to be sure the environment variable is created with the proper data type and initial values.

And connection literals:

These messages are intended to be copied and stored in the Notes field of a ticketing system in a DevOps enterprise. Note the detail contained herein:

  • Script Name – the path to the file used to perform the operation.

  • Generated From – the SQL Server instance of the SSIS Catalog host from which the script was generated.

  • Catalog Name – redundant at present because all SSIS Catalogs are named “SSISDB.”

  • Folder Name – the name of the SSIS Catalog Folder that contains the scripted artifact.

  • Project Name – the name of the SSIS Project that contains the scripted artifact.

  • Project Connection Name – the name of the SSIS Project Connection.

  • Generated By – the name of the enterprise account used to generate the artifact’s script.

    • Note: SSIS Catalog Compare respects the security model of the SSIS Catalog. Windows Authentication is required to perform many SSIS Catalog operations.
  • Generated Date – the date and time the script was generated.

  • Generated From – the version of CatalogBase used in the generation of the artifact script.

    • Executing On – the name of the machine on which CatalogBase was running.
  • Deployed to Instance – the SQL Server instance hosting the target SSIS Catalog.

  • Deploy Date – the date and time the deployment script was executed.

  • Deploy By – the enterprise account used to deploy the artifact script.

Andy has put a lot of thought into SSIS Catalog Compare, so go check it out.

1 Comment

When MS_SSISServerCleanupJobLogin Is Orphaned

Sreekanth Bandarla noticed a problem in cleaning up SSIS metadata:

Couple of weeks ago I was analyzing a server for space and noticed SSISDB database was abnormally huge (this Instance was running just a handful of packages). I noticed couple of internal schema tables in SSISDB were huge (with some hundreds of millions of rows), well that’s not right. There should be SSIS Server maintenance job which SQL server creates to purge older entries based on the retention settings right? My immediate action was to check the retention period set and what’s the status of the job.  As I suspected, the job was failing (looks like this has been failing since ages) with below error.

The job failed.  The Job was invoked by Schedule 9 (SSISDB Scheduler).  The last step to run was step 1 (SSIS Server Operation Records Maintenance).
Execute as Login failed for the requested login ‘##MS_SSISServerCleanupJobLogin##’

Read on for the root cause and solution.

Comments closed

Writing ssisUnit Tests With C#

Bartosz Ratajczyk shows us how to create ssisUnit tests in MSTest with C#:

In the post about using MSTest framework to execute ssisUnit tests, I used parts of the ssisUnit API model. If you want, you can write all your tests using this model, and this post will guide you through the first steps. I will show you how to write one of the previously prepared XML tests using C# and (again) MSTest.

Why MSTest? Because I don’t want to write some application that will contain all the tests I want to run, display if they pass or not. When I write the MSTest tests, I can run them using the Test Explorer in VS, using a command line, or in TFS.

UIs are great for learning how to do things and for one-off actions, but writing code scales much better in terms of time.

Comments closed