Press "Enter" to skip to content

Category: Integration Services

Failed To Deploy Project In SSIS

Ginger Grant helps resolve “Failed to deploy project” errors (Error: 27203) in Integration Services:

The message Failed to deploy project isn’t very useful, but the rest of the message is. The operation_messages view lives in SSISDB, and the operation identifier number is how to determine what the error is. Run this query, using the number provided in the error message, which in this case is 173

I’d prefer it if we actually could see the error in the dialog, but at least there’s an explanation somewhere.

Comments closed

Maintaining SSISDB

Ginger Grant shows us how to manage SSISDB:

A client asked me recently why he should back up the SSISDB database. While you can recreate everything inside of the SSISDB, it will take time and you will have to remember exactly how all of your variables were set. Restoring the backup decreases this issue and having a backup allows a server to be redeployed quickly. When you do back up the database, make sure that you remember to backup the database certificate, which is created when the SSISDB is created as well, as you will need this to do a restore. By default. the recovery model of the SSISDB is set to Full. If the packages in SSISDB are changing minute by minute, full would make sense, but given that an SSISDB contains packages which are run on a scheduled basis, most likely the changes made are infrequent. Change the recovery model to simple.

SSISDB is a real database, just like ReportServer, so don’t neglect it just because you didn’t create it.

Comments closed

Columnstore And SSIS 2016

Niko Neugebauer mentions that with SQL Server 2016 and Integration Services 2016, clustered columnstore index insertion can get much faster:

To solve the performance problem I went straight to the DefaultBufferMaxRows setting and set it to be equal of the maximum number of rows in a Row Group – 1048576. Together with the AutoAdjustBufferSize setting it helps the actual current size of the DataFlow Buffer that will be used for transferring the data from the source to the destination table.

What should I say – it worked like magic:
I guess that with 2:09 Minutes the clear winner of this test is the configuration with AutoAdjustBufferSize set to True and the DefaultBufferMaxRows to 1048576. It took less then a half of the time with just AutoAdjustBufferSize activated and the insertion process was executed with the help of the Bulk Load API – meaning that we did not have to wait for the Tuple Mover or to execute it manually.

Doubling insertion performance is nothing to scoff, especially for something like columnstore tables, where we expect millions (or more) of rows to be inserted.

Comments closed

Conditional Processing

Bill Fellows shows options for handling date-based conditional processing:

Do you see the problem? Really, there are two but the one I’m focused on is the use of GETDATE to determine which branch of logic is executed. Today is Monday and I need to test the logic that runs on Friday. Yes, I can run these steps in isolation and given that I’m not updating the logic that fiddles with the branches, my change shouldn’t have an adverse effect but by golly, that sucks from an testing perspective. It’s also really hard to develop unit tests when your input data is server date. What are you going to do, allocate 5 to 7 days for testing or change the server clock. I believe the answer is No and OH HELL NAH!

This isn’t just an SSIS thing, either. I’ve seen the above logic in TSQL as well. If you pin your logic to getdate/current_timestamp calls, then your testing is going to be painful.

I liken this to solving dependency injection problems in general:  make the caller define the date or date part.  That way, your test callers can define other dates and the “smarts” around which branch to take move up to a more swappable layer.

Comments closed

SSIS Performance Testing

Koen Verbeeck shows a framework he uses for performance testing in Integration Services:

The proc passes the @RunID parameter to the package, as well as other usual suspects, such as the package name, folder name and project name. You can also choose if a package is run synchronously or asynchronously. When run synchronously, the stored procedure doesn’t finish until the package is finished as well.

Using this stored procedure, it is easy to run a package multiple times in a row using a WHILE loop.

Also of interest is Andy Leonard’s SSIS Performance site, whose goal is to set up some performance benchmarks for Integration Services.

Comments closed

Lineage Improvements

Andy Leonard shows LineageId improvements over the years:

SSIS 2016 CTP3.3 offers a solution. First, there are now two new columns in the SSIS Data Flow Component Error Output – ErrorCode – Description and ErrorColumn – Description:

The new columns provide extremely useful (plain language) error metadata that will, in my opinion, greatly reduce the amount of time required to identify data-related load failures in the Data Flow Task.

But that’s not all. If you configure a log to capture the DiagnosticEx event, you will receive a message that provides the Data Flow column ID. To have a look, add a new log to an SSIS package that contains a configured Data Flow Task. On the Details tab, select the DiagnosticEx event:

Backtracing LineageId was a painful experience for me, so I’m happy that they’re making this better.

Comments closed

Copying SSIS Packages

Andy Galbraith shows how to copy SSIS packages using DTUTIL:

A frequent need when performing a server migration is to copy the SSIS packages from one server to a new server.  There are a couple of different ways to do this, including a wizard in SSMS. (See https://www.mssqltips.com/sqlservertip/2061/how-to-manage-ssis-packages-stored-in-multiple-sql-server-database-instances/).  The catch to this is that these are manual and they only move one package at a time.

I recently had to migrate a server with over twenty packages, and I knew I didn’t want to click-click-click over and over again.  🙂

The best answer would be to have your packages safe and secure in source control, but sometimes that’s not an option.

Comments closed

Single-Package Deployments In SSIS 2016

Andy Leonard looks at what happens when your first SSIS project deployment is a single-package deployment:

To test, I created a new SSIS project named “DeploymentTest1” and added three simple SSIS packages. I right-clicked Package3.dtsx and clicked Deploy Package. The Integration Services Deployment Wizard started, as expected. Since I was deploying to a pristine Catalog, I created a new Catalog Folder named “Deployment”. There were no projects in my catalog, and I was curious how the Deployment Wizard would handle this.

Andy notes that this works, but you might want to stick to the tried-and-true method of deploying entire projects and naming your catalog project the same as your SSDT project.

Comments closed

Incremental Loading Using Datetime Columns

Reza Rad shows a pattern for implementing incremental loads using modified dates:

The idea behind this method is to store the latest ETL run time in a config or log table, and then in the next ETL run just load records from the source table that have modified (with their modified date greater than or equal to) after the latest ETL run datetime. This will create the change set for the data table. The change set might contains inserted, updated, or deleted records. to identify which change happened on the record you need to compare the change set with existing records and separate inserted, updated, and deleted records. This change set with the action on each record can be inserted into staging tables, and then be used to apply on the fact table based on appropriate action.

In my experience, the hardest part about this is making sure people update ModifiedTime when they update rows in the table.

Comments closed

MSDTC + SSIS

Kenneth Fisher discusses how to get MSDTC working with Integration Services:

tl;dr; The MSDTC service has to be not only turned on, but configured on all of the machines involved. Including the machine running the SSIS package (possibly a workstation).

Configuring remote MSDTC, sure.  Configuring local MSDTC, though, is something I hadn’t realized was important, at least if you want to use SSIS transactions.  Probably goes to show how often I use SSIS transactions…

Comments closed