Press "Enter" to skip to content

Category: Integration Services

File Processing Pattern

Bill Fellows describes a pattern for processing files in an ETL scenario:

All ETL processing will use a common root node/directory. I call it SSISData to make it fairly obvious what is in there but call it as you will. On my dev machine, this is usually sitting right off C:. On servers though, oh hell no! The C drive is reserved for the OS. Instead, I work with the DBAs to determine where this will actually be located. Heck, they could make it a UNC path and my processing won’t care because I ensure that root location is an externally configurable “thing.” Whatever you’re using for ETL, I’m certain it will support the concept of configuration. Make sure the base node is configurable.

A nice thing about anchoring all your file processing to a head location is that if you are at an organization that is judicious with handing out file permissions, you don’t have to create a permission request for each source of data. Get your security team to sign off on the ETL process having full control to a directory structure starting here. The number of SSIS-related permissions issues I answer on Stack Overflow is silly.
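This translates directly into SSIS: expose the root as a project parameter and derive every path from it with an expression. A minimal sketch (the parameter and variable names here are hypothetical), set as the ConnectionString expression on a file connection manager:

    @[$Project::SSISDataRoot] + "\\" + @[User::SubjectArea] + "\\Input\\" + @[User::FileName]

Change the one parameter at deployment time and every package follows along.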

It comes down to consistency and cleanliness.  Plan ahead and you’ll have a much nicer go of things.  Bill also provides a Biml POC, so check that out as well.


Ragged Right Files

Sifiso Ndlovu walks us through ragged right formatted files in Integration Services:

The configuration of columns is perhaps a critical part of the entire ETL process, as it helps us build mapping metadata for the ETL. In fact, regardless of whether or not SSIS/SSMS can detect delimiters, if you skip the Column Mapping section, your ETL will fail validation. In order to clarify how ragged right formatted files work, I have gone a step back and used Figure 4 to display a preview of our fictitious Fruits transaction dataset from Notepad++. It can already be seen from Notepad++ that the file only has a row delimiter in the form of CRLF.
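To picture the format, here is a made-up sample in the spirit of that fruits dataset: every column is fixed-width except the last, which simply runs until the CRLF, so line lengths vary:

    001 2016-06-01 0014 Apples
    002 2016-06-01 0007 Bananas
    003 2016-06-02 0022 Dragon fruit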

Read the whole thing.


Parameterizing Attunity Queries In SSIS

Melissa Coates shows us how to parameterize queries if you’re using the Attunity connector for Oracle in SSIS:

The Attunity connector for Oracle used inside of the SSIS data flow looks a little different than the typical OLE DB connector we most commonly use. Specifically, the Attunity source has two options: “Table Name” and “SQL command.” What the Attunity Oracle Source doesn’t have in this dialog box is “SQL command from variable” (like we see for an OLE DB source).

Expressions are your friend.
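Specifically, the workaround is to set the command text through an expression on the Data Flow Task rather than in the source editor. As a hedged sketch (the variable name is made up): select the Data Flow Task, open its Expressions collection, pick the property exposed for the Attunity source (something like [Oracle Source].[SqlCommand]), and build the query there:

    "SELECT * FROM ORDERS WHERE ORDER_DATE >= TO_DATE('" + @[User::StartDate] + "', 'YYYY-MM-DD')"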


Failed To Deploy Project In SSIS

Ginger Grant helps resolve “Failed to deploy project” errors (Error: 27203) in Integration Services:

The message Failed to deploy project isn’t very useful, but the rest of the message is. The operation_messages view lives in SSISDB, and the operation identifier number is how to determine what the error is. Run this query, using the number provided in the error message, which in this case is 173.
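A minimal sketch of that lookup (message_type 120 restricts it to errors; drop that predicate to see every message for the operation):

    SELECT om.message_time,
           om.message
    FROM   SSISDB.catalog.operation_messages AS om
    WHERE  om.operation_id = 173
           AND om.message_type = 120  -- 120 = error
    ORDER  BY om.message_time;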

I’d prefer it if we actually could see the error in the dialog, but at least there’s an explanation somewhere.


Maintaining SSISDB

Ginger Grant shows us how to manage SSISDB:

A client asked me recently why he should back up the SSISDB database. While you can recreate everything inside of the SSISDB, it will take time and you will have to remember exactly how all of your variables were set. Restoring the backup avoids this issue, and having a backup allows a server to be redeployed quickly. When you do back up the database, make sure that you remember to back up the database certificate, which is created when the SSISDB is created, as you will need this to do a restore. By default, the recovery model of the SSISDB is set to Full. If the packages in SSISDB are changing minute by minute, full would make sense, but given that an SSISDB contains packages which are run on a scheduled basis, most likely the changes made are infrequent. Change the recovery model to simple.

SSISDB is a real database, just like ReportServer, so don’t neglect it just because you didn’t create it.
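As a rough sketch of the moving parts (paths and the password are placeholders; note that alongside the certificate she mentions, the standard restore guidance also involves the database master key):

    -- Back up the catalog database itself
    BACKUP DATABASE SSISDB TO DISK = N'X:\Backups\SSISDB.bak';

    -- Back up the key material protecting SSISDB's encrypted content,
    -- since a restore on another server will ask for it
    USE SSISDB;
    BACKUP MASTER KEY TO FILE = N'X:\Backups\SSISDB_masterkey'
        ENCRYPTION BY PASSWORD = N'StrongPasswordHere';

    -- Scheduled-only changes rarely justify full recovery
    ALTER DATABASE SSISDB SET RECOVERY SIMPLE;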


Columnstore And SSIS 2016

Niko Neugebauer mentions that with SQL Server 2016 and Integration Services 2016, clustered columnstore index insertion can get much faster:

To solve the performance problem I went straight to the DefaultBufferMaxRows setting and set it to be equal to the maximum number of rows in a row group – 1048576. Together with the AutoAdjustBufferSize setting, it helps to adjust the actual size of the Data Flow buffer that will be used for transferring the data from the source to the destination table.

What should I say – it worked like magic:
I guess that with 2:09 minutes the clear winner of this test is the configuration with AutoAdjustBufferSize set to True and DefaultBufferMaxRows set to 1048576. It took less than half the time of the run with just AutoAdjustBufferSize activated, and the insertion process was executed with the help of the Bulk Load API – meaning that we did not have to wait for the Tuple Mover or execute it manually.
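For reference, the two Data Flow Task properties in the winning configuration:

    AutoAdjustBufferSize = True       (SSIS 2016+; lets the buffer grow to hold DefaultBufferMaxRows rows)
    DefaultBufferMaxRows = 1048576    (one full columnstore row group per buffer)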

Doubling insertion performance is nothing to scoff at, especially for something like columnstore tables, where we expect millions of rows (or more) to be inserted.


Conditional Processing

Bill Fellows shows options for handling date-based conditional processing:

Do you see the problem? Really, there are two but the one I’m focused on is the use of GETDATE to determine which branch of logic is executed. Today is Monday and I need to test the logic that runs on Friday. Yes, I can run these steps in isolation and given that I’m not updating the logic that fiddles with the branches, my change shouldn’t have an adverse effect but by golly, that sucks from a testing perspective. It’s also really hard to develop unit tests when your input data is server date. What are you going to do, allocate 5 to 7 days for testing or change the server clock? I believe the answer is No and OH HELL NAH!

This isn’t just an SSIS thing, either. I’ve seen the above logic in TSQL as well. If you pin your logic to getdate/current_timestamp calls, then your testing is going to be painful.

I liken this to solving dependency injection problems in general:  make the caller define the date or date part.  That way, your test callers can define other dates and the “smarts” around which branch to take move up to a more swappable layer.
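A minimal T-SQL sketch of that idea (the procedure names are hypothetical): the caller owns the date, and the server clock is only a default for production callers that don't care.

    CREATE PROCEDURE dbo.RunDailyLoad
        @RunDate date = NULL
    AS
    BEGIN
        -- Default to today only when the caller passes nothing;
        -- tests supply an explicit date to exercise any branch.
        SET @RunDate = COALESCE(@RunDate, CAST(GETDATE() AS date));

        IF DATENAME(WEEKDAY, @RunDate) = N'Friday'
            EXEC dbo.FridayProcess @RunDate = @RunDate;
        ELSE
            EXEC dbo.DailyProcess @RunDate = @RunDate;
    END;

Now EXEC dbo.RunDailyLoad @RunDate = '20160624'; exercises the Friday branch on a Monday, no clock changes required.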


SSIS Performance Testing

Koen Verbeeck shows a framework he uses for performance testing in Integration Services:

The proc passes the @RunID parameter to the package, as well as other usual suspects, such as the package name, folder name and project name. You can also choose if a package is run synchronously or asynchronously. When run synchronously, the stored procedure doesn’t finish until the package is finished as well.

Using this stored procedure, it is easy to run a package multiple times in a row using a WHILE loop.
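A hedged sketch of the core pattern using the built-in SSISDB catalog procedures (folder, project, and package names are placeholders; the SYNCHRONIZED system parameter is what makes the call block until the package finishes):

    DECLARE @execution_id bigint,
            @i int = 1;

    WHILE @i <= 10  -- ten timed runs in a row
    BEGIN
        EXEC SSISDB.catalog.create_execution
             @folder_name = N'PerfTests',
             @project_name = N'MyProject',
             @package_name = N'LoadFact.dtsx',
             @use32bitruntime = 0,
             @execution_id = @execution_id OUTPUT;

        -- Run synchronously so the timings don't overlap
        EXEC SSISDB.catalog.set_execution_parameter_value
             @execution_id,
             @object_type = 50,
             @parameter_name = N'SYNCHRONIZED',
             @parameter_value = 1;

        EXEC SSISDB.catalog.start_execution @execution_id;
        SET @i += 1;
    END;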

Also of interest is Andy Leonard’s SSIS Performance site, whose goal is to set up some performance benchmarks for Integration Services.


Lineage Improvements

Andy Leonard shows LineageId improvements over the years:

SSIS 2016 CTP3.3 offers a solution. First, there are now two new columns in the SSIS Data Flow Component Error Output: “ErrorCode – Description” and “ErrorColumn – Description”.

The new columns provide extremely useful (plain language) error metadata that will, in my opinion, greatly reduce the amount of time required to identify data-related load failures in the Data Flow Task.

But that’s not all. If you configure a log to capture the DiagnosticEx event, you will receive a message that provides the Data Flow column ID. To have a look, add a new log to an SSIS package that contains a configured Data Flow Task. On the Details tab, select the DiagnosticEx event.
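If that log is pointed at the SQL Server provider, for instance, the message lands in the default log table and can be read back with a simple query; a sketch, assuming default logging options:

    SELECT message  -- contains the column-map XML that DiagnosticEx emits
    FROM   dbo.sysssislog
    WHERE  [event] = N'DiagnosticEx';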

Backtracing LineageId was a painful experience for me, so I’m happy that they’re making this better.


Copying SSIS Packages

Andy Galbraith shows how to copy SSIS packages using DTUTIL:

A frequent need when performing a server migration is to copy the SSIS packages from one server to a new server.  There are a couple of different ways to do this, including a wizard in SSMS. (See https://www.mssqltips.com/sqlservertip/2061/how-to-manage-ssis-packages-stored-in-multiple-sql-server-database-instances/).  The catch is that these approaches are manual and they only move one package at a time.

I recently had to migrate a server with over twenty packages, and I knew I didn’t want to click-click-click over and over again.  🙂
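For reference, a single copy between MSDB package stores looks something like this (server and package names are placeholders), and it is easy to generate one such line per package in a batch script:

    dtutil /SQL "\Finance\LoadGL" /SOURCESERVER OldServer /COPY SQL;"\Finance\LoadGL" /DESTSERVER NewServer /QUIET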

The best answer would be to have your packages safe and secure in source control, but sometimes that’s not an option.
