Press "Enter" to skip to content

Category: Integration Services

Clustered Columnstore Index Load With SSIS

Koen Verbeeck looks at loading a clustered columnstore index using SSIS:

I stumbled upon this MSDN blog post: SQL Server 2016 SSIS Data Flow Buffer Auto Sizing capability benefits data loading on Clustered Columnstore tables (catchy title). It explains how you can set the buffer properties of the data flow to try to insert data directly into compressed row groups instead of into the delta store. They fail to achieve this using SSIS 2014, and then they explain how the new AutoAdjustBufferSize property of SSIS 2016 works miracles and everything is loaded directly into compressed row groups. Hint: you want to avoid loading data into the delta store, as it is row storage and you need to wait for the tuple mover to load the data into the CCI in the background.

However, it’s still possible to achieve the same result using SSIS 2014 (or earlier). Niko Neugebauer (blog | twitter) shows this in his post Clustered Columnstore Indexes – part 51 (“SSIS, DataFlow & Max Buffer Memory”). It still depends on the estimated row size, but with the settings he describes, you should get better results:

This advice is a bit different from what you would use when loading standard rowstore tables, but it serves to pack as many rows into each columnstore row group as possible.
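
If you want to check where a load actually landed, the row group catalog view will tell you. A minimal sketch, assuming a hypothetical target table dbo.FactSales (sys.column_store_row_groups exists from SQL Server 2014 onward):

    SELECT rg.row_group_id,
           rg.state_description,   -- OPEN/CLOSED = delta store; COMPRESSED is what you want
           rg.total_rows,
           rg.size_in_bytes
    FROM sys.column_store_row_groups AS rg
    WHERE rg.object_id = OBJECT_ID('dbo.FactSales')
    ORDER BY rg.row_group_id;

Rows sitting in OPEN or CLOSED row groups went through the delta store and are waiting on the tuple mover; COMPRESSED row groups near 1,048,576 rows mean the load hit the fast path.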


Echoing Variable Values

Bill Fellows shows how to send informational messages in SSIS and gives an example in Biml:

To aid in debugging, it’s helpful to have a “flight recorder” running to show you the state of variables. When I was first learning to program, the debugger I used was a lot of PRINT statements. “Verify your inputs before you assume anything” is a distillation of my debugging experience.

While some favor using MessageBox, I hate finding the popup window, closing it and then promptly forgetting what value was displayed. In SSIS, I favor raising events, FireInformation specifically, to emit the value of variables. This approach allows me to see the values in both the Progress/Execution Results tab as well as the Output window.

There’s value in putting code like this into your generic processing. Flip the debug bit to true whenever you need the detailed information. You can also call the method multiple times, such as before and after a block where you expect a change.
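
For a rough idea of the pattern, here is a minimal sketch of what the inside of a Script Task’s Main method could look like (C#). FireInformation is the standard SSIS eventing API; the “SCR Echo Back” label is just an arbitrary tag to make the messages easy to find:

    public void Main()
    {
        bool fireAgain = false;  // lets the runtime throttle repeated events if it chooses

        // Echo every variable exposed to this task via ReadOnlyVariables
        foreach (Microsoft.SqlServer.Dts.Runtime.Variable v in Dts.Variables)
        {
            // Shows up in the Output window and the Progress/Execution Results tab
            Dts.Events.FireInformation(0, "SCR Echo Back",
                string.Format("{0}::{1} : {2}", v.Namespace, v.Name, v.Value),
                string.Empty, 0, ref fireAgain);
        }

        // ScriptResults comes from the Script Task template
        Dts.TaskResult = (int)ScriptResults.Success;
    }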


SSIS RC2 Issues

Andy Leonard walks through his experience trying SSIS 2016 RC2:

The SSMS 2016 RC2 link works just fine. The SSDT 2016 RC2 link does not work.

Fear not! The SSIS team has provided a set of updated links for SSIS 2016 SSDT for RC2. There’s other good information in that post. If you want to tinker with SSIS 2016 RC2, I encourage you to read it.

But Wait, There’s More

Once I’d done all this, I could create an SSIS project and add a Script Task to a package. But I could not open the Visual Studio Tools for Applications (VSTA) code editor. When I clicked the “Edit Script…” button in the Script Task Editor, nothing happened.

I contacted the SSIS Development Team (we hang out) and let them know what I was seeing. They are aware of the issue and sent along a screenshot.

Sounds like there are still some kinks to work out before release.


Adding Multiple Packages To A Project

Koen Verbeeck shows a quick way to add multiple existing packages to a project:

You select the package you want to import and you’re done. But here’s the problem: you can select only a single object. What if you want to import 20 packages into your project? It’s kind of annoying to repeat the same process 20 times, isn’t it?

Luckily there’s an easier way. Instead of going for the obvious Add Existing Package, right-click the project itself. In the context menu, choose Add, ignore Existing Package, and click Existing Item.

If you have a large number of packages to import, this will save you a few minutes of tedium (or hand-editing a project file).


File Processing Pattern

Bill Fellows describes a pattern for processing files in an ETL scenario:

All ETL processing will use a common root node/directory. I call it SSISData to make it fairly obvious what is in there but call it as you will. On my dev machine, this is usually sitting right off C:. On servers though, oh hell no! The C drive is reserved for the OS. Instead, I work with the DBAs to determine where this will actually be located. Heck, they could make it a UNC path and my processing won’t care because I ensure that root location is an externally configurable “thing.” Whatever you’re using for ETL, I’m certain it will support the concept of configuration. Make sure the base node is configurable.

A nice thing about anchoring all your file processing to a head location is that if you are at an organization that is judicious with handing out file permissions, you don’t have to create a permission request for each source of data. Get your security team to sign off on the ETL process having full control of a directory structure starting here. The number of SSIS-related permissions issues I answer on StackOverflow is silly.
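
To make that concrete, a hypothetical layout under the configurable root might look like this (the per-source and per-stage subfolders are just one convention):

    SSISData\
        VendorA\
            In\         incoming files land here
            Archive\    processed files move here, usually datestamped
        VendorB\
            In\
            Archive\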

It comes down to consistency and cleanliness. Plan ahead and you’ll have a much nicer go of things. Bill also provides a Biml POC, so check that out as well.


Ragged Right Files

Sifiso Ndlovu walks us through ragged right formatted files in Integration Services:

The configuration of columns is perhaps the most critical part of the entire ETL process, as it helps us build mapping metadata for the ETL. In fact, regardless of whether or not SSIS/SSMS can detect delimiters, if you skip the Column Mapping section, your ETL will fail validation. In order to clarify how ragged right formatted files work, I have gone a step back and used Figure 4 to display a preview of our fictitious Fruits transaction dataset in Notepad++. It can already be seen from Notepad++ that the file only has a row delimiter, in the form of CRLF.
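
To picture what Sifiso is describing, a made-up three-column ragged right file might look like the lines below: a 4-character ID, a 10-character fruit name padded with spaces, and a final description column that simply runs until the CRLF, which is why no column delimiters show up in Notepad++:

    0001Apple     Sold at farmers market
    0002Pear      Wholesale
    0003Kiwi      Online order, expedited shipping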

Read the whole thing.


Parameterizing Attunity Queries In SSIS

Melissa Coates shows us how to parameterize queries if you’re using the Attunity connector for Oracle in SSIS:

The Attunity connector for Oracle used inside of the SSIS data flow looks a little different than the typical OLE DB connector we most commonly use. Specifically, the Attunity source has two options: “Table Name” and “SQL command.” What the Attunity Oracle Source doesn’t have in this dialog box is “SQL command from variable” (like we see for an OLE DB source).

Expressions are your friend.
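
To sketch the idea: the Attunity source’s SqlCommand is exposed as a property on the Data Flow Task, so you can put an expression on it there. Assuming a source component named “Oracle Source,” a hypothetical SALES table, and a string variable User::StartDate, the expression on the task’s [Oracle Source].[SqlCommand] property might look something like:

    "SELECT SALES_ID, AMOUNT FROM SALES WHERE SALES_DATE >= TO_DATE('"
        + @[User::StartDate] + "', 'YYYY-MM-DD')"

The source then behaves as if you had typed the resolved query into the SQL command box, re-evaluated at runtime.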


Failed To Deploy Project In SSIS

Ginger Grant helps resolve “Failed to deploy project” errors (Error: 27203) in Integration Services:

The message Failed to deploy project isn’t very useful, but the rest of the message is. The operation_messages view lives in SSISDB, and the operation identifier number is how to determine what the error is. Run this query, using the number provided in the error message, which in this case is 173.
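
The lookup itself is a simple filter on the catalog view; a sketch, plugging in the 173 from Ginger’s example:

    SELECT om.message_time,
           om.message_type,   -- 120 = error, 110 = warning
           om.message
    FROM SSISDB.catalog.operation_messages AS om
    WHERE om.operation_id = 173
    ORDER BY om.message_time;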

I’d prefer it if we actually could see the error in the dialog, but at least there’s an explanation somewhere.


Maintaining SSISDB

Ginger Grant shows us how to manage SSISDB:

A client asked me recently why he should back up the SSISDB database. While you can recreate everything inside the SSISDB, it will take time and you will have to remember exactly how all of your variables were set. Restoring a backup mitigates this issue, and having a backup allows a server to be redeployed quickly. When you back up the database, make sure you also back up the database certificate, which is created along with the SSISDB, as you will need it to do a restore. By default, the recovery model of the SSISDB is set to Full. If the packages in SSISDB were changing minute by minute, Full would make sense, but given that an SSISDB contains packages which run on a scheduled basis, the changes made are most likely infrequent. Change the recovery model to Simple.
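
Stitched together, the pieces Ginger describes look something like the following. One note: she mentions the certificate, and the catalog’s encryption ultimately chains through the database master key, so that’s what gets backed up here; the paths and password are placeholders:

    USE SSISDB;
    GO
    -- Back up the key protecting the catalog's encrypted content;
    -- without it, a restored SSISDB cannot decrypt its sensitive values
    BACKUP MASTER KEY TO FILE = 'D:\Backups\SSISDB_MasterKey'
        ENCRYPTION BY PASSWORD = '<StrongPasswordHere>';
    GO
    -- Packages change on deployment, not minute by minute
    ALTER DATABASE SSISDB SET RECOVERY SIMPLE;
    GO
    BACKUP DATABASE SSISDB TO DISK = 'D:\Backups\SSISDB.bak';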

SSISDB is a real database, just like ReportServer, so don’t neglect it just because you didn’t create it.


Columnstore And SSIS 2016

Niko Neugebauer mentions that with SQL Server 2016 and Integration Services 2016, clustered columnstore index insertion can get much faster:

To solve the performance problem, I went straight to the DefaultBufferMaxRows setting and set it equal to the maximum number of rows in a row group – 1048576. Together with the AutoAdjustBufferSize setting, it helps determine the actual size of the Data Flow buffer that will be used for transferring the data from the source to the destination table.

What should I say – it worked like magic:
I guess that with 2:09 minutes, the clear winner of this test is the configuration with AutoAdjustBufferSize set to True and DefaultBufferMaxRows set to 1048576. It took less than half the time of the run with just AutoAdjustBufferSize activated, and the insertion process was executed with the help of the Bulk Load API – meaning that we did not have to wait for the Tuple Mover or execute it manually.

Doubling insertion performance is nothing to scoff at, especially for something like columnstore tables, where we expect millions of rows (or more) to be inserted.
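
If you want to confirm that a load took the Bulk Load API path, SQL Server 2016’s physical-stats DMV will even tell you why any row group came up short of the 1,048,576-row maximum. A quick sketch, again against a hypothetical dbo.FactSales:

    SELECT rg.row_group_id,
           rg.state_desc,
           rg.total_rows,
           rg.trim_reason_desc   -- NO_TRIM means a full 1,048,576-row group
    FROM sys.dm_db_column_store_row_group_physical_stats AS rg
    WHERE rg.object_id = OBJECT_ID('dbo.FactSales')
    ORDER BY rg.row_group_id;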
