
Category: Integration Services

Against Visual Programming Languages

Ian Hellstrom has a critique of visual programming languages for data engineers:

Anyone with a software development background who has ever dealt with visual ETL tools may have marvelled at the lack of proper version control and diff tools that go with it. Some tools come with their own built-in VCS, while others allow you to use any or no VCS at all. The difficulty lies in the fact that the visual representation is often stored as an XML (or JSON) file. So, if a box is moved by 1 pixel, the file is different. You could argue that it’s indeed different because the layout is different, but you could equally make the case that the logic has not changed. This argument is moot though: it is technically possible to ensure that the tool auto-aligns blocks and routes/colours arrows, very much like yEd does (via menu items). Some users may not be happy with the reduced control over the way the flow looks, but others may rejoice that version control has become usable.

ETL (and ORM) tools often auto-generate code that is not particularly tuned for the data source in question. I have encountered many odd nested loops where simple hash joins would have been more appropriate if only the predicates had been pushed down properly (and if only the tool had evaluated blocks lazily). Aggregations and timestamp-based filters are also often a cause for performance issues. Again, performance is technically solvable, so this may be a valid argument against visual tools in data engineering now but perhaps not tomorrow.

This is a good argument against VPLs, although there are also good arguments for them, such as making it easier to see whether the overall architecture of a flow looks correct. In the end, I like the compromise that Biml offers Integration Services developers: write code but visualize results.
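To make the "box moved by 1 pixel" problem concrete, here is a minimal Python sketch of one workaround: strip layout-only attributes and canonicalize the XML before diffing. The element and attribute names (`flow`, `step`, `x`, `y`, `width`, `height`) are hypothetical and not taken from any particular ETL tool; a real package format would need its own list of layout properties.

```python
import xml.etree.ElementTree as ET

# Hypothetical attribute names that carry only layout information.
LAYOUT_ATTRS = {"x", "y", "width", "height"}


def logical_form(xml_text: str) -> str:
    """Return a canonical string for a flow with layout attributes removed."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        # Compute the set of layout attributes first, then delete them.
        for name in LAYOUT_ATTRS & set(elem.attrib):
            del elem.attrib[name]
    # Canonical XML (C14N) sorts attributes, so ordering differences vanish too.
    return ET.canonicalize(ET.tostring(root, encoding="unicode"), strip_text=True)


before = '<flow><step id="load" x="100" y="40" source="orders"/></flow>'
moved = '<flow><step id="load" x="101" y="40" source="orders"/></flow>'
changed = '<flow><step id="load" x="100" y="40" source="customers"/></flow>'

print(logical_form(before) == logical_form(moved))    # layout-only change
print(logical_form(before) == logical_form(changed))  # logic change
```

Diffing the output of `logical_form` rather than the raw file means a commit that only nudges a box produces no diff, while a change to what the step reads still shows up.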

Comments closed

SSISDB Management

Andy Leonard describes steps you can take to maintain the SSIS Catalog:

Back It Up

As with all SQL Server databases, please back up SSISDB. What follows is a (very) basic guide describing one simple method to back up your SSISDB database. Please, please, please learn more about SQL Server backup and restore options and their implications before backing up an SSISDB database in your enterprise. Feel free to use the steps I describe on your laptop or a virtual machine. And please remember…

Backups are useless. Restores are priceless. Conduct practice Disaster Recovery exercises in which you restore databases and then test functionality. You’ll be glad you did. Here is a link containing Microsoft’s advice on restoring the SSISDB database in SQL Server 2016.

The advice is pretty similar to what you’d expect for any other database, but there are a couple of twists around SSISDB functionality, so do read on.


Finding Destinations In SSISDB

Bill Fellows has a script to figure out the name of that table throwing errors upon insertion:

There is a rich set of tables and views available in the SSISDB that operate as a flight recorder for SSIS packages as they execute. Markus Ehrenmüller (t) had a great question in Slack. In short: can you figure out what table is being used as a destination? I took a few minutes to slice through the tables to see if I could find it.

If it’s going to be anywhere, it looks like you can find it in catalog.event_message_context

If someone is using an OLE DB Destination and uses “Table or view” or “Table or View – fast load” settings, the name of the table will be in the event_message_context table. If they are using a variable name, then it’s going to be trickier.

Read on for the script.
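This is not Bill's script, but a rough Python sketch of the kind of lookup described above might build a parameterized query joining `catalog.event_messages` to `catalog.event_message_context`. The function name is hypothetical, and the default property name `OpenRowset` is an assumption about how the destination table is logged; check the property names actually present in your own SSISDB before relying on it.

```python
def build_destination_query(operation_id: int, property_name: str = "OpenRowset"):
    """Return (sql, params) for looking up logged destination table names.

    The default property_name is an assumption, not something confirmed
    by the linked post; adjust it to match what your catalog records.
    """
    sql = (
        "SELECT em.operation_id, em.message_time, em.message,\n"
        "       emc.package_path, emc.context_source_name,\n"
        "       emc.property_name, emc.property_value\n"
        "FROM SSISDB.catalog.event_messages AS em\n"
        "JOIN SSISDB.catalog.event_message_context AS emc\n"
        "  ON emc.event_message_id = em.event_message_id\n"
        "WHERE em.operation_id = ?\n"
        "  AND emc.property_name = ?\n"
        "ORDER BY em.message_time;"
    )
    # Values travel separately from the SQL text, so nothing is concatenated in.
    return sql, (operation_id, property_name)


sql, params = build_destination_query(42)
print(sql)
print(params)
```

The `?` placeholders follow the qmark style used by drivers such as pyodbc, so the pair can be handed to `cursor.execute(sql, params)` on a connection to the server hosting SSISDB.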


SSIS Perfmon Counters

Lonny Niederstadt notes that you cannot see SSIS counters in Perfmon without administrative rights:

A colleague and I were hoping to review SSIS perfmon counters on a VM.  We use a logman command with a counters file to log perfmon to csv.

Opened up the csv that was captured on the VM… there were all of my typical SQL Server counters… but the following SSIS counters were missing.

\SQLServer:SSIS Service\SSIS Package Instances
\SQLServer:SSIS Pipeline\Buffer memory
\SQLServer:SSIS Pipeline\Buffers in use
\SQLServer:SSIS Pipeline\Buffers spooled
\SQLServer:SSIS Pipeline\Flat buffer memory
\SQLServer:SSIS Pipeline\Flat buffers in use
\SQLServer:SSIS Pipeline\Private buffer memory
\SQLServer:SSIS Pipeline\Private buffers in use
\SQLServer:SSIS Pipeline\Rows read
\SQLServer:SSIS Pipeline\Rows written

Huh.

The more you know.
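For anyone wanting to reproduce the setup, here is a small Python sketch that writes the SSIS counters listed above into a counters file and assembles a `logman create counter` command to log them to CSV. The collector name, file names, and sample interval are hypothetical, and given Lonny's finding, the resulting command would need to run from an elevated prompt for the SSIS counters to show up.

```python
# Counter paths exactly as listed in the post.
SSIS_COUNTERS = [
    r"\SQLServer:SSIS Service\SSIS Package Instances",
    r"\SQLServer:SSIS Pipeline\Buffer memory",
    r"\SQLServer:SSIS Pipeline\Buffers in use",
    r"\SQLServer:SSIS Pipeline\Buffers spooled",
    r"\SQLServer:SSIS Pipeline\Flat buffer memory",
    r"\SQLServer:SSIS Pipeline\Flat buffers in use",
    r"\SQLServer:SSIS Pipeline\Private buffer memory",
    r"\SQLServer:SSIS Pipeline\Private buffers in use",
    r"\SQLServer:SSIS Pipeline\Rows read",
    r"\SQLServer:SSIS Pipeline\Rows written",
]


def counters_file_text(counters=SSIS_COUNTERS) -> str:
    """Return counters-file content: one counter path per line."""
    return "\n".join(counters) + "\n"


def logman_create_args(name, counters_file, output, interval="00:00:15"):
    """Build the argument list for a logman counter collector writing CSV."""
    return [
        "logman", "create", "counter", name,
        "-cf", counters_file,  # file listing the counters to collect
        "-f", "csv",           # log format
        "-o", output,          # output path
        "-si", interval,       # sample interval (hh:mm:ss)
    ]


# Hypothetical names, for illustration only.
args = logman_create_args("ssis_perf", "ssis_counters.txt", "ssis_perf_log")
print(counters_file_text(), end="")
print(" ".join(args))
```

Writing `counters_file_text()` to `ssis_counters.txt` and passing `args` to `subprocess.run` would create the collector; starting it is a separate `logman start ssis_perf` step.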


Biml And Metadata

Ben Weissman provides an example of using metadata to drive conditional data loading:

Now that we’ve defined connections, databases and schemas we still need to add our table metadata.

We’re going to do that by looping across all our databases marked as a source in Biml, retrieving the list of required tables from SQL (located in View vMyBimlMeta_Tables) and creating a table tag for each table which will also reference back to the corresponding target system. That also allows us to use the same table names multiple times. Again, we’ll store some additional data in annotations.

This is an interesting concept.  Check it out.


Deploying SSIS Packages In VS 2015

Neil Gelder notes that you can deploy different versions of SSIS packages using Visual Studio 2015:

For years I’ve dreamt of having one set of tools for developing SSIS packages! Not a lot to ask really, and a great step towards this from Microsoft was decoupling the development IDE from the main SQL Server install to produce the standalone SSDT (SQL Server Data Tools).

But like most people I work in an environment which has legacy versions of SQL Server in production, but equally like most tech folk (giddy kids wanting new toys) I always try and use the most current and exciting version of VS. This however proves a problem when developing for SSIS; for example, if you developed an SSIS package in VS 2013 you’d not be able to deploy this correctly to a SQL Server 2012 version of the Integration Services catalog. In the past this resulted in having two IDEs installed: SSDT 2012 (VS shell) for any 2012 catalog development and VS 2013 for other work.

I had one person mention during a talk I gave that this isn’t foolproof, but my experience (limited to SQL Server 2012 and 2014) was that deployment worked fine.  As always, test before making changes.


Issues With SSISDB In An Availability Group

Andrea Allred has some lessons learned from a troublesome service pack upgrade:

Here are a few of the fun errors that we saw.

“Script level upgrade for database ‘master’ failed because upgrade step ‘SSIS_hotfix_install.sql’ encountered error 942, state 4, severity 25. This is a serious error condition which might interfere with regular operation and the database will be taken offline. If the error happened during upgrade of the ‘master’ database, it will prevent the entire SQL Server instance from starting. Examine the previous errorlog entries for errors, take the appropriate corrective actions and re-start the database so that the script upgrade steps run to completion.”

There are some good lessons here.


Reverse Engineering SSIS Packages

Ben Weissman shows how to use BimlOnline to reverse engineer an Integration Services package into its component Biml:

A few things to be aware of:

– Your file will be uploaded to and stored at BimlOnline so you may want to remove passwords etc.
– If you’re trying to figure out how to build a specific task in Biml but your file does way more than just that, consider creating (and uploading) a file that will only contain the task you’re looking for – this will keep the resulting Biml clean and easy to read.

This is extremely helpful for figuring out how to use third-party components with Biml.  If you want a local IDE, there’s always BimlStudio (which costs money).


RetainSameConnection

I explain what the RetainSameConnection property on an Integration Services connection does:

My co-worker had set up a dynamic connection (see Rafael Salas or Hari Bagra for details on how to do this), but something weird was happening: the package was trying to push everything to the same server. I confirmed that if all relevant customers loaded were for the same server, the process would work correctly, and that I could run each server load one at a time, so there weren’t any problems connecting to particular servers or parameters overriding this choice. It’s like the connection was “sticky,” connecting successfully to the first server and then ignoring the later changes.

RetainSameConnection allows for certain benefits, but has specific limitations.  Click through to see those details.
