Press "Enter" to skip to content

Category: Integration Services

Embedding SSIS Packages in Azure Data Factory Pipelines

Andy Leonard shows us how to embed an SSIS package inside Azure Data Factory pipelines:

The Azure-SSIS Team has done it again; they’ve added more cool SSIS execution functionality to Azure Data Factory!

Click through to see what has Andy excited. I think this is a big thing for ADF as well, especially in shops which dedicated a lot of time and energy into building SSIS packages for ETL work over the years.

Leave a Comment

The SSIS Error Output

Tim Mitchell explains how to use the error output on data flow components in SQL Server Integration Services:

SSIS error outputs are a secondary path through which the data flow can send rows that do not conform to data type, length, or transformation standards defined by the ETL developer. That’s a lengthy way to say that it’s where you can send your junk data. In the SSIS designer, clicking on a source or transformation will often show not one but two possible outputs: the primary output (the “good” data, indicated by the blue line) and the error output (identified by the red line). As shown on the flat file source below, when selecting a source or transformation, those that have an available error output will appear with both output connectors ready for selection.

Tim elaborates quite a bit on what you can do with this output.

Comments closed

Documenting SSIS Catalogs

Dave Mason continues a series on documenting Integration Services:

In the last post, we looked at query options for documenting SSIS packages that are deployed via the legacy package deployment model. What about SSIS packages deployed to a catalog via the project deployment model? Is the package XML accessible so that we can query and shred it? I don’t know with certainty, but I think it is not.

However, there are .NET Framework interfaces we can use to programatically obtain SSIS catalog metadata. So for this post, I’ll soldier on without my dear friend T-SQL and proceed with PowerShell. Here’s a look at the projects and packages I was working with as I developed and debugged the script code. It’s a short list with just two projects, having one package each. The “DailyETLMain.dtsx” package is a sample from Microsoft’s GitHub repository.

Click through for Dave’s explanation and a link to the script.

Comments closed

Documenting Integration Services Packages

Dave Mason continues a quest for documentation:

The output is a tabular result showing each SSIS package, their Names and Descriptions, and the top-level task Names and Descriptions of each subplan. (In addition to the “DailyETLMain” package, we see metadata for some other plans related to the Management Data Warehouse Data Collector.) Note the 16 rows of metadata for the “DailyETLMain” package correspond to the 16 top-level objects of the package–the query doesn’t recurse into containers to obtain their object metadata. I decided not to attempt that–it seemed like overkill for documentation purposes. Another caveat: the results order may not match the order that’s mandated by Precedence Constraints in the Visual Studio designer.

Click through for the full example.

Comments closed

SSIS Term Extraction

Tim Mitchell takes us through term extraction in SQL Server Integration Services:

The SSIS term extraction component is a data flow transformation that will aggregate the frequency of words found in a specified column supplied by an upstream data source. This component expects a Unicode text or text stream field as an input, and calculates as an output the frequency of the nouns and/or noun phrases in the specified source column. As shown below on the data flow surface, the term extraction component (circled) is always used as a transformation, accepting exactly one input and expecting either one or two outputs (the second being an optional error output).

This is one component I’ve never used before.

Comments closed

SSIS Project Connections

Tim Mitchell shows how we can use project connections in SQL Server Integration Services:

In most use cases, the same connection will be used across multiple packages in the same project. In early versions of SSIS (pre-2012), each package would have its own connection manager for every connection used in that package. Creating and maintaining all those connection managers could be time-consuming as the number of packages grows. Starting with SQL Server 2012, Microsoft added project connections to SSIS, allowing for the creation of connections that were accessible across all packages in a project. Instead of having to create a copy of each connection manager in every package, developers can now simply create the connection at the project level. Project connections will automatically show up in the connection manager tray for all packages in that project.

Click through to see how you can create one and get rid of per-package connections.

Comments closed

Databricks versus Mapping Data Flows

Helge Rege Gardsvoll contrasts Azure Databricks, Azure Data Factory Mapping Data Flows, and SQL Server Integration Services:

Mapping Data Flows
One of the many data flows from Microsoft these days providing, for the first time, data transformation capabilities within Data Factory. This is not a U-SQL script or Databricks notebook that is orchestrated from Data Factory, but a tool integrated. This means that you can reuse (many of) the datasets you have defined in Data Factory, while in Databricks you don’t.

Mapping Data Flows runs on top of Databricks, but the cluster is handled for you and you don’t have to write any of that Scala code yourself.

Read on for the full comparison.

Comments closed

SSIS Design Preferences

Meagan Longoria systematizes a set of preferences regarding Integration Services package and ETL process design:

– Every table should have InsertDateTime and UpdateDateTime columns. The UpdateDateTime column should be populated with the same value as the InsertDateTime column upon creation of the row, rather than being left null.
– Whatever you use to create tables, include primary keys, foreign keys, and indexes with your table definitions. Provide explicit constraint names to simplify database comparisons. You can disable your foreign keys, but they need to be there to provide that metadata.
– Separate your final dimensional/reporting tables from audit tables and staging tables. This can be done with separate schemas or even separate databases.

People have added some more thoughts in the comments as well.

Comments closed

Maintaining SSISDB

John McCormack was in a jam:

I made 2 unsuccessful attempts at running the SSIS Server Maintenance Job. However, after several hours of processing and still no available free space in the database, I knew the job wasn’t coping with the sheer number of rows it had to delete. The deletes all happen from the parent table (internal.operations) and then all child tables using using cascading deletes. This approach maintains referential integrity but is not great for performance.

Due to this, I needed a new approach to the maintenance of SSISDB. As we hadn’t maintained these tables for 13/14 months, I was asking too much of SQL Server to let me delete everything at once. 

Read on for the solution.

Comments closed

Drawing SSIS Packages

Bartosz Ratajczyk continues a quest to draw SSIS packages as SVGs:

To get the Value and Expression properties I need to find the precedence constraint in the .dtsx file during the XSL transformations. It requires three changes in the package2svg.xsl:

– I have to pass the name of the .dtsx file
– I have to read the XML from the .dtsx file
– I have to use the DTS namespace because it’s the namespace of the .dtsx file

Read on for more. Bartosz to this point has covered the control flow.

Comments closed