Parallel Execution With SSIS Framework Community Edition

Andy Leonard has a great post on parallel execution with SSIS:

Sit a spell and let Grandpa Andy tell yall a story about some data integratin’.

Suppose for a minute that you’ve read and taken my advice about writing small, unit-of-work SSIS packages. I wouldn’t blame you for taking this advice. It’s not only online, it’s written in a couple books (I know, I wrote that part of those books). One reason for building small, function-y SSIS packages is that it promotes code re-use. For example, SSIS packages that perform daily incremental loads can be re-used to perform monthly incremental loads to a database that serves as a data mart by simply changing a few parameters.

Change the parameter values and the monthly incremental load can load both quarterly and yearly data marts.

You want better performance out of the daily process, so you read and implement the parallel execution advice* you’ve found online. For our purposes let’s assume you’ve designed a star schema instead of one of those pesky data vaults (with their inherent many-to-many relationships and the ability to withstand isolated and independent loads and refreshes…).

You have dependencies. The dimensions must be loaded before the facts. You decide to manage parallelism by examining historical execution times. Since you load data in chronological order and use a brute-force change detection pattern, the daily dimension loads always complete before the fact loads reach the latest data. You decide to fire all packages at the same time and your daily execution time drops by half, monthly executions time drops to 40% of its former execution time, and everyone is ecstatic…

…until the quarterly loads.

This is a great post.

Related Posts

Databricks versus Mapping Data Flows

Helge Rege Gardsvoll contrasts Azure Databricks, Azure Data Factory Mapping Data Flows, and SQL Server Integration Services: Mapping Data FlowsOne of the many data flows from Microsoft these days providing, for the first time, data transformation capabilities within Data Factory. This is not a U-SQL script or Databricks notebook that is orchestrated from Data Factory, […]

Read More

SSIS Design Preferences

Meagan Longoria systematizes a set of preferences regarding Integration Services package and ETL process design: – Every table should have InsertDateTime and UpdateDateTime columns. The UpdateDateTime column should be populated with the same value as the InsertDateTime column upon creation of the row, rather than being left null.– Whatever you use to create tables, include […]

Read More

Categories

February 2018
MTWTFSS
« Jan Mar »
 1234
567891011
12131415161718
19202122232425
262728