Category: Integration Services

High-Performance ETL via Buffer Table

Daniel Hutmacher needs things to zoom:

It’s almost like a myth – one that I’ve heard people talk about, but never actually seen myself. The “shock absorber” is a pretty clever data flow design pattern to ingest data where a regular ETL process would choke on the throughput or spikes. The idea is to use a buffer table to capture incoming data, and then run an asynchronous process that loads that data in batches from the buffer into its intended target table.

While I’ve seen whitepapers and blog posts mention the concept loosely along with claims of “7x or 10x performance”, none of them go into technical detail on how it’s done, so I decided to try my hand at it.

I’ve compiled my findings, along with some pre-baked framework code if you want to try building something yourself. Professional driver on closed roads. It’s gonna get pretty technical.

Combine that with Eitan Blumin’s post yesterday and you’d think it were buffer week.

This shock absorber pattern works well for warehouse loading, especially when you’re trickle-loading data into columnstore indexes and don’t want to have open rowgroups slowing everything down.

Comments closed

VS_NEEDSNEWMETADATA in SSIS

Hadi Fadlallah discusses what was the bane of my existence for about 3 months in 2010:

In this article, we will briefly explain the VS_NEEDSNEWMETADATA SSIS exception, one of the most popular exceptions that an ETL developer may face while using SSIS. Then, we will run an experiment that reproduces this error. Then, we will show how we can fix it.

This was really annoying prior to SQL Server 2008 (at least, that’s my early-morning recollection of when the SSIS engine started trying to auto-fix this) and has been mildly annoying since. I had far too many conversations which I could summarize as “Yes, I understand that this Excel spreadsheet is basically the same, but it’s different in that the casing on one header column has changed slightly and that breaks the entire system.”

IDENTITY Overflow in SSIS

Alex Stuart hits a weird error:

Conversion/overflow errors aren’t that unusual – normally a data flow broken by some unexpected data (“no, there’s no chance that field would ever have a character in it”), or perhaps a column hitting max size (“INT will be enough for years, like, 5 years. I’ll have left the company by then”)

But that wasn’t the case here – the package and user tables involved were checked by the dev team and there was no possible overflow. I’d checked system databases for maxed-out identity columns and found nothing. Heads were scratched.

Read on for the post-head-scratch answer.

SSIS Framework Manager Community Edition

Andy Leonard has a new product announcement:

I’m excited to announce SSIS Framework Manager CE (Community Edition) is available for download at DILM Suite! SSIS Framework Manager CE is designed to support SSIS Framework Community Edition, providing a GUI to facilitate SSIS Framework Application creation, configuration, and management.

Three views are supported in this initial edition: Catalog, Application, and Package. The Catalog view incorporates the same Catalog treeview used in SSIS Catalog Browser (also free) and SSIS Catalog Compare (not free):

Click through to see what’s included.

Azure Data Factory Integration Runtimes

Tino Zishiri takes us through the concept of the Integration Runtime:

An Integration Runtime (IR) is the compute infrastructure used by Azure Data Factory to provide data integration capabilities such as Data Flows and Data Movement. It has access to resources in either public networks, or hybrid scenarios (public and private networks).

Read on to learn more about what they do and the variety of Integration Runtimes available to you.

Creating Sequence Diagrams for SSIS Packages

Aveek Das has an idea for documentation:

In this article, I am going to explain in detail how to document SSIS packages using Sequence Diagrams and the importance of these diagrams in the field of software engineering, no matter which programming language are you using. In my previous article, I have talked about the various UML Diagrams that are being used to document various software engineering processes. Also, I have talked about modular ETL architecture and how to create such a modular package in SSIS. Sequence diagrams are also a part of the broader UML Diagrams which define the interaction between the various components in the system in a chronological manner.

My gut feeling is that this works best with medium-sized collections of packages, where we’re talking 10-30 or so packages in total, and that for something much larger, I’d want an automated tool to build diagrams for me. But I could be way off base on that.

Global Parameters in SSIS Framework

Andy Leonard has an update for us:

I’m happy to announce the latest version of our SSIS Framework includes global parameters! I can hear some of you thinking, …

“What Are Global Parameters, Andy?”

I’m so glad you asked! SSIS ships with package-scoped and project-scoped parameters. Project-scoped parameters may be used in any SSIS package in the project; package-scoped parameters are only available within the context of a single SSIS package. This functionality reduces repetition in SSIS package development and execution configuration.

Global parameters allow our SSIS Framework customers to set parameters and values that apply to the entire SSIS Catalog.

Now that you know what they are, Andy has an example of them in action. Global parameters aren’t part of the community edition, but they do look interesting.

Migrating SSIS to Azure Data Factory

Koen Verbeeck has some articles for us:

For quite some time now, there’s been the possibility to lift-and-shift your on-premises SSIS project to Azure Data Factory. There, they run in an Integration Runtime, a cluster of virtual machines that will execute your SSIS packages. In the beginning, you only had the option to use the project deployment model and host your SSIS catalog in either an Azure SQL DB, or in a SQL Server Managed Instance.

But over time, features were added and now the package deployment model has been supported for quite some time as well. Even more, the “legacy SSIS package store” is also supported. For those who still remember this, it’s the SSIS service where you can log into with SSMS and see which packages are stored in the service (either the file system or the MSDB database) and which are currently running.

Read on for much more detail on the topic.

Avoid Backup-and-Restore of SSISDB for Deployment

Andy Leonard recommends against using backup-and-restore as a way to move SSIS packages around:

First, please do not misunderstand. You should back up SSISDB just like you back up all other databases – especially in Production. You should also conduct Disaster Recovery exercises in which you restore SSISDB from the latest backup, or avail yourself of Always On availability groups and / or Windows Server Failover Clustering.

With that caveat in mind, read on to see why.
