Databricks versus Mapping Data Flows

Helge Rege Gardsvoll contrasts Azure Databricks, Azure Data Factory Mapping Data Flows, and SQL Server Integration Services:

Mapping Data Flows
One of the many data flows from Microsoft these days providing, for the first time, data transformation capabilities within Data Factory. This is not a U-SQL script or Databricks notebook that is orchestrated from Data Factory, but a tool integrated. This means that you can reuse (many of) the datasets you have defined in Data Factory, while in Databricks you don’t.

Mapping Data Flows runs on top of Databricks, but the cluster is handled for you and you don’t have to write any of that Scala code yourself.

Read on for the full comparison.

SSIS Design Preferences

Meagan Longoria systematizes a set of preferences regarding Integration Services package and ETL process design:

– Every table should have InsertDateTime and UpdateDateTime columns. The UpdateDateTime column should be populated with the same value as the InsertDateTime column upon creation of the row, rather than being left null.
– Whatever you use to create tables, include primary keys, foreign keys, and indexes with your table definitions. Provide explicit constraint names to simplify database comparisons. You can disable your foreign keys, but they need to be there to provide that metadata.
– Separate your final dimensional/reporting tables from audit tables and staging tables. This can be done with separate schemas or even separate databases.

People have added some more thoughts in the comments as well.

Maintaining SSISDB

John McCormack was in a jam:

I made 2 unsuccessful attempts at running the SSIS Server Maintenance Job. However, after several hours of processing and still no available free space in the database, I knew the job wasn’t coping with the sheer number of rows it had to delete. The deletes all happen from the parent table (internal.operations) and then all child tables using using cascading deletes. This approach maintains referential integrity but is not great for performance.

Due to this, I needed a new approach to the maintenance of SSISDB. As we hadn’t maintained these tables for 13/14 months, I was asking too much of SQL Server to let me delete everything at once. 

Read on for the solution.

Drawing SSIS Packages

Bartosz Ratajczyk continues a quest to draw SSIS packages as SVGs:

To get the Value and Expression properties I need to find the precedence constraint in the .dtsx file during the XSL transformations. It requires three changes in the package2svg.xsl:

– I have to pass the name of the .dtsx file
– I have to read the XML from the .dtsx file
– I have to use the DTS namespace because it’s the namespace of the .dtsx file

Read on for more. Bartosz to this point has covered the control flow.

Drawing SSIS Packages as SVGs

Bartosz Ratajczyk continues a series on taking SSIS packages and generating SVGs from their control flows:

To make things harder, the layout of the sequences and tasks is not some nested XML structure. All of the elements have the same parent – <GraphLayout>, meaning all of them are at the same tree level. Also – there is no attribute showing where a particular object belongs. Almost. In the example with the sequences, I see two regularities:
– the outer container is placed later in the XML, than the inner container
– the @Id attributes show the nesting of the objects

I’m not sure how often I’d use this in practice, but if you want to understand some of the internals of SSIS, this is an interesting series to follow.

Executing Azure SSIS Packages from Blob Storage

Andy Leonard cranks it to the next level:

I confess: I have been waiting for this feature since I first learned of Azure-SSIS.

When I first saw Azure-SSIS – which creates an Azure Data Factory Integration Runtime and SSIS Catalog in the cloud, my first thought was a paraphrase Ferris Bueller’s question about dying the river green: “If we can execute SSIS packages from the SSIS Catalog in Azure Data Factory, why can’t we execute SSIS packages from Azure Blob Storage?” Today, we can.

Read on to see how you can do it.

Converting Existing SSIS Packages to Biml

David Stein shows off a conversion tool built into BimlExpress:

BimlExpress is a free Visual Studio add-in created by the good folks at Varigence. Its a full featured Biml editor which allows you to dynamically create SSIS packages. It was first released back in 2017, and the latest version is 2019 (of course). The current version supports Visual Studio 2010 through 2019 as well as SQL Server 2005 through 2019.

Prior to it’s release, Biml was written with Bids Helper, now known as BI Developer Extensions. While BI Developer Extensions has many nice features, you should no longer use it to work with Biml as it is no longer being updated/supported.

I’m pleasantly surprised by this. It used to be limited to BimlStudio (nee Mist) and BimlOnline.

Extracting ISPAC Files From the SSIS Catalog

Aaron Nelson shows how we can use Powershell to export ISPAC files from the SSIS catalog:

You’ll notice with the code below, that we’re able to extract that Project-object into a variable named $Proj, but when we pipe that to the Get-Member cmdlet we do not see a method called Export.

Read on to learn what the name of this mysterious method actually is and how you can make use of it.

SSIS 2019 Preview Released

Koen Verbeeck notes something very nice:

It is not a joke: SSIS is available for Visual Studio 2019 as a preview. Whoa, hold on. SQL Server 2019 hasn’t been released yet? But there’s already an SSIS 2019? Didn’t we have to wait months after the release of SQL Server 2017 before we had an SSIS version for Visual Studio 2017?

Yes, we did, you can read all about there here.

But times have changed apparently. The SSIS team caught up with the rest of the BI tools: SSIS projects are now available from the Visual Studio market place.

Read on to see what this means for SQL Server Data Tools.

Deploying SSIS Packages with Powershell

Aaron Nelson shows us how we can deploy an Integration Services ISPAC into the SSIS catalog with Powershell:

In my last post, I showed how you can use the SSIS PowerShell Provider to execute an SSIS package with PowerShell.  Of course, in order to execute that SSIS package, it has to get deployed first.  In Part 5 of Andy Leonard’s “SSIS, Docker, and Windows Containers” series he used some PowerShell code from Matt Masson’s blog post to deploy an .ISPAC file to the SSIS catalog.

Click through for the code.

Categories

September 2019
MTWTFSS
« Aug  
 1
2345678
9101112131415
16171819202122
23242526272829
30