ETL / ELT – Page 24 – Curated SQL

For the better part of 15 years, SQL Server Integration Services (SSIS) has been the go-to enterprise extract-transform-load (ETL) tool for shops running on Microsoft SQL Server. More recently, Microsoft added Azure Data Factory (ADF) to its stable of enterprise ETL tools. In this post, I’ll be comparing SSIS and Azure Data Factory to share how they are alike and how they differ. I’ll also review the strengths and shortcomings of each, including the architectures in which each of these is likely to do well.

Read on for Tim’s thoughts on the subject. Tim lays out his biases up-front but also gives you a good feel for where both products are in their lifecycles.

Comments closed

Incremental Pipline Development with Azure Data Factory

Published 2020-07-17 by Kevin Feasel

Andy Leonard shows how you can incrementally develop Azure Data Factory pipelines:

A friend pinged me recently to ask about rolling back Azure Data Factory (ADF) pipeline versions. My response was a question: Are you using source control with ADF? That did not help the current situation.
I thought of the way I often build ADF pipelines and shared my methodology, which is relatively simple (it has to be simple for me to understand it!):

Click through for Andy’s approach.

Comments closed

Trust and Warehouse Data

Published 2020-07-14 by Kevin Feasel

Rob Farley explains one way that people might lose trust in your warehouse data:

The scenario is that there’s a source system, and there’s a table in a warehouse that is being used to report on it. Maybe it’s being populated by Integration Services or Data Factory. Maybe it’s being populated by T-SQL. I don’t really care. What I care about is whether the data in the warehouse is a true representation of what’s in the source system.
If it’s not a true representation, then we have all kinds of problems.
Mostly, that our warehouse is rubbish.

Read on for an example of how this might occur and what you can do to prevent it.

Comments closed

Refreshing Power BI Datasets in Azure Data Factory

Published 2020-07-13 by Kevin Feasel

Meagan Longoria shows us how to refresh a Power BI dataset using Azure Data Factory:

I recently needed to ensure that a Power BI imported dataset would be refreshed after populating data in my data mart. I was already using Azure Data Factory to populate the data mart, so the most efficient thing to do was to call a pipeline at the end of my data load process to refresh the Power BI dataset.
Power BI offers REST APIs to programmatically refresh your data. For Data Factory to use them, you need to register an app (service principal) in AAD and give it the appropriate permissions in Power BI and to an Azure key vault.

Click through for the solution.

Comments closed

Migrating From Cosmos DB to SQL Server

Published 2020-07-10 by Kevin Feasel

Eitan Blumin builds an app:

The general idea is this:
The app executes a Cosmos DB query and collects a number of records into its “buffer”.
Once that “buffer” reaches a certain number of records (configurable), it’s time to “flush” it into the SQL Server. That could be either a database table receiving a Bulk Copy stream, or a stored procedure receiving a table valued parameter (again, configurable).
After the buffer is flushed, we have the option to execute a “merge” procedure. This is a stored procedure that would implement an “upsert” logic from the “staging” table and into the actual destination table.

Read on for more explanation and check out Eitan’s GitHub repo.

Comments closed

Deploying ADF via Azure DevOps

Published 2020-07-09 by Kevin Feasel

Kamil Nowinski has part two on a series about releasing Azure Data Factory code:

Struggling with #ADF deployment? adf_publish branch doesn’t suit your purposes? Don’t have skills with PowerShell? I have good news for you. There is a new tool in the market. It’s a task for Azure DevOps Release Pipeline to deploy whole ADF from code (JSON files) to ADF instance in Azure. Behind the scenes, it runs the PowerShell module which does all job for you.
Sounds unbelievable? But it’s real! Check it out for yourself.

Click through for the video.

Comments closed

ADF.Procfwk Version 1.8

Published 2020-07-02 by Kevin Feasel

Paul Andrew has been busy:

Following more great feedback from the Data Platform community the primary goal of this release was to further improve the resilience of the framework processing. These improvements included its restart clean up capabilities and introducing better dependency chain handling between Worker pipelines when failures occur. The latter builds on the existing restart functionality first introduced in release v1.2 and supplements the logic using a new set of pipeline dependency metadata. I’ve created the below visual to conceptually show the new dependency chain behaviour, should you wish to populate and make use of the new metadata handling.

Read on for the full changelog.

Comments closed

Publishing Azure Data Factory via Azure DevOps

Published 2020-06-26 by Kevin Feasel

Kamil Nowinski shares how to deploy Azure Data Factory flows via Azure DevOps:

Struggling with #ADF deployment? adf_publish branch doesn’t suit your purposes? Don’t have skills with PowerShell? I have good news for you. There is a new tool in the market. It’s a task for Azure DevOps Release Pipeline to deploy whole ADF from code (JSON files) to ADF instance in Azure. Behind the scenes, it runs the PowerShell module which does all job for you.
Sounds unbelievable? But it’s real! Check it out for yourself.

Click through for a video.

Comments closed

Custom Parameters in Azure Data Factory Deployments

Published 2020-06-23 by Kevin Feasel

Rayis Imayev shows us how to use customer parameters in ARM templates when deploying Azure Data Factory pipelines:

If I needed to visually explain how this custom parameterization works for Azure Data Factory resource, I would picture it this way. Before you solely relied on publishing your ADF code from your collaboration Git branch to the adf_publish branch where ARMTemplateForFactory.jsonand ARMTemplateParametersForFactory.json files live and get further deployed to other environments. You had some flexibility to parameterize your deployment or run some custom code to update ARM templates before they get deployed.
With the introduction of the ADF custom parameterization, you have an additional JSON file arm-template-parameters-definition.json that you can use to define rules to add supplementary parameters to the main ARMTemplateParametersForFactory.json file. There is a very important statement on Microsoft documentation site that explains how this new file operates, “A definition can’t be specific to a resource instance. Any definition applies to all resources of that type”. It’s like a garden rake that will collect all the leaves or none, i.e. if your rule defines a JSON property, let’s say “timeout” of your ForEach loop container, then all timeouts will be scooped into ARM template parameter file.

Read on for the full explanation as well as an example.

Comments closed

Calculating Test Coverage of Azure Data Factory Pipelines

Published 2020-06-22 by Kevin Feasel

Richard Swinbank wraps up a series on testing in Azure Data Factory:

To determine which activities have been executed by a test suite, I need to collect and aggregate activity run data from every pipeline execution triggered from any test fixture. In the previous post I developed components to retrieve and cache activities for a pipeline run – I’ll use those components here to collect data systematically.
I’m going to create a new helper class to contain functions specific to coverage measurement. It’s a subclass of the database helper because I want to exploit functionality from classes further up the hierarchy:

Read on for the code and process for measurement.

Comments closed

M	T	W	T	F	S	S
	1	2	3	4	5	6
7	8	9	10	11	12	13
14	15	16	17	18	19	20
21	22	23	24	25	26	27
28	29	30	31

Category: ETL / ELT

Comparing Integration Services and Azure Data Factory

Incremental Pipline Development with Azure Data Factory

Trust and Warehouse Data

Refreshing Power BI Datasets in Azure Data Factory

Migrating From Cosmos DB to SQL Server

Deploying ADF via Azure DevOps

ADF.Procfwk Version 1.8

Publishing Azure Data Factory via Azure DevOps

Custom Parameters in Azure Data Factory Deployments

Calculating Test Coverage of Azure Data Factory Pipelines