
Category: ETL / ELT

Replicating SQL’s IN Operator with Azure Data Factory

Rayis Imayev shows how we can find values in a group using Azure Data Factory:

However, only this use case for the OR function with two conditions is possible: or(equals(variables(‘var1’), ‘A’), equals(variables(‘var1’), ‘B’)) – a limit of two conditions.

But what if we had the ability to check whether a particular element (a variable, parameter, or other ADF object value) belongs to a range of values (an array of values), similarly to what we can do with the IN operator in SQL? This would definitely solve our problem and remove the limitation on the number of logical conditions to check.

Click through for the answer.
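For a rough sense of how this can work (a sketch, not necessarily the exact expression from Rayis's post), ADF's expression language has a contains() function which checks whether a value appears in an array built with createArray(), removing the two-condition limit – for example, as the expression of an If Condition activity:

    {
        "name": "If var1 is in list",
        "type": "IfCondition",
        "typeProperties": {
            "expression": {
                "value": "@contains(createArray('A', 'B', 'C'), variables('var1'))",
                "type": "Expression"
            },
            "ifTrueActivities": [],
            "ifFalseActivities": []
        }
    }

The array could just as easily come from an Array-typed pipeline parameter, e.g. contains(pipeline().parameters.allowedValues, variables(‘var1’)).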

Comments closed

Scheduling SSIS Packages in Azure

Magi Naumova takes us through the process of running SSIS in Azure Data Factory, including the scheduling of jobs to run our SSIS packages:

The main purpose of these tools is to force the Lift and Shift approach of migrating and running existing SSIS packages in Azure. I wouldn’t say that this is the most effective approach to transferring ETL to Azure, but it could be a good start on the road to a Modern Azure Data Warehouse Architecture. If you have already deployed SSIS packages to the Azure SSIS Catalog, then SSMS 18 helps you put them on a schedule very quickly.

Running SSIS packages in Azure requires provisioning an SSIS runtime engine, an Azure Data Factory instance, and a SQL Database which hosts the SSIS catalog. Scheduling SSIS packages in Azure requires creating a data flow pipeline in ADF which has a trigger defined for scheduled execution. While describing all of those concepts is far beyond the scope of this chapter, a short description would be useful.

Read on for a good amount of detail and a demo which walks through the process.
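For reference, the scheduling piece comes down to a trigger definition attached to the pipeline which runs the package. Here is a minimal sketch of a daily schedule trigger (the trigger and pipeline names are made up, and this is not taken from Magi's demo):

    {
        "name": "NightlySsisTrigger",
        "properties": {
            "type": "ScheduleTrigger",
            "typeProperties": {
                "recurrence": {
                    "frequency": "Day",
                    "interval": 1,
                    "startTime": "2020-06-01T02:00:00Z",
                    "timeZone": "UTC"
                }
            },
            "pipelines": [
                {
                    "pipelineReference": {
                        "referenceName": "ExecuteSsisPackagePipeline",
                        "type": "PipelineReference"
                    }
                }
            ]
        }
    }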

Comments closed

Unioning Datasets in Azure Data Factory

Rayis Imayev takes us through the Union Transformation in Mapping Data Flows:

Recently I had a chance to work on a special case of creating a file where the first two rows were static and contained output table metadata (column names and column types), assuming that a further reading process would be able to explicitly identify each column’s data type and consume the new file correctly in a single run.

Read on to see how Rayis solved this with Azure Data Factory.

Comments closed

Using Azure Functions Inside Azure Data Factory

Rayis Imayev shows how you can call an Azure Function from inside your Azure Data Factory Pipeline:

Creating a data solution with Azure Data Factory (ADF) may look like a straightforward process: you have incoming datasets, business rules for how to connect and change them, and a final destination environment to save this transformed data. Very often your data transformation may require more complex business logic that can only be developed externally (scripts, functions, web services, Databricks notebooks, etc.).

In this blog post, I will try to share my experience of using Azure Functions in my Data Factory workflows: my highs and lows of using them, my victories and struggles to make them work.

This includes a description of the options, a demo function, and additional notes for each technique.
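One native option in ADF for this kind of external call is the Azure Function activity (whether or not that is the route Rayis ultimately recommends). A minimal sketch of its JSON definition, with made-up linked service, function, and parameter names:

    {
        "name": "CallTransformFunction",
        "type": "AzureFunctionActivity",
        "linkedServiceName": {
            "referenceName": "MyAzureFunctionLinkedService",
            "type": "LinkedServiceReference"
        },
        "typeProperties": {
            "functionName": "TransformPayload",
            "method": "POST",
            "body": {
                "value": "@string(pipeline().parameters.inputPayload)",
                "type": "Expression"
            }
        }
    }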

Comments closed

Metadata Integrity Checks in ADF.ProcFwk

Paul Andrew has another update to the ADF metadata-driven processing framework:

With this release of the framework I wanted to take the opportunity to harden the database and add some more integrity (intelligence) to the metadata, things that go beyond the existing database PK/FK constraints. After all, this metadata drives everything that Azure Data Factory does/is about to do – so it needs to be correct. These new integrity checks take two main forms:

1. Establishing a minimum set of criteria within the metadata before the core Data Factory processing starts and creates an execution run.
2. Establishing a logical chain of pipeline dependencies across processing stages, then providing a set of advisory checks for areas of conflict and/or improvement.

More details on both are included against the actual stored procedure in the database changes section below.

In addition to database hardening, I’ve added a few other bits to the solution, including a PowerShell script for ADF deployments and a Data Studio Notebook to make the developer experience of implementing this code project a little nicer.

Read on to see what’s in version 1.3. Check it out on GitHub as well.

Comments closed

Checking JSON Structure with ADF

Rayis Imayev takes us through the solution of a tricky problem in Azure Data Factory:

Within my “ForEach” container I have also placed a Stored Procedure task and set 4 data elements from my incoming data stream as values for corresponding parameters.

However, this approach will not work for all my incoming JSON events; it actually failed for the last one, since it didn’t have both the “stop_time” and “last_update” data elements.

An easy way to fix this problem is to add the missing data elements with empty values for the last event record; however, when we don’t have control over the incoming data, we need to adjust our data processing steps.

Read on to see how Rayis solves this problem.
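One general-purpose trick for this sort of problem (a sketch of the idea, not necessarily Rayis's solution, and assuming the null-safe ?. operator and coalesce() behave here as documented) is to make each stored procedure parameter tolerant of a missing element, e.g. in the activity's parameter definition:

    "storedProcedureParameters": {
        "StopTime": {
            "value": {
                "value": "@coalesce(item()?.stop_time, '')",
                "type": "Expression"
            },
            "type": "String"
        }
    }

Here the StopTime parameter name is hypothetical; the point is that a missing stop_time element resolves to an empty string instead of failing the expression.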

Comments closed

Tying Azure Data Factory to Source Control

Eddy Djaja explains why you really want to tie Azure Data Factory to your source control:

Azure Data Factory (ADF) is Microsoft’s ETL (or, more precisely, ELT) tool in the cloud. For more information on ADF, Microsoft has an introduction at this link: https://docs.microsoft.com/en-us/azure/data-factory/introduction. Some have argued over whether ADF will replace or complement “on-premises” SSIS; it is uncertain, and only time can tell what will happen in the future.
Unlike SSIS, the authoring of ADF does not use Visual Studio. ADF authoring uses a web browser to create ADF components, such as pipelines, activities, datasets, etc. The simplicity of authoring ADF may confuse novice developers about how ADF components are saved, stored, and published. When logging in to ADF for the first time after creating an ADF instance, the authoring is in ADF mode. How do we know?

Click through for the explanation and some resources on how to do it.

Comments closed

Auto-Detecting Column Delimiters with Data Factory

Mark Kromer shows us a way of dynamically learning what the likely delimiter of a delimited file is:

Processing delimited text files in the data lake is one of the most popular uses of Azure Data Factory (ADF). To define the field delimiter, you set the column delimiter property in an ADF dataset.

The reality of data processing is that delimiters can change often. ADF provides a facility to account for this data drift via parameterization. However, this assumes that you know that the delimiter is changing and what it will change to.

I’m going to briefly describe a sample of how to auto-detect a file delimiter using ADF Data Flows.

Click through for the demo.
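Mark's technique uses Mapping Data Flows; as a much cruder alternative sketch in the pipeline expression language (not what Mark does), you could read the header row into a string variable with a Lookup activity and then pick whichever candidate delimiter splits it into more columns, for example via a Set Variable activity:

    {
        "name": "Guess delimiter",
        "type": "SetVariable",
        "typeProperties": {
            "variableName": "delimiter",
            "value": {
                "value": "@if(greater(length(split(variables('headerRow'), ';')), length(split(variables('headerRow'), ','))), ';', ',')",
                "type": "Expression"
            }
        }
    }

The headerRow and delimiter variable names here are hypothetical, and the heuristic only compares two candidate delimiters; Mark's Data Flow approach is the more robust route.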

Comments closed