
Category: ETL / ELT

Spark ELT in Synapse Notebooks

Liliam Leme performs some data movement:

I often receive various requests from customers while working on FastTrack projects, and I have compiled some examples and useful tips to help you build your solution on top of a data lake. Most of the examples in this post use pandas, and I hope they will be as helpful for you as they were for me.

Please note that all examples in this post use pyspark.

In my scenario, I exported multiple tables from SQLDB to a folder using a notebook and ran the requests in parallel.

Read on for the examples and some of the things you can do with Spark notebooks in Azure Synapse Analytics.
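As a rough illustration of that parallel-export pattern, something like the following pyspark sketch would work in a Synapse notebook. The JDBC connection details, table list, and lake path are all hypothetical, and `spark` is the session the notebook provides for you:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical source and sink details -- adjust to your environment.
jdbc_url = "jdbc:sqlserver://myserver.database.windows.net:1433;database=mydb"
tables = ["dbo.Customers", "dbo.Orders", "dbo.OrderLines"]

def export_table(table: str) -> None:
    # Read one table from the SQL DB over JDBC
    df = (spark.read.format("jdbc")
          .option("url", jdbc_url)
          .option("dbtable", table)
          .option("user", "etl_user")
          .option("password", "<secret>")
          .load())
    # Land it in the lake as parquet, one folder per table
    df.write.mode("overwrite").parquet(
        f"abfss://raw@mylake.dfs.core.windows.net/sqldb/{table}")

# Spark accepts job submissions from multiple threads, so a small
# thread pool is enough to run the exports in parallel.
with ThreadPoolExecutor(max_workers=4) as pool:
    list(pool.map(export_table, tables))
```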

Comments closed

Automating Self-Hosted Integration Runtime Deployment

Jonathan D’Aloia doesn’t want to click next-next-next:

Welcome to my blog on how to fully automate the deployment of a Self-Hosted Integration Runtime using Terraform!

The title of this blog is very much self-explanatory, but I hope you find the contents useful and are able to apply this to your own projects in some way.

Click through for a brief overview of self-hosted integration runtimes, the process to follow, and a link to the repo.
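The post builds this in Terraform, but for a sense of what the automation has to do, here is a minimal Python sketch of the same two steps using the Azure SDK (all resource names are invented): register the self-hosted IR, then pull the authentication key you would hand to the installer on the VM.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    IntegrationRuntimeResource,
    SelfHostedIntegrationRuntime,
)

client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

# Step 1: register the self-hosted IR in the data factory
client.integration_runtimes.create_or_update(
    resource_group_name="my-rg",
    factory_name="my-adf",
    integration_runtime_name="my-shir",
    integration_runtime=IntegrationRuntimeResource(
        properties=SelfHostedIntegrationRuntime(description="Scripted SHIR")
    ),
)

# Step 2: grab an auth key to feed to the SHIR installer on the VM
keys = client.integration_runtimes.list_auth_keys("my-rg", "my-adf", "my-shir")
print(keys.auth_key1)
```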

Comments closed

Data Pipelines and Data Mesh

Jean-Georges Perrin answers a burning question:

I keep having questions about data pipelines. Data pipelines in Data Mesh is a topic I should tackle. So… Is the data pipeline the root of all evil?

Jean-Georges’s answer is quite in line with one of my favorite phrases: “Short answer: no, with an ‘if’; long answer: yes, with a ‘but.'” Read on for some thoughts on data pipelines and what the data mesh concept does to minimize harm.

Comments closed

Trying out Azure Synapse Link for SQL Server 2022

Kevin Chant looks at Azure Synapse Link for SQL Server 2022:

My first topic is a new feature that covers both SQL Server 2022 and Azure: Azure Synapse Link, or to be more precise, Azure Synapse Link for SQL Server 2022.

I have been doing various tests with this feature recently, which has led to some interesting blog posts about Azure Synapse Link for SQL Server 2022.

Read on for a few more thoughts, as well as deployment scripts via Azure DevOps and GitHub Actions.

Comments closed

Orchestrating Synapse Notebooks and Spark Jobs from ADF

Abhishek Narain has an announcement:

Today, we are introducing support for orchestrating Synapse notebooks and Synapse spark job definitions (SJD) natively from Azure Data Factory pipelines. This immensely helps customers who have invested in ADF and Synapse Spark, without requiring them to switch to Synapse Pipelines for orchestrating Synapse notebooks and SJDs.

NOTE: Synapse notebook and SJD activities were previously available only in Synapse Pipelines.

If you’re familiar with Synapse Pipelines, the equivalent ADF operations are extremely similar, as you’d probably expect.
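If you deploy your factories as code, the new activities should slot into pipeline JSON like any other. Here is a hypothetical sketch via the ARM REST API; note that the `SynapseNotebook` activity type and the linked service shape are my assumptions based on the Synapse Pipelines equivalent, so check the JSON the ADF designer generates for the authoritative form.

```python
import requests
from azure.identity import DefaultAzureCredential

sub, rg, factory = "<subscription-id>", "my-rg", "my-adf"
url = (
    f"https://management.azure.com/subscriptions/{sub}/resourceGroups/{rg}"
    f"/providers/Microsoft.DataFactory/factories/{factory}"
    f"/pipelines/RunSynapseNotebook?api-version=2018-06-01"
)

pipeline = {
    "properties": {
        "activities": [{
            "name": "RunNotebook",
            "type": "SynapseNotebook",  # assumed activity type name
            "linkedServiceName": {
                "referenceName": "SynapseWorkspaceLS",  # hypothetical linked service
                "type": "LinkedServiceReference",
            },
            "typeProperties": {
                "notebook": {"referenceName": "my_notebook", "type": "NotebookReference"}
            },
        }]
    }
}

token = DefaultAzureCredential().get_token("https://management.azure.com/.default").token
resp = requests.put(url, json=pipeline,
                    headers={"Authorization": f"Bearer {token}"}, timeout=30)
resp.raise_for_status()
```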

Comments closed

Limiting Data Factory Users to Trigger Pipelines

Koen Verbeeck doesn’t want people running amok:

Typically you have a bunch of pipelines that are started by one or more triggers. Sometimes, a pipeline needs to be manually triggered. For example, when the finance department is closing the fiscal year, they probably want to run the ETL pipeline a couple of times on-demand, to make sure their latest changes are reflected in the reports. Since you don’t want them to contact you every time to start a pipeline, it might be an idea to give them permission to start the pipeline themselves.

This can obviously be done by tools such as Azure Logic Apps or a Power App, but in my case the users also wanted to view the progress of the pipeline (did something crash? Why is it taking so long? etc.), and developing a Power App with all those features seemed a bit cumbersome to me. Instead, we gave them permission on ADF itself so they can start the pipelines. There's one problem, though: there's only one built-in role for ADF in Azure, and it's the Contributor role. That's a bit too much permission, as anyone with that role can change anything in ADF. You don't want that.

So what can you do? Click through to find out.
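Koen's fix centers on a custom role that can start and watch pipeline runs and nothing more. As a sketch of creating such a role with the Azure SDK (the role name and the precise set of read actions are my guesses; `createrun` is the action that starts a pipeline):

```python
import uuid
from azure.identity import DefaultAzureCredential
from azure.mgmt.authorization import AuthorizationManagementClient
from azure.mgmt.authorization.models import Permission, RoleDefinition

scope = "/subscriptions/<subscription-id>/resourceGroups/my-rg"
client = AuthorizationManagementClient(DefaultAzureCredential(), "<subscription-id>")

role = RoleDefinition(
    role_name="Data Factory Pipeline Operator",  # hypothetical name
    description="Start pipeline runs and monitor them; change nothing",
    permissions=[Permission(actions=[
        "Microsoft.DataFactory/factories/read",
        "Microsoft.DataFactory/factories/pipelines/read",
        "Microsoft.DataFactory/factories/pipelines/createrun/action",
        "Microsoft.DataFactory/factories/pipelineruns/read",
    ])],
    assignable_scopes=[scope],
)

# Custom role definitions are keyed by a GUID of your choosing
client.role_definitions.create_or_update(scope, str(uuid.uuid4()), role)
```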

Comments closed

Error Handling Patterns in ADF Pipelines

Chenye Charlie Zhu begins a new series:

Orchestration allows conditional logic and enables users to take different paths based upon the outcome of a previous activity. Building upon the concept of conditional paths, ADF and Synapse pipelines allow users to build versatile and resilient workflows that can handle unexpected errors and still run smoothly in auto-pilot mode.

This is an ongoing series that will gradually level up, helping you build ever more complicated logic to handle more scenarios. We will walk through examples for some common use cases and help you build functional and useful workflows.

Read on for a few error-handling patterns.
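The core mechanism behind these patterns is the dependency condition between activities: Succeeded, Failed, Skipped, and Completed. A minimal sketch with the Python SDK, using Wait activities as stand-ins for real work:

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    ActivityDependency,
    PipelineResource,
    WaitActivity,
)

client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

main = WaitActivity(name="MainWork", wait_time_in_seconds=1)

# Runs only when MainWork fails -- the error-handling branch
on_error = WaitActivity(
    name="LogFailure", wait_time_in_seconds=1,
    depends_on=[ActivityDependency(activity="MainWork",
                                   dependency_conditions=["Failed"])],
)

# Runs whether MainWork succeeds or fails -- a try/finally-style cleanup
cleanup = WaitActivity(
    name="Cleanup", wait_time_in_seconds=1,
    depends_on=[ActivityDependency(activity="MainWork",
                                   dependency_conditions=["Completed"])],
)

client.pipelines.create_or_update(
    "my-rg", "my-adf", "ErrorHandlingDemo",
    PipelineResource(activities=[main, on_error, cleanup]),
)
```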

Comments closed

Logic App Errors with Variables in Sharepoint Actions

Koen Verbeeck troubleshoots an issue:

I have a Logic App that reads out a SharePoint library and stores all the documents found into Azure Blob Storage (ADF only supports Lists). I was trying to make this Logic App “generic”, meaning I could change the source folder and the destination container by using variables. That way, I have one single Logic App which can read out any SharePoint library, instead of creating a new Logic App for each library.

So I adapted my HTTP trigger to accept a JSON payload, which contains the name of the folder on SharePoint and the name of the blob container.

Read on to see the error message, as well as how Koen resolved the problem.
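For what it's worth, the payoff of a generic Logic App like this is that callers only need to post the right JSON body to the HTTP trigger. A hypothetical invocation (the URL and field names are invented for illustration):

```python
import requests

# The HTTP trigger URL comes from the Logic App designer; this one is fake.
trigger_url = ("https://prod-00.westeurope.logic.azure.com/workflows/"
               "<workflow-id>/triggers/manual/paths/invoke?<sas-params>")

payload = {
    "sharePointFolder": "Shared Documents/Finance",  # source library folder
    "blobContainer": "finance-docs",                 # destination container
}

resp = requests.post(trigger_url, json=payload, timeout=30)
resp.raise_for_status()
```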

Comments closed

Cosmos DB to Data Explorer Synapse Link

Vincent-Philippe Lauzon makes an announcement:

We recently made our new Kusto data connection available in public preview: Cosmos DB to Azure Data Explorer Synapse Link.

This does look like a marketing-heavy announcement, but the short version is that you can ingest data from Cosmos DB into Data Explorer pools via Synapse Link rather than creating your own ETL process. The previous Cosmos DB connector for Synapse Link was tied to a dedicated SQL pool.

Comments closed

Row-Level Security and Data Migration

Forrest McDaniel shares an interesting case of using row-level security:

This was the situation I found myself in earlier this year – our company had absorbed another, and it was time to slurp up their tables. There were a lot of decisions to make and tradeoffs to weigh, and we ended up choosing to trickle-insert their data, but make it invisible to normal use until the moment of cutover.

The way we implemented this was with Row Level Security. Using an appropriate predicate, we could make sure ETL processes only saw migrated data, apps saw unmigrated data, and admins saw everything. To give a spoiler: it worked, but there were issues.

I would not have thought of this scenario. And given the difficulties Forrest & crew ran into, it might be for the best…
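To make the technique concrete, here is a minimal sketch of an RLS setup along these lines, driven from Python via pyodbc. Every object and role name is hypothetical, and the predicate is far simpler than whatever Forrest actually used.

```python
import pyodbc

# Hypothetical connection details; autocommit so the DDL takes effect
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=myserver.database.windows.net;DATABASE=mydb;"
    "UID=deploy_user;PWD=<secret>",
    autocommit=True,
)

# Predicate: ETL sees migrated rows, apps see unmigrated rows,
# and a data-admin role sees everything.
conn.execute("""
CREATE FUNCTION dbo.fn_MigrationPredicate(@IsMigrated bit)
RETURNS TABLE WITH SCHEMABINDING AS
RETURN SELECT 1 AS AccessGranted
WHERE (@IsMigrated = 1 AND IS_ROLEMEMBER('etl_users') = 1)
   OR (@IsMigrated = 0 AND IS_ROLEMEMBER('app_users') = 1)
   OR IS_ROLEMEMBER('data_admins') = 1
""")

# Bind the predicate to the table as a filter
conn.execute("""
CREATE SECURITY POLICY dbo.MigrationFilter
ADD FILTER PREDICATE dbo.fn_MigrationPredicate(IsMigrated) ON dbo.Customers
WITH (STATE = ON)
""")
```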

Comments closed