Category: ETL / ELT

Orchestration Options in Microsoft Fabric

Reitse Eskens moves some data:

Well, unless you enjoy waking up every night to start your Extract-Transform-Load (ETL) process and manually running each process to do some work, it’s a smart move to automate this. Also, make sure everything always runs in the correct order. Additionally, there are situations where processes need to run in different configurations.

All these things can be done with what we call orchestration. It may sound a bit vague now, but we’ll get to the different moving parts of this, like parameterisation and pipelines.

Read on for a primer on the topic.
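
As a rough illustration of the two ideas in that excerpt (running steps in the correct order, and running the same steps in different configurations), here is a minimal Python sketch. It is not how Fabric pipelines work internally, and the step names, the `depends_on` mapping, and the `params` dictionary are all made up for the example. The ordering comes from the standard library's `graphlib.TopologicalSorter`.

```python
from graphlib import TopologicalSorter


def run_in_order(steps, depends_on, params):
    """Run every step after its prerequisites, passing the same parameters to each."""
    # static_order() yields each node only after all of its predecessors.
    order = list(TopologicalSorter(depends_on).static_order())
    results = {}
    for name in order:
        results[name] = steps[name](params)
    return order, results


# Hypothetical steps: each one only reports what it would do.
steps = {
    "extract": lambda p: f"extract {p['source']}",
    "transform": lambda p: f"transform {p['source']}",
    "load": lambda p: f"load into {p['target']}",
}

# Each key lists the steps that must finish before it starts.
depends_on = {"transform": {"extract"}, "load": {"transform"}}

# Changing the parameters gives a different configuration of the same process.
order, results = run_in_order(steps, depends_on, {"source": "sales", "target": "dev"})
```

Calling `run_in_order` again with a different `params` dictionary reuses the same ordering for another configuration.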

Tips for the Import Data Option in SQL Server

Andy Brownsword doesn’t trust wizards, with their pointy caps and long beards:

If you need to create a copy of a table in another database, the ‘Import Data’ option may seem convenient. If you’ve used this method to copy to your dev environment and found things break, this post is for you.

Click through for some solid advice on how to import that data. Another thing I would sometimes do is coerce all of the input columns to long strings and load them into a staging table. Then I could use T-SQL to reshape the data however I needed, rather than trying to get a finicky SSIS flow to translate a date and time combination (or whatever) appropriately.
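
Here is a minimal sketch of that staging-table idea. It uses Python's built-in `sqlite3` as a stand-in for SQL Server so the example runs anywhere, which means the SQL is SQLite's dialect rather than T-SQL (in T-SQL the conversions would more likely lean on something like `TRY_CONVERT`). The table names, column names, and sample rows are all invented for illustration.

```python
import sqlite3

# Hypothetical source rows, with every value kept as text on the way in.
raw_rows = [
    ("1", "2024-03-01", "09:30", "19.99"),
    ("2", "2024-03-02", "14:05", "5.00"),
]

conn = sqlite3.connect(":memory:")

# Step 1: the staging table takes everything as strings, so the load cannot
# fail on type conversion.
conn.execute(
    "CREATE TABLE staging_orders "
    "(order_id TEXT, order_date TEXT, order_time TEXT, amount TEXT)"
)
conn.executemany("INSERT INTO staging_orders VALUES (?, ?, ?, ?)", raw_rows)

# Step 2: reshape with SQL, where the conversions are explicit and easy to adjust.
conn.execute("CREATE TABLE orders (order_id INTEGER, ordered_at TEXT, amount REAL)")
conn.execute(
    """
    INSERT INTO orders (order_id, ordered_at, amount)
    SELECT CAST(order_id AS INTEGER),
           order_date || ' ' || order_time || ':00',  -- combine date and time
           CAST(amount AS REAL)
    FROM staging_orders
    """
)

orders = conn.execute(
    "SELECT order_id, ordered_at, amount FROM orders ORDER BY order_id"
).fetchall()
```

The point of the split is that a bad value lands in staging untouched, where it can be inspected and fixed with a query instead of failing the load.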

Copy Job in Fabric Data Factory Pipelines now GA

Jianlei Shen makes an announcement:

Copy Job Activity allows you to run Copy jobs as native activities inside Data Factory pipelines.

Copy jobs are created and managed independently in Data Factory for quick data movement between supported sources and destinations. With Copy job Activity, that same fast, lightweight experience is now embedded within pipelines, making it easier to automate, schedule, and chain Copy jobs as part of broader data workflows.

Read on for an overview of what’s in the activity and a few links on how to get started with it.

Cutting Costs of Azure Self-Hosted Integration Runtimes

Andy Brownsword saves some quid:

If you have a Self-Hosted Integration Runtime (SHIR, or IR for short here) on an Azure Virtual Machine (VM), there’s a cost to keep it online. When used intermittently – for example during batch processes – this is inefficient for costs as you’re paying for the compute you don’t need. One way to alleviate this is by controlling uptime of the environment manually, only bringing it online for as long as needed.

Read on to see how to do this.
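
The general shape of "only online for as long as needed" can be sketched as a start/run/stop wrapper. This is not Andy's implementation. It is a small Python illustration in which `start_vm` and `stop_vm` are hypothetical callables standing in for whatever actually starts and deallocates the VM.

```python
from contextlib import contextmanager


@contextmanager
def runtime_online(start_vm, stop_vm):
    """Start the VM on entry and always stop it on exit, even after an error."""
    start_vm()
    try:
        yield
    finally:
        # Runs whether the batch succeeded or raised, so the VM is not left
        # running (and billing) after a failure.
        stop_vm()


# Hypothetical stand-ins that record calls instead of touching Azure.
events = []


def run_batch():
    events.append("batch")


with runtime_online(lambda: events.append("start"), lambda: events.append("stop")):
    run_batch()
```

The `finally` clause is the important design choice: a failed batch should still shut the machine down.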

Calling Logic Apps via Data Factory Pipelines

Andy Brownsword flips the script:

Last week we looked at calling a Data Factory Pipeline from a Logic App. This week I thought we’d balance it out by taking a look at calling a Logic App from an Azure Data Factory (ADF) Pipeline.

When building the Logic App last week we had to create our own polling mechanism to check for completion of the pipeline. The process is much simpler in the opposite direction. I specifically want to highlight two approaches, and save some pennies whilst we’re at it.

I am all about saving pennies, so be sure to check out that section as well.

Executing Data Factory Pipelines from Logic Apps

Andy Brownsword automates a workflow:

When building Azure Logic Apps we can use the Azure Data Factory connector to start a pipeline. However that action simply triggers a pipeline and doesn’t wait for it to finish. If your downstream logic depends on the output – for example to collect a file – this can cause issues.

In this post I’ll demonstrate how to control the Logic App flow to wait for the pipeline to complete before proceeding.

Read on to see how, as well as some additional ideas of how to improve the pattern.
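
For a sense of what "wait for the pipeline to complete" involves, here is a minimal polling loop in Python. Andy builds his version inside a Logic App, so treat this only as a sketch of the same pattern. `get_status` is a hypothetical callable that returns the current run status, and the status names and default intervals are assumptions made for the example.

```python
import time


def wait_for_pipeline(get_status, poll_seconds=30, timeout_seconds=3600, sleep=time.sleep):
    """Poll get_status() until it returns a terminal state or the timeout passes."""
    # Assumed terminal status names, used here for illustration only.
    terminal = {"Succeeded", "Failed", "Cancelled"}
    waited = 0
    while True:
        status = get_status()
        if status in terminal:
            return status
        if waited >= timeout_seconds:
            raise TimeoutError(f"pipeline still {status} after {waited}s")
        sleep(poll_seconds)
        waited += poll_seconds


# Hypothetical run that is queued, then in progress, before it finishes.
statuses = iter(["Queued", "InProgress", "InProgress", "Succeeded"])
final = wait_for_pipeline(lambda: next(statuses), poll_seconds=0, sleep=lambda s: None)
```

The timeout matters as much as the loop itself, since a run that never reaches a terminal state would otherwise keep the caller waiting indefinitely.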

Copying Data across Tenants with Fabric Data Factory

Ye Xu makes use of the Copy job:

Copy job is the go-to solution in Microsoft Fabric Data Factory for simplified data movement, whether you’re moving data across clouds, from on-premises systems, or between services. With native support for multiple delivery styles, including bulk copy, incremental copy, and change data capture (CDC) replication, Copy job offers the flexibility to handle a wide range of data movement scenarios—all through an intuitive, easy-to-use experience. Learn more in What is Copy job in Data Factory – Microsoft Fabric | Microsoft Learn.

With Copy job, you can also perform cross-tenant data movement between Fabric and other clouds, such as Azure. It also enables cross-tenant data sharing within OneLake, allowing you to copy data across Fabric Lakehouse, Warehouse, and SQL DB in Fabric between tenants with SPN support. This blog provides step-by-step guidance on using Copy job to copy data across different tenants.

Click through for a demonstration, as well as the security permissions that are necessary for this to work.

The Downside of Zero-Copy Integration between Kafka and Iceberg

Jack Vanlightly lays out an argument:

Over the past few months, I’ve seen a growing number of posts on social media promoting the idea of a “zero-copy” integration between Apache Kafka and Apache Iceberg. The idea is that Kafka topics could live directly as Iceberg tables. On the surface it sounds efficient: one copy of the data, unified access for both streaming and analytics. But from a systems point of view, I think this is the wrong direction for the Apache Kafka project. In this post, I’ll explain why. 

Read on for an explanation of what “zero-copy” means here, as well as Jack’s position on the matter. I think it’s a solid argument and worth the read.

Updates to Microsoft Fabric Dataflows Gen2

Nikola Ilic digs into some announcements:

In the ocean of announcements from the recent FabCon Europe in Vienna, one that may have gone under the radar was about the enhancements in performance and cost optimization for Dataflows Gen2.

Before we delve into explaining how these enhancements impact your current Dataflows setup, let’s take a step back and provide a brief overview of Dataflows. For those of you who are new to Microsoft Fabric – a Dataflow Gen2 is the no-code/low-code Fabric item used to extract, transform, and load the data (ETL).

It sounds like these changes move Dataflows Gen2 from the “Never choose this” option to something that has become viable in at least some circumstances.
