Press "Enter" to skip to content

Category: ETL / ELT

Calling Logic Apps via Data Factory Pipelines

Andy Brownsword flips the script:

Last week we looked at calling a Data Factory Pipeline from a Logic App. This week I thought we’d balance it out by taking a look at calling a Logic App from an Azure Data Factory (ADF) Pipeline.

When building the Logic App last week we had to create our own polling mechanism to check for completion of the pipeline. The process is much simpler in the opposite direction. I specifically want to highlight two approaches, and save some pennies whilst we’re at it.

I am all about saving pennies, so be sure to check out that section as well.
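
A common way to wire this up is to give the Logic App an HTTP request trigger and call its URL from the pipeline. As a rough illustration of what that call looks like on the wire, here is a minimal Python sketch; the URL and payload are placeholders, and Andy's post covers the ADF-native approaches:

```python
import requests

# Hypothetical callback URL for a Logic App with a "When a HTTP request
# is received" trigger. An ADF Web activity would POST to this same URL.
LOGIC_APP_URL = (
    "https://prod-00.region.logic.azure.com/workflows/<id>"
    "/triggers/manual/paths/invoke?api-version=2016-10-01&sp=<sig>"
)

# The body becomes the trigger's input; its schema is whatever the
# Logic App's request trigger defines.
payload = {"fileName": "extract.csv", "runDate": "2025-01-01"}

response = requests.post(LOGIC_APP_URL, json=payload, timeout=60)
response.raise_for_status()

# A Logic App with a Response action replies synchronously; without one,
# it returns 202 Accepted and keeps running in the background.
print(response.status_code)
```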

Executing Data Factory Pipelines from Logic Apps

Andy Brownsword automates a workflow:

When building Azure Logic Apps, we can use the Azure Data Factory connector to start a pipeline. However, that action simply triggers a pipeline and doesn’t wait for it to finish. If your downstream logic depends on the output – for example, to collect a file – this can cause issues.

In this post I’ll demonstrate how to control the Logic App flow to wait for the pipeline to complete before proceeding.

Read on to see how, as well as some additional ideas of how to improve the pattern.
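
The Logic App builds this wait with a polling loop. To make the pattern concrete, here is a minimal Python sketch of the same poll-until-terminal loop against the Data Factory API via the azure-mgmt-datafactory package; the resource names are placeholders, and the identity running it needs rights to execute pipelines in the factory:

```python
import time

from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

# Placeholder identifiers -- substitute your own.
SUBSCRIPTION_ID = "<subscription-id>"
RESOURCE_GROUP = "<resource-group>"
FACTORY_NAME = "<factory-name>"
PIPELINE_NAME = "<pipeline-name>"

client = DataFactoryManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

# Kick off the pipeline. This returns immediately with a run ID, which
# is exactly why the Logic App connector doesn't wait either.
run = client.pipelines.create_run(RESOURCE_GROUP, FACTORY_NAME, PIPELINE_NAME)

# Poll until the run reaches a terminal status -- the same loop the
# Logic App builds with an Until action plus a delay.
terminal = {"Succeeded", "Failed", "Cancelled"}
while True:
    status = client.pipeline_runs.get(
        RESOURCE_GROUP, FACTORY_NAME, run.run_id
    ).status
    if status in terminal:
        break
    time.sleep(30)  # shorter intervals react faster but cost more executions

print(f"Pipeline run {run.run_id} finished with status: {status}")
```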

Copying Data across Tenants with Fabric Data Factory

Ye Xu makes use of the Copy job:

Copy job is the go-to solution in Microsoft Fabric Data Factory for simplified data movement, whether you’re moving data across clouds, from on-premises systems, or between services. With native support for multiple delivery styles, including bulk copy, incremental copy, and change data capture (CDC) replication, Copy job offers the flexibility to handle a wide range of data movement scenarios—all through an intuitive, easy-to-use experience. Learn more in What is Copy job in Data Factory – Microsoft Fabric | Microsoft Learn.

With Copy job, you can also perform cross-tenant data movement between Fabric and other clouds, such as Azure. It also enables cross-tenant data sharing within OneLake, allowing you to copy data across Fabric Lakehouse, Warehouse, and SQL DB in Fabric between tenants with SPN support. This blog provides step-by-step guidance on using Copy job to copy data across different tenants.

Click through for a demonstration, as well as the security permissions that are necessary for this to work.
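
Copy job handles all of this through connections in the UI, but as a quick sanity check that your service principal can actually reach the other tenant's OneLake, a sketch like the following may help. It uses OneLake's ADLS Gen2-compatible endpoint; the tenant, app, workspace, and lakehouse names are placeholders, and the SPN still needs a workspace role in the remote tenant (plus tenant settings that allow service principal access):

```python
from azure.identity import ClientSecretCredential
from azure.storage.filedatalake import DataLakeServiceClient

# Placeholder values for the *remote* tenant's service principal.
TENANT_ID = "<remote-tenant-id>"
CLIENT_ID = "<app-registration-client-id>"
CLIENT_SECRET = "<client-secret>"

# OneLake exposes an ADLS Gen2-compatible endpoint; the workspace acts
# as the file system and items (e.g., a lakehouse) as top-level folders.
credential = ClientSecretCredential(TENANT_ID, CLIENT_ID, CLIENT_SECRET)
service = DataLakeServiceClient(
    account_url="https://onelake.dfs.fabric.microsoft.com",
    credential=credential,
)

fs = service.get_file_system_client("<workspace-name>")
for path in fs.get_paths(path="<lakehouse-name>.Lakehouse/Files", recursive=False):
    print(path.name)
```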

The Downside of Zero-Copy Integration between Kafka and Iceberg

Jack Vanlightly lays out an argument:

Over the past few months, I’ve seen a growing number of posts on social media promoting the idea of a “zero-copy” integration between Apache Kafka and Apache Iceberg. The idea is that Kafka topics could live directly as Iceberg tables. On the surface it sounds efficient: one copy of the data, unified access for both streaming and analytics. But from a systems point of view, I think this is the wrong direction for the Apache Kafka project. In this post, I’ll explain why. 

Read on for an explanation of what “zero-copy” means here, as well as Jack’s position on the matter. I think it’s a solid argument and worth the read.

Updates to Microsoft Fabric Dataflows Gen2

Nikola Ilic digs into some announcements:

In the ocean of announcements from the recent FabCon Europe in Vienna, one that may have gone under the radar was about the enhancements in performance and cost optimization for Dataflows Gen2.

Before we delve into explaining how these enhancements impact your current Dataflows setup, let’s take a step back and provide a brief overview of Dataflows. For those of you who are new to Microsoft Fabric – a Dataflow Gen2 is the no-code/low-code Fabric item used to extract, transform, and load the data (ETL).

It sounds like these changes move Dataflows Gen2 from the “Never choose this” option to something that has become viable in at least some circumstances.

Migrating from Apache Airflow 2 to 3 on Amazon MWAA

Anurag Srivastava, et al., perform a migration:

Apache Airflow 3.x on Amazon MWAA introduces architectural improvements such as API-based task execution that provides enhanced security and isolation. Other major updates include a redesigned UI for better user experience, scheduler-based backfills for improved performance, and support for Python 3.12. Unlike in-place minor Airflow version upgrades in Amazon MWAA, upgrading to Airflow 3 from Airflow 2 requires careful planning and execution through a migration approach due to fundamental breaking changes.

This migration presents an opportunity to embrace next-generation workflow orchestration capabilities while providing business continuity. However, it’s more than a simple upgrade. Organizations migrating to Airflow 3.x on Amazon MWAA must understand key breaking changes, including the removal of direct metadata database access from workers, deprecation of SubDAGs, changes to default scheduling behavior, and library dependency updates. This post provides best practices and a streamlined approach to successfully navigate this critical migration, providing minimal disruption to your mission-critical data pipelines while maximizing the enhanced capabilities of Airflow 3.

Read on to see what has changed between these two major versions of Airflow, recommendations on what to look out for, and a step-by-step migration guide.
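
Two of those breaking changes show up directly in DAG code: SubDAGs are removed (TaskGroups are the replacement) and scheduling arguments are consolidated into a single schedule parameter, with catchup now defaulting to False. Here is a minimal sketch of an Airflow 3-style DAG illustrating both; the task names are illustrative, and the airflow.sdk import path is the Airflow 3 convention:

```python
from datetime import datetime

# Airflow 3 exposes authoring primitives via the Task SDK; in Airflow 2
# these lived under airflow.decorators and airflow.utils.task_group.
from airflow.sdk import dag, task, TaskGroup


@dag(
    schedule="@daily",  # replaces the removed schedule_interval/timetable args
    start_date=datetime(2025, 1, 1),
    catchup=False,  # Airflow 3 default; be explicit if you relied on True
)
def migrated_example():
    @task
    def extract() -> dict:
        return {"rows": 42}

    @task
    def clean(payload: dict) -> dict:
        return payload

    @task
    def load(payload: dict) -> None:
        print(payload)

    data = extract()

    # TaskGroup replaces the removed SubDagOperator: same grouping in the
    # UI, but the tasks are scheduled as part of the parent DAG.
    with TaskGroup(group_id="transform"):
        cleaned = clean(data)

    load(cleaned)


migrated_example()
```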

Updates to Fabric Data Factory

Abhishek Narain has a list of updates:

Workspace Private Link Support for Data Factory (Preview): Microsoft Fabric enables secure data integration through Private Link support in Dataflows Gen2, Pipelines, and Copy jobs. This ensures that inbound data access remains isolated and compliant within protected workspaces. By leveraging VNet data gateways, organizations can securely connect to data sources across Private Link-enabled environments—eliminating exposure to public networks and reinforcing enterprise-grade security for sensitive data operations.

Most of these are security-related updates, with a mixture of things now GA, things currently in preview, and a pair of items coming soon.

Microsoft Fabric Copy Job Updates

Ye Xu has an update:

Copy job is the go-to solution in Microsoft Fabric Data Factory for simplified data movement. With native support for multiple delivery styles, including bulk copy, incremental copy, and change data capture (CDC) replication, Copy job offers the flexibility to handle a wide range of scenarios—all through an intuitive, easy-to-use experience.

This update introduces several enhancements, including connection parameterization, expanded CDC capabilities, new connectors, and a streamlined Copy Assistant powered by Copy job.

Read on to see what’s new. Some of the items in this list are preview features, and it looks like others are currently GA.

Troubleshooting Bulk Insertion in SQL Server

Rick Dobson lays out some common issues:

Most SQL bulk insert and SQL Server openrowset tutorials skip file access issues that can stop imports cold. Both the bulk insert statement and openrowset function rely on the SQL Server service account to read a source file. The SQL Server service account must have read permission on the file or its folder. It is also convenient to have read & execute as well as list folder content permissions. Also, non-standard source file locations (e.g., C:\Users\Public\Downloads) may not grant default read access to the SQL Server service account – always verify before use.

Click through for several recommendations, links to additional resources, and a few scripts along the way.
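
The central point bears restating: BULK INSERT and OPENROWSET read the file on the server as the SQL Server service account, not as the login running the query, so "operating system error code 5 (Access is denied.)" almost always means a folder permission gap rather than a bad statement. Here is a small pyodbc sketch for exercising this from a client; the server, database, path, and table names are placeholders:

```python
import pyodbc

# Placeholder connection details.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=localhost;DATABASE=Staging;"
    "Trusted_Connection=yes;TrustServerCertificate=yes;"
)

# The path below is resolved on the *server* and must be readable by
# the SQL Server service account -- not by this client process.
sql = r"""
BULK INSERT dbo.ImportTarget
FROM 'C:\ImportFiles\extract.csv'
WITH (FORMAT = 'CSV', FIRSTROW = 2);
"""

cursor = conn.cursor()
try:
    cursor.execute(sql)
    conn.commit()
except pyodbc.Error as err:
    # OS error 5 (access denied) here points at service account
    # permissions on the folder, per the advice above.
    print(f"Bulk insert failed: {err}")
finally:
    conn.close()
```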

Comparing Microsoft Fabric Consumption for Notebooks and Warehouse SQL Queries

Gilbert Quevauvilliers performs a comparison:

I saw that there was an update where it is now possible to use the Microsoft Fabric Warehouse to copy data directly from OneLake into the Warehouse.

This got me thinking: which would consume more capacity to get the data into the Warehouse table, and which one would be faster?

To do this I am going to be running a SQL query in the Warehouse.

Next, I will use a Notebook to copy the data from the OneLake files section to a Warehouse table.

Gilbert’s specific query involves loading data from a variety of CSV files into a lakehouse via notebook, and then into a warehouse table. Read on for the results.
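
For context, the notebook half of a test like this is typically only a few lines of PySpark: read the CSVs from the lakehouse Files area, then push them to the warehouse. A rough sketch, assuming a Fabric notebook attached to a lakehouse and the Fabric Spark connector for Warehouse (the synapsesql writer); the paths and table names are placeholders:

```python
# Runs inside a Fabric notebook attached to a lakehouse; `spark` is provided.

# In the Fabric Spark runtime, this import registers the synapsesql
# reader/writer for PySpark.
import com.microsoft.spark.fabric

# Read the raw CSV files from the lakehouse Files section.
df = (
    spark.read
    .option("header", "true")
    .option("inferSchema", "true")
    .csv("Files/sales/*.csv")  # placeholder path under the attached lakehouse
)

# Write to a Warehouse table via the connector; names are placeholders.
df.write.mode("overwrite").synapsesql("SalesWarehouse.dbo.FactSales")
```

The warehouse half would be the T-SQL route Gilbert describes, reading the same OneLake files directly from the Warehouse side.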
