Press "Enter" to skip to content

Category: ETL / ELT

Lessons Learned from Azure Data Factory Integrating with DB/2 on Mainframe

Teo Lachev shares some thoughts:

I’ve done a few BI integration projects extracting data from ERPs running on IBM Db2. Most of the implementations would use a hybrid architecture where the ERP would be running on an on-prem mainframe while the data was loaded in Microsoft Azure. Here are a few tips if you’re facing this challenge:

Click through for five major points. Surprisingly, one of them isn’t “Avoid DB/2 like the plague.”
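
In Azure Data Factory itself you would use the Db2 connector, but as a rough sketch of the underlying data path (on-prem Db2 out, Azure storage in), something like the following works. The ibm_db and azure-storage-blob packages are real; the connection strings, schema, table, and container names are placeholders I made up:

```python
# Sketch only: pull rows from an on-prem Db2 table and land them as a CSV file in
# Azure Blob Storage. Connection strings, table, and container names are placeholders.
import csv
import io

import ibm_db
from azure.storage.blob import BlobServiceClient

db2_conn = ibm_db.connect(
    "DATABASE=ERPDB;HOSTNAME=mainframe.example.com;PORT=50000;"
    "PROTOCOL=TCPIP;UID=etl_user;PWD=<password>;",
    "", "")

stmt = ibm_db.exec_immediate(
    db2_conn, "SELECT ORDER_ID, AMOUNT, ORDER_DATE FROM SALES.ORDERS")

# Serialize the result set to CSV in memory.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["ORDER_ID", "AMOUNT", "ORDER_DATE"])

row = ibm_db.fetch_tuple(stmt)
while row:
    writer.writerow(row)
    row = ibm_db.fetch_tuple(stmt)

# Land the extract in a blob container for downstream processing in Azure.
blob_service = BlobServiceClient.from_connection_string("<azure-storage-connection-string>")
blob_service.get_blob_client(container="raw", blob="orders.csv").upload_blob(
    buffer.getvalue(), overwrite=True)
```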

Diving into the Microsoft Fabric Copy Activity

Reza Rad does more than copies:

Copy Activity is one of the most commonly used activities in Microsoft Fabric’s Data Factory pipelines. The Copy Activity copies data from a source to a destination. However, there is more to it than just a simple copy. In this article, you will learn what Copy Activity is, its rationale, how it works, and its configuration options.

Reza has a video, as well as a demo-heavy full-length article on the topic.
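
If the concept is new, a Copy Activity boils down to "read from a source, optionally map columns, write to a sink." Here is a loose Python analogue of that flow, not Fabric’s implementation; the connection string, query, column names, and output path are invented for illustration:

```python
# Conceptual analogue of a Copy Activity: source -> (column mapping) -> sink.
# Connection details, table, and file names are illustrative only.
import pandas as pd
from sqlalchemy import create_engine

# "Source" side: pull the data set to copy from a SQL source.
source_engine = create_engine("mssql+pyodbc://user:pass@source_dsn")
df = pd.read_sql("SELECT CustomerID, Name, Country FROM dbo.Customers", source_engine)

# Optional mapping step, similar to the Copy Activity's column mapping options.
df = df.rename(columns={"CustomerID": "customer_id",
                        "Name": "customer_name",
                        "Country": "country"})

# "Sink" side: write the result to the destination, here a Parquet file.
df.to_parquet("customers.parquet", index=False)
```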

Batch File Importation in SQL Server

Paul White loads things quickly:

All this can be achieved with client-side tools and programming. It can also be done server-side by importing the raw data into a staging table before processing using T-SQL procedures.

Other times, the need arises to ingest data without using client-side tools and without making a complete copy of the raw data on the server. This article describes one possible approach in that situation.

Read on for the process.
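
For contrast, the conventional staging-table pattern Paul mentions, which does make a full server-side copy of the raw data, looks roughly like this; the file path, table, and procedure names are placeholders, and this is not the technique from the article:

```python
# Sketch of the conventional server-side pattern: bulk load raw data into a staging
# table, then process it with a T-SQL procedure. Paths, table, and procedure names
# are placeholders; see Paul White's article for the approach that avoids this copy.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};SERVER=myserver;DATABASE=Imports;"
    "Trusted_Connection=yes;TrustServerCertificate=yes;",
    autocommit=True)
cursor = conn.cursor()

cursor.execute("TRUNCATE TABLE dbo.RawStaging;")

# The file path is relative to the SQL Server machine, not the client.
cursor.execute("""
    BULK INSERT dbo.RawStaging
    FROM 'D:\\ImportFiles\\batch_2023_10.csv'
    WITH (FORMAT = 'CSV', FIRSTROW = 2, TABLOCK);
""")

# Hand off to a T-SQL procedure that validates and merges the staged rows.
cursor.execute("EXEC dbo.ProcessStagedBatch;")
conn.close()
```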

Using the Microsoft Fabric Data Gateway

Reitse Eskens uploads some data:

In a blog from a few weeks ago, I wrote about getting data from your on-prem SQL Server into Fabric. At the time, the only option for a copy dataflow was using a direct connection over the internet. That is still an option, but now you can also use the Power BI Data Gateway to get data from your SQL Server into Fabric.

In this blog, I’ll take you through the steps needed and an issue I ran into.

Read on for Reitse’s instructions and how to avoid the issue he ran into.

Migrating Cosmos DB Tables API

Eitan Blumin handles a migration:

A few months ago, I was involved in an interesting project where a large customer (not to be named due to NDA) needed to migrate their entire Azure cloud subscription to another subscription. This was a difficult and arduous process that involved several PaaS technologies, besides SQL Server, that I didn’t have experience with before.

But it presented very interesting challenges and opportunities to learn new things.

One of these was the need to migrate an entire Azure Cosmos DB Table API account from one subscription to another.

Read on for the challenge, the intermediate solution using the Cosmos DB Data Migration Tool, and Eitan’s PowerShell script to automate the process. I know and work with most of the people working on the DMT and they’re good folks.
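
If you ever need to hand-roll such a copy without the Data Migration Tool, a minimal sketch with the azure-data-tables SDK (which also talks to the Cosmos DB Table API) could look like the following; the connection strings and table name are placeholders, and this is not Eitan’s script:

```python
# Sketch only: copy all entities of one table between two Cosmos DB Table API accounts
# using the azure-data-tables SDK. Connection strings and the table name are placeholders.
from azure.data.tables import TableServiceClient

source = TableServiceClient.from_connection_string("<source-cosmos-table-connection-string>")
target = TableServiceClient.from_connection_string("<target-cosmos-table-connection-string>")

table_name = "Orders"
source_table = source.get_table_client(table_name)
target_table = target.create_table_if_not_exists(table_name)

copied = 0
for entity in source_table.list_entities():
    # upsert_entity keeps the copy idempotent if the migration is re-run.
    target_table.upsert_entity(entity)
    copied += 1

print(f"Copied {copied} entities from {table_name}.")
```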

Updates to Change Data Capture in ADF

Chen Hirsh looks at some updates:

A few months ago, I wrote a post about the new change data capture (CDC) feature in Azure Data Factory (ADF) – https://www.madeiradata.com/post/the-wind-of-change-change-data-capture-in-data-factory

Change data capture, as the name suggests, captures the data changes in one system and replicates them to another. Since this is a task that data engineers do a lot, this was a very welcome addition to ADF.

In this post, we’ll explore what is new on this front.

Click through for what’s new, though do be cognizant of which items are in GA and which are still in preview.

Using the Azure Data Factory Self-Hosted Integration Runtime

Chen Hirsh hosts a runtime:

In Azure Data Factory (ADF), an integration runtime is the compute resource your pipelines run on. When you run an application on your computer, it uses the computer’s resources, such as CPU and memory, to run its tasks. When you run activities in a pipeline in ADF, they also need resources to do their job, like copying data or writing a file, and these are provided by the integration runtime.

When you create an instance of ADF, you get a default integration runtime, hosted in the same region that you created ADF in. If you need to, you can add your own integration runtimes, either in Azure, or you can download and install a self-hosted integration runtime (SHIR) on your own server.

Read on to understand when you would want to use a self-hosted integration runtime and the process to set one up. The SHIR also applies to Synapse pipelines and is one of the few ways to move data out of a Synapse workspace with data exfiltration protection enabled.
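
If you prefer scripting the ADF side over clicking through the portal, creating the SHIR definition and pulling its registration keys with the azure-mgmt-datafactory SDK might look roughly like the sketch below; the resource group, factory, and runtime names are made up, and you still have to install the SHIR software on the on-prem server and register it with one of the returned keys:

```python
# Sketch: create a self-hosted integration runtime definition in ADF and fetch the
# authentication keys used to register the on-prem node. Resource names are placeholders.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    IntegrationRuntimeResource,
    SelfHostedIntegrationRuntime,
)

client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

ir = client.integration_runtimes.create_or_update(
    resource_group_name="rg-data",
    factory_name="adf-prod",
    integration_runtime_name="shir-onprem",
    integration_runtime=IntegrationRuntimeResource(
        properties=SelfHostedIntegrationRuntime(description="SHIR for on-prem sources")
    ),
)

keys = client.integration_runtimes.list_auth_keys(
    resource_group_name="rg-data",
    factory_name="adf-prod",
    integration_runtime_name="shir-onprem",
)

# Use auth_key1 (or auth_key2) when registering the installed SHIR node on the server.
print(ir.name, keys.auth_key1)
```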

Fabric Data Integration

Teo Lachev reviews the primary methods for data ingestion in Microsoft Fabric:

Fabric supports three options for automated data integration: Data Pipeline (Azure Data Factory pipeline), Dataflow Gen2 (Power BI dataflow), and Notebook (Spark). I summarize these three options in the following table, which loosely resembles the Microsoft comparison table but with my take on it.

Read on for Teo’s thoughts on the matter.
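
As a small illustration of the Notebook (Spark) option from that comparison, ingesting a file into a Lakehouse table from a Fabric notebook might look like the following PySpark sketch; the file path and table name are made up, and spark is the session Fabric notebooks provide:

```python
# Sketch of the Notebook (Spark) ingestion option in Fabric: read a raw file from the
# attached Lakehouse and write it back as a Delta table. Paths and names are placeholders.
df = (
    spark.read
    .option("header", "true")
    .option("inferSchema", "true")
    .csv("Files/raw/sales_2023.csv")   # file previously uploaded to the Lakehouse Files area
)

(
    df.write
    .mode("overwrite")
    .format("delta")
    .saveAsTable("sales_raw")          # becomes a managed Lakehouse table
)
```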
