ETL / ELT – Page 3 – Curated SQL

Writing Data into a Microsoft Fabric Lakehouse via Notebook

Published 2025-03-12 by Kevin Feasel

Since Lakehouse is one of the key items within Microsoft Fabric, it is important to know how to write data into it in various formats and using different tools. One of the most common tools is notebooks, as they provide great flexibility and speed for development and testing with graphical outputs. In this article, I want to focus primarily on the following types of notebooks:

PySpark

Python

Click through to see how it works in both notebook types.

Comments closed

Refreshing Power BI Semantic Model Hidden Tables via Fabric Data Pipelines

Published 2025-03-11 by Kevin Feasel

Chris Webb digs into the dark underbelly of a semantic model:

Following on from my recent post about refreshing semantic models with Fabric Data Pipelines and the semantic model refresh activity, a few people asked me how to refresh hidden tables because they are not displayed in the Pipeline configuration UI. I got the answer from my colleague Alex Powers (aka reddit celebrity u/itsnotaboutthecell) who kindly allowed me to blog about it.

Click through for the demonstration.

Comments closed

ETL Orchestration and Air Traffic Control

Published 2025-03-11 by Kevin Feasel

Jens Vestergaard extends a metaphor:

We have been working getting an enterprise grade event driven orchestration of our ETL system to operate like an airport control tower, managing a fleet of flights (data processes) as they progress through various stages of take-off, transit, and landing. All of this, because Microsoft Fabric has a core-based limit to the number of Notebook executions that a capacity can execute and have queued up in line for execution when invoking them using the REST API. Read the details here: limits (you know, it’s funny that there is no stated limits for Azure Service Bus Queues on number of messages in queue, but there is for Microsoft Fabric, which uses a Service Bus queue underneath…)

That limitation is a bit annoying, but read on to see how Jens uses this metaphor to explain the various parts of an ETL orchestration engine.

Comments closed

Non-Deterministic Functions and Data Factory Logging

Published 2025-03-04 by Kevin Feasel

Richard Swinbank runs into a problem:

TL;DR:

Data Factory implementations in Fabric, Azure Synapse Analytics or Azure Data Factory evaluate pipeline expressions separately for logging and execution.

Log information reported from activities using non-deterministic functions may be unreliable.

Richard does give us a nice tl;dr, but still read the whole thing.

Comments closed

Self-Hosted Integration Runtime Reconnecting to Cloud Service

Published 2025-03-04 by Kevin Feasel

Nivritti Suste handles an error:

In our organization, most data is stored on-premises with a limited set of less critical data is in the cloud. We use Azure to benefit from the cloud environment and Azure Data Factory (ADF) to move data.

With ADF, there are many components that need to integrate within the environment. The data on our on-premises servers needs to be shifted to the cloud periodically and we use Self-hosted Integration Runtime.

Our developers complain an ADF pipeline is failing with error: ‘The Self-hosted Integration Runtime is offline…’ What does this mean?

Click through for the answer.

Comments closed

Looping through bcp Calls in Powershell

Published 2025-03-03 by Kevin Feasel

Peter Schott needs to perform a series of bulk inserts:

Sometimes you need to extract a large number of tables into some other format. I’ve written about BCP earlier as a quick option to move data around, but what if you need a delimited text file or a way to repeat the calls without too much trouble across a set of servers? I had this come up recently and wrote up a combination of PowerShell, the sqlserver module, and the bcp.exe tool to allow for a variety of extract types and also, importantly, ensure that we get header data with those extracts in case we need them to pull in to another process. This is doable without the header piece, but many processes (and humans) expect some sort of header data. Thus – a slightly expanded process.

Click through for a script and an explanation of the process.

Comments closed

Power BI Semantic Model Monthly Refresh via Fabric Data Pipelines

Published 2025-02-25 by Kevin Feasel

Chris Webb has another way for scheduling refreshes:

I’m sure you already know how to configure scheduled refresh for your semantic models in Power BI. While the options you have for controlling when refresh takes place are generally good enough – you can configure daily or weekly refreshes and set up to eight times a day for refreshes to take place – there are some scenarios it doesn’t work for, such as monthly refreshes. Up to now the workaround has been to use Power Automate to trigger refreshes (see here for an example) or to call the refresh API from another application. Now, with Fabric, you have a much better option for scheduling refreshes: Data Pipelines.

Click through for the demonstration.

Comments closed

Dealing with Schema Drift in Azure Data Factory

Published 2025-02-18 by Kevin Feasel

Rayis Imayev deals with change:

I will jump straight to the problem statement without a “boring” introduction, which, in a sense, already feels like an opening statement.

Moving data between two or more endpoints is a common task. Sometimes it’s as simple as migrating data from one place to another. Other times, it’s a request to copy specific documents from source environments. In more complex cases, you might need to consolidate multiple data files into the same destination, such as loading several separate files into a single database table.

This was the bete noir of SSIS’s existence. Minor metadata changes would cause the entire system to break down.

Comments closed

Controlling Execution Flow in Fabric Data Pipelines

Published 2025-02-06 by Kevin Feasel

Reza Rad has everything under control:

In Microsoft Fabric, the Data Factory is the workload for ETL and data integration, and the Data Pipeline is a component in that workload for orchestrating the execution flow. There are activities in the pipeline, and you can define in which order you want the activities to run. In this article and video, you will learn about the execution order and output states in Data Pipeline and how they can be used in real-world scenarios of data integration.

The mechanisms here are fundamentally similar to what we’ve had in Azure Data Factory (obviously) and SQL Server Integration Services.

Comments closed

Notes on Change Tracking for Warehouse Incremental Loads

Published 2025-01-31 by Kevin Feasel

Meagan Longoria shares some hard-earned experience:

I have a few clients that incrementally load tables from a SQL Server source into their data warehouse or lakehouse by using change tracking. Lately, they encountered some issues with changes to the configuration and the data in the source database, so I decided to share some things you can check before using change tracking as part of your ETL load or when troubleshooting your data load.

Click through for three common issues you may run into while using change tracking.

Comments closed

M	T	W	T	F	S	S
	1	2	3	4	5	6
7	8	9	10	11	12	13
14	15	16	17	18	19	20
21	22	23	24	25	26	27
28	29	30	31

Category: ETL / ELT