Press "Enter" to skip to content

Category: ETL / ELT

MDX vs DAX for Bulk Data Extraction from Power BI

Chris Webb performs a test but gives us a warning first:

This is a post I’ve avoided writing for many years, and before I carry on let me make one thing clear:

Doing bulk extracts of data from a Power BI semantic model is a **really** bad idea

My colleague Matthew Roche wrote a great post on this topic a couple of years ago that is still relevant: using Power BI (or Analysis Services) as a data source for other systems, including other Power BI Import mode semantic models, is an anti-pattern. Power BI is optimised for small, analytical queries that return the amount of data that can be visualised on a single page. It is not optimised for queries that return millions of rows.

After the warning, Chris still gives a performance breakdown of extracting data from a semantic model into Excel, comparing auto-generated MDX and DAX queries.
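
If you want to see what such an extraction looks like in practice, here is a minimal sketch using the semantic-link (sempy) library available in (or pip-installable into) Fabric notebooks. The model and table names are hypothetical, and per the warning above, this is for testing, not production:

```python
# A minimal sketch of bulk-extracting from a semantic model via DAX,
# using the semantic-link (sempy) library in a Fabric notebook.
# "Sales Model" and 'Fact Sales' are hypothetical names.
import sempy.fabric as fabric

# Even capped with TOPN, this is the kind of bulk query the post warns
# against -- fine for a test, an anti-pattern in production.
df = fabric.evaluate_dax(
    dataset="Sales Model",
    dax_string="EVALUATE TOPN(100000, 'Fact Sales')",
)
print(df.shape)
```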


Three Incremental Load Patterns with Azure Data Factory

Temidayo Omoniyi likes a good pattern:

This article is divided into three major sections—each showing the different abilities and use cases of performing incremental load with Azure Data Factory. This process can also be done in an Azure Synapse Pipeline and Fabric Pipeline.

The document contains the following:

Section 1: Copy Data Based on Last Modified Date or Latest File

Section 2: Incremental Copy Using Dataflow

Section 3: Incremental Copy Using Lookup and Stored Procedure Activities

Click through for each of these three patterns, with plenty of screenshots and step-by-step instructions.
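
For a sense of what the third pattern does under the covers, here is a minimal Python sketch of the watermark logic, with hypothetical table and column names; in the actual pipeline, steps 1 through 3 map to the Lookup, Copy Data, and Stored Procedure activities:

```python
# A minimal sketch of the watermark pattern behind Section 3, assuming
# pyodbc and hypothetical names (etl.watermark, dbo.orders, modified_date).
import pyodbc

SRC = "DRIVER={ODBC Driver 18 for SQL Server};SERVER=src-server;DATABASE=sales;Trusted_Connection=yes;"
DST = "DRIVER={ODBC Driver 18 for SQL Server};SERVER=dst-server;DATABASE=dw;Trusted_Connection=yes;"

with pyodbc.connect(SRC) as src, pyodbc.connect(DST) as dst:
    # 1. Lookup activity: read the last high-water mark from the target.
    last_mark = dst.execute(
        "SELECT last_modified FROM etl.watermark WHERE table_name = ?",
        "dbo.orders",
    ).fetchone()[0]

    # 2. Copy Data activity: pull only rows changed since that mark.
    rows = src.execute(
        "SELECT * FROM dbo.orders WHERE modified_date > ?", last_mark
    ).fetchall()
    # ... bulk-insert `rows` into a staging table on the destination ...

    # 3. Stored Procedure activity: advance the watermark to the new max,
    #    so the next run picks up where this one left off.
    if rows:
        new_mark = max(r.modified_date for r in rows)
        dst.execute(
            "UPDATE etl.watermark SET last_modified = ? WHERE table_name = ?",
            new_mark, "dbo.orders",
        )
    dst.commit()
```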


Fabric Benchmarking: Moving CSV Files

Eugene Meidinger breaks out the abacus:

First, a disclaimer: I am not a data engineer, and I have never worked with Fabric in a professional capacity. With the announcement of Fabric SQL DBs, there’s been some discussion on whether they are better for Power BI import than Lakehouses. I was hoping to do some tests, but along the way I ended up on an extensive Yak Shaving expedition.

I have likely done some of these tests inefficiently. I have posted as much detail and source code as I can and if there is a better way for any of these, I’m happy to redo the tests and update the results.

Part one focuses on loading CSV files to the files portion of a lakehouse. Future benchmarks look at CSV to delta and PBI imports.

I think Eugene did a fine job documenting everything in the process, and it was interesting to see the relative price differences between techniques for uploading a very large CSV file.
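
As a flavor of one of the techniques under test, here is a minimal sketch of uploading a CSV over the OneLake DFS endpoint with the azure-storage-file-datalake SDK; the workspace, lakehouse, and file names are hypothetical placeholders:

```python
# A minimal sketch of uploading a local CSV to a lakehouse's Files area
# over the OneLake DFS endpoint. Workspace, lakehouse, and file names
# are hypothetical placeholders.
from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

service = DataLakeServiceClient(
    account_url="https://onelake.dfs.fabric.microsoft.com",
    credential=DefaultAzureCredential(),
)
# In OneLake, the "file system" is the workspace and paths start at the item.
fs = service.get_file_system_client("MyWorkspace")
file = fs.get_file_client("MyLakehouse.Lakehouse/Files/raw/big_file.csv")

with open("big_file.csv", "rb") as data:
    file.upload_data(data, overwrite=True)
```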


The Showdown: Spark vs DuckDB vs Polars in Microsoft Fabric

Miles Cole puts together a benchmark:

There’s been a lot of excitement lately about single-machine compute engines like DuckDB and Polars. With the recent release of pure Python Notebooks in Microsoft Fabric, the excitement about these lightweight native engines has risen to a new high. Out with Spark and in with the new and cool animal-themed engines — is it time to finally migrate your small and medium workloads off of Spark?

Before writing this blog post, honestly, I couldn’t have answered with anything besides a gut feeling largely based on having a confirmation bias towards Spark. With folks in the community recently posting their own benchmarks highlighting the power of these lightweight engines, I felt it was finally time to roll up my sleeves and explore whether or not I should abandon everything I know and become a DuckDB and/or Polars convert.

Read on for the method and results from several thoughtful tests.
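
To give a flavor of the comparison, here is a minimal sketch of the same aggregation in DuckDB and in Polars, with a hypothetical Parquet file and columns:

```python
# A minimal sketch of the kind of head-to-head the post runs: the same
# aggregation in DuckDB and in Polars. File and column names are
# hypothetical.
import duckdb
import polars as pl

# DuckDB: SQL directly over the file, no Spark session or cluster needed.
duck = duckdb.sql("""
    SELECT customer_id, SUM(amount) AS total
    FROM 'sales.parquet'
    GROUP BY customer_id
""").pl()  # materialize the result as a Polars DataFrame

# Polars: the same query via the lazy API, optimized before execution.
pol = (
    pl.scan_parquet("sales.parquet")
    .group_by("customer_id")
    .agg(pl.col("amount").sum().alias("total"))
    .collect()
)
```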


Ways to Land Data into Microsoft Fabric OneLake

James Serra puts on a cape and takes on an iconic laugh:

Microsoft Fabric is rapidly gaining popularity as a unified data platform, leveraging OneLake as its central data storage hub for all Fabric-integrated products. A variety of tools and methods are available for copying data into OneLake, catering to diverse data ingestion needs. Below is an overview of what I believe are the key options:

Read on for a baker’s dozen methods.


A Gentle Introduction to dbt

Koen Verbeeck makes a change:

In a typical data warehouse project, data is loaded, transformed and stored into a data store using an ETL (extract – transform – load) process. Recently, there has been a shift to ELT processing, where data is first loaded into a data store (this can be a database, but also a data lake or something similar) and then transformed in-place.

Over the past few years, the tool dbt – short for data build tool – has become quite popular in the data engineering world for handling such an ELT process. dbt takes on the role of the “T”, meaning it’s responsible for transforming the data in a given data store. dbt is not meant for transferring data from one store to another; for that you’ll need another tool, such as Azure Data Factory.

Read on to see how it works.
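
dbt models are normally SQL SELECT statements, but on adapters with a Python runtime (e.g., Databricks, Snowflake, BigQuery) dbt also accepts Python models. Here is a minimal sketch of the “T” step as a Python model, with a hypothetical raw_orders input:

```python
# models/completed_orders.py -- a minimal dbt Python model sketch.
# The raw_orders input and its status column are hypothetical.
def model(dbt, session):
    dbt.config(materialized="table")
    orders = dbt.ref("raw_orders")  # another model, seed, or source
    # The "T" in ELT: transform in place; dbt writes the result back to
    # the warehouse as the completed_orders table.
    return orders.filter(orders.status == "completed")
```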


Data Transformation with Dataflows Gen2

Boniface Muchendu provides an overview of Dataflows Gen2 in Microsoft Fabric:

Welcome to a journey into the world of data automation! Imagine working in an organization bustling with data scientists and analysts. In such an environment, you often need to gather and combine data from various sources for further analysis. You could do this manually, but why not leverage automation? In this blog, we’ll explore how to apply automation on data transformations using Dataflows Gen2 in Microsoft Fabric.

I’ll admit that I am not the primary audience for Dataflows Gen2, but I’d still much rather write a Spark notebook and call it a day.
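
For comparison, here is a minimal sketch of the kind of combine-and-transform step a Dataflow Gen2 performs, done instead in a Fabric Spark notebook; table and column names are hypothetical:

```python
# A sketch of a combine-and-transform step in a Fabric Spark notebook,
# assuming the notebook's ambient `spark` session and hypothetical tables.
from pyspark.sql import functions as F

orders = spark.read.table("bronze_orders")
customers = spark.read.table("bronze_customers")

(
    orders.join(customers, "customer_id")
    .withColumn("order_month", F.date_trunc("month", F.col("order_date")))
    .write.mode("overwrite")
    .saveAsTable("silver_orders_enriched")
)
```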


Metadata-Driven Spark Clusters in Azure Databricks

Matt Collins ties the room together with a bit of metadata:

In this article, we will discuss some options for improving interoperability between Azure Orchestration tools, like Data Factory, and Databricks Spark Compute. By using some simple metadata, we will show how to dynamically configure pipelines with appropriately sized clusters for all your orchestration and transformation needs as part of a data analytics platform.

Click through for an explanation of the challenge, followed by the how-to.
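
The core idea can be sketched in a few lines of Python: look up a workload’s size in a metadata store and emit a Databricks Jobs API new_cluster block for the orchestrator to pass along. The sizes and node types below are hypothetical:

```python
# A minimal sketch of metadata-driven cluster sizing: map a workload's
# "t-shirt size" to a Databricks Jobs API new_cluster spec that an
# orchestrator (e.g., an ADF activity) can pass to the Jobs API.
# The sizes and node types here are hypothetical.
CLUSTER_SIZES = {
    "small":  {"node_type_id": "Standard_DS3_v2", "num_workers": 2},
    "medium": {"node_type_id": "Standard_DS4_v2", "num_workers": 4},
    "large":  {"node_type_id": "Standard_DS5_v2", "num_workers": 8},
}

def cluster_spec(workload_size: str, spark_version: str = "15.4.x-scala2.12") -> dict:
    """Build a Jobs API new_cluster block from a metadata lookup."""
    size = CLUSTER_SIZES[workload_size]
    return {
        "spark_version": spark_version,
        "node_type_id": size["node_type_id"],
        "num_workers": size["num_workers"],
    }

# e.g., for a pipeline tagged "medium" in the metadata table:
print(cluster_spec("medium"))
```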


Mounting ADF Instances in Microsoft Fabric

Koen Verbeeck has an existing Azure Data Factory:

We recently started using Microsoft Fabric for our cloud data platform. However, we already have quite an estate of Azure data services running in our company, including a huge number of Azure Data Factory (ADF) pipelines. It seems cumbersome to migrate all those pipelines to Microsoft Fabric, especially because some features are not supported yet and ADF is the mature choice at the moment. We like the concept of Microsoft Fabric’s centralization, where everything is managed in one platform. Is there an option to manage ADF in Fabric?

Read on for the answer, but make sure to check out its limitations as well.
