Press "Enter" to skip to content

Category: ETL / ELT

Partitioned Compute with Fabric Dataflows Gen2

Chris Webb continues to test out Fabric Dataflow Gen2 performance:

In the first part of this series I showed how the Concurrency setting in a Fabric Dataflows Gen2 can affect refresh performance when there are multiple queries inside the dataflow. In this post I will show how, with Partitioned Compute, this setting can also affect the performance of a single query within a dataflow.

To test this I created a dataflow with one query, a modified version of the query that I used in this post from earlier this year which returns a table with ten rows and calls a function with a built-in delay of 60 seconds on each row.

This is a preview feature but Chris shows a simple but effective test to demonstrate how this capability works.

Leave a Comment

Foreign Tables and Materialized Views in PostgreSQL

Richard Yen provides a write-back system for analysts:

I recently wrote a post about WAL log shipping and how a standby built on log shipping is a great way to give data analysts production data without putting the primary at risk. Having access to the production data in this way is great, but it’s read-only. How can we create views of this data for better analytics work? I want to make the case today that Foreign Data Wrappers and Materialized Views can make a great solution – not only in accessing production Postgres data, but also working with other data sources.

Click through for an architectural discussion of how they can work together.

Leave a Comment

An FAQ-Based Introduction to Data Factories in Azure

Koen Verbeeck answers some questions:

Is Microsoft Fabric replacing Azure Data Factory?

Officially, no. Or maybe not yet. At the time of writing, ADF still remains a separate product but it’s noticeable that more new features are added to Fabric than to ADF. There are still many customers using ADF, so Microsoft might keep the service around for a while. There’s also still a bit of a feature gap between the two services, but this becomes more narrower each month. Microsoft is offering migration scenarios from ADF to Fabric.

I picked this question because of how much the concept annoys me. There are three separate Data Factory code bases in Azure with overlapping but not matching functionality (which is how you can tell it’s multiple code bases and not just one code base reskinned). This can lead to a scenario where Person A says, “Oh, do this thing in Data Factory.” Person B then says, “But I can’t do that in Data Factory.” Person A’s response: “Oh, that’s weird, because I can do it in Data Factory.” This leads to necessary but somewhat absurd clarifications around how you need to use Microsoft Fabric Data Factory, not Azure Data Factory because, even though Microsoft Fabric Data Factory is hosted in Azure, it’s a different product.

And don’t get me started on the wide variety of KQL platforms, all of which are subtly different.

Leave a Comment

T-SQL Tuesday 198 Round-Up

Meagan Longoria wraps up another T-SQL Tuesday:

Thank you to everyone who participated in T-SQL Tuesday #198! When I wrote the invitation post, I intentionally kept the prompt broad because change detection looks different depending on your source system, your infrastructure, your data volumes, and what you need to do with the changes once you have them. The responses covered SQL Server internals, Microsoft Fabric and Synapse, hashing strategies, metadata-driven frameworks, and Synapse workspace diffing with Python. Here’s a summary of each contribution.

Read on for links to eight responses.

Leave a Comment

Performing ELT with Python and DuckDB

Jamal Hansen shows off a capable in-memory analytic database:

This is a real-world example of a common data engineering pattern. You may have heard of ETL (Extract, Transform, Load), where data is transformed before it reaches its destination. What we are actually building today is the more modern variant, ELT: Extract, Load, Transform.

Read on for the process. I like DuckDB a lot and this is one of the types of use cases in which it excels.

Leave a Comment

Metadata-Driven Frameworks for Change Detection in Microsoft Fabric

Kevin Chant builds a table:

I had various options for this months contribution due to my experience with various change detection solutions. Including Azure Synapse Link for SQL Server 2022. Which I covered in previous posts. Including one that covered some excessive file tests for Azure Synapse Link for SQL Server 2022.

In the end I decided to cover developing metadata-driven frameworks for Microsoft Fabric. Due to the fact that it is such a hot topic for multiple reasons. One of which is the growing availability of open-source, metadata-driven frameworks for Microsoft Fabric.

Read on for three such frameworks and some advice on how to use them.

Leave a Comment

Using Fabric Data Wrangler for Testing

Kristina Mishra checks out some data:

Data Wrangler has been available for awhile now, but I’ll be honest, it’s not something we’ve been actively using. We’ve been heads down on time-sensitive projects for over a year and needless to say, our cup runneth over. Recently we’ve had a bit of respite and I decided to see how we could use Data Wrangler within the context of our current Microsoft Fabric data warehouse (i.e. medallion layer lakehouses).

Data Wrangler has a lot of cool features that will give you code snippets for what you want to do, but I wanted to use it a different way. I wanted to have an easy way to do a quick check for dimension tables. I also wanted an easy-peasy way for others, some of whom are not developers, to be able to do quick sanity check of the data.

Click through to see how it works.

Leave a Comment

Tracking Record Changes in SQL

Andy Brownsword builds a hash key:

The issue: there was no indicator of which records had been modified and as a result the process took way too long, and downstream reporting wasn’t available on time.

After reviewing and stepping through the process it became clear that the vast majority of data didn’t change. This was a daily process handling 12 months of data, yet over 99% had no changes at all. However the process ingested the whole dataset (~250m records) and processed it in SQL.

Click through for an architectural-level discussion. In practice, HASHBYTES() works really well, especially when you use CONCAT() or CONCAT_WS() to put together the columns you care about

Leave a Comment

Comparing {targets} in R to dbt for Data Engineering

Jonathan Carroll compares two approaches:

Thinking of a real-world project I could take for a spin, I decided to build some ingestion for my personal finances. I’ve used Quickbooks previously which connects up to my bank and helps categorise personal and business (as a freelance contractor) expenses. I decided I’ll build my own ‘slowbooks’ processing workflow based on some manual exports (I don’t think my bank has an API).

Both of the approaches I’ll compare here build on the idea of a Makefile which connects up commands to run based on dependencies, and only runs what is needed; if all the input dependencies of a step have not changed, there’s no need to re-run that step. From what I understand, you could largely get away with just writing some Makefiles (or the newer implementation just (just.systems)) but these two approaches help to better structure how that’s constructed.

Read on for Jonathan’s discovery process and ultimate findings. H/T R-Bloggers.

Comments closed

Building Materialized Lake Views in Microsoft Fabric

Nikola Ilic presses the Easy button:

For the longest time, building a medallion architecture in Microsoft Fabric meant stitching together a small orchestra of moving parts: notebooks for the transformations, pipelines for orchestration, schedules for refresh, custom code for data quality checks, and the Monitor Hub for keeping an eye on whether anything actually worked. Every layer worked – until something didn’t, and then you had to figure out which layer broke, why, and which downstream layers got affected along the way.

If you’ve ever tried to debug a silver layer that didn’t update because the bronze notebook failed three hours ago, you know exactly what I’m talking about.

Then, at FabCon Atlanta in March 2026, materialized lake views (MLVs) went generally available. And the story they’re telling is simple: what if your entire medallion pipeline could be a few SELECT statements?

Let me walk you through the whole thing – what they are, how they work, what changed between preview and GA, and where they fit (and where they don’t) in your architecture.

Read on for that walkthrough.

Comments closed