Press "Enter" to skip to content

Category: ETL / ELT

Using Fabric Data Wrangler for Testing

Kristina Mishra checks out some data:

Data Wrangler has been available for a while now, but I’ll be honest: it’s not something we’ve been actively using. We’ve been heads-down on time-sensitive projects for over a year and, needless to say, our cup runneth over. Recently we’ve had a bit of respite and I decided to see how we could use Data Wrangler within the context of our current Microsoft Fabric data warehouse (i.e., medallion layer lakehouses).

Data Wrangler has a lot of cool features that will give you code snippets for what you want to do, but I wanted to use it a different way. I wanted to have an easy way to do a quick check for dimension tables. I also wanted an easy-peasy way for others, some of whom are not developers, to be able to do a quick sanity check of the data.

Click through to see how it works.


Tracking Record Changes in SQL

Andy Brownsword builds a hash key:

The issue: there was no indicator of which records had been modified and as a result the process took way too long, and downstream reporting wasn’t available on time.

After reviewing and stepping through the process, it became clear that the vast majority of data didn’t change. This was a daily process handling 12 months of data, yet over 99% had no changes at all. However, the process ingested the whole dataset (~250m records) and processed it in SQL.

Click through for an architectural-level discussion. In practice, HASHBYTES() works really well, especially when you use CONCAT() or CONCAT_WS() to put together the columns you care about.
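To make the pattern concrete, here is a minimal Python sketch of the same hash-key idea that HASHBYTES() plus CONCAT_WS() gives you in T-SQL: hash the tracked columns per key, compare against yesterday’s stored hashes, and process only the rows whose hash differs. The column values and keys here are invented for illustration.

```python
import hashlib

def row_hash(*columns, sep="|"):
    # Mimic HASHBYTES(SHA2_256, CONCAT_WS('|', ...)): join the tracked
    # columns with a separator and hash the result. Encode NULLs
    # explicitly so NULL and empty string produce different hashes.
    joined = sep.join("<NULL>" if c is None else str(c) for c in columns)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()

# Yesterday's snapshot: business key -> stored hash
stored = {
    1: row_hash("Alice", "London"),
    2: row_hash("Bob", "Paris"),
}

# Today's incoming rows: business key -> tracked column values
incoming = {
    1: ("Alice", "London"),   # unchanged
    2: ("Bob", "Berlin"),     # modified
    3: ("Cara", "Madrid"),    # new
}

# Only rows whose hash differs (or is missing) need processing
changed = [k for k, cols in incoming.items()
           if stored.get(k) != row_hash(*cols)]
print(changed)
```

With 99% of rows unchanged, comparing one stored hash per key is far cheaper than reprocessing the full dataset each day; the separator and NULL handling matter because a naive CONCAT can make distinct rows collide.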


Comparing {targets} in R to dbt for Data Engineering

Jonathan Carroll compares two approaches:

Thinking of a real-world project I could take for a spin, I decided to build some ingestion for my personal finances. I’ve used Quickbooks previously which connects up to my bank and helps categorise personal and business (as a freelance contractor) expenses. I decided I’ll build my own ‘slowbooks’ processing workflow based on some manual exports (I don’t think my bank has an API).

Both of the approaches I’ll compare here build on the idea of a Makefile, which connects up commands to run based on dependencies and only runs what is needed; if none of a step’s input dependencies have changed, there’s no need to re-run that step. From what I understand, you could largely get away with just writing some Makefiles (or the newer alternative just (just.systems)), but these two approaches help to better structure how that’s constructed.
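The skip-if-unchanged behavior that make, {targets}, and dbt share can be sketched in a few lines of Python: fingerprint a step’s inputs and re-run the step only when a fingerprint differs from the cached one. The step name and file here are hypothetical, and real tools track far more (code changes, upstream targets), but the core idea is just this.

```python
import hashlib
import tempfile
from pathlib import Path

def fingerprint(path: Path) -> str:
    # Hash the file contents so a step can tell whether its input changed
    return hashlib.sha256(path.read_bytes()).hexdigest()

def run_step(name, inputs, cache, action):
    # Run `action` only if any input fingerprint differs from the cached
    # ones -- the way make-style tools skip up-to-date targets
    current = {str(p): fingerprint(p) for p in inputs}
    if cache.get(name) == current:
        return False          # up to date, skipped
    action()
    cache[name] = current
    return True               # ran

cache = {}
with tempfile.TemporaryDirectory() as d:
    src = Path(d) / "transactions.csv"          # hypothetical input file
    src.write_text("date,amount\n2024-01-01,10\n")

    ran_first = run_step("categorise", [src], cache, lambda: None)
    ran_second = run_step("categorise", [src], cache, lambda: None)

    # Simulate a new bank export arriving
    src.write_text("date,amount\n2024-01-01,10\n2024-01-02,20\n")
    ran_third = run_step("categorise", [src], cache, lambda: None)

print(ran_first, ran_second, ran_third)
```

The first run executes, the second is skipped because nothing changed, and the third runs again after the input file is modified; both {targets} and dbt layer structure, dependency graphs, and caching conventions on top of this same check.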

Read on for Jonathan’s discovery process and ultimate findings. H/T R-Bloggers.


Building Materialized Lake Views in Microsoft Fabric

Nikola Ilic presses the Easy button:

For the longest time, building a medallion architecture in Microsoft Fabric meant stitching together a small orchestra of moving parts: notebooks for the transformations, pipelines for orchestration, schedules for refresh, custom code for data quality checks, and the Monitor Hub for keeping an eye on whether anything actually worked. Every layer worked – until something didn’t, and then you had to figure out which layer broke, why, and which downstream layers got affected along the way.

If you’ve ever tried to debug a silver layer that didn’t update because the bronze notebook failed three hours ago, you know exactly what I’m talking about.

Then, at FabCon Atlanta in March 2026, materialized lake views (MLVs) went generally available. And the story they’re telling is simple: what if your entire medallion pipeline could be a few SELECT statements?

Let me walk you through the whole thing – what they are, how they work, what changed between preview and GA, and where they fit (and where they don’t) in your architecture.

Read on for that walkthrough.


Maestro Now 1.1.0

Will Hipson lays out an update:

maestro officially graduated to a stable release with version 1.0.0 back in January 2026, and its latest version is now 1.1.0. This marks a commitment to maintaining a stable API and reflects increased reliance on maestro in production. In our environment alone, maestro has orchestrated millions of pipeline executions over the course of a year, effectively making it the heartbeat of our entire data stack.

If you haven’t heard of maestro, it’s a pipeline orchestration package. You can learn more about it here.

Click through to see what’s changed between the 1.0.0 release and now. H/T R-Bloggers.


Shortcut Transformations now GA in Microsoft Fabric

Pernal Shah transforms some data:

Organizations today manage data across multiple storage systems, often in formats like CSV, Parquet, and JSON. While this data is readily available, turning it into analytics-ready tables typically requires building and maintaining complex ETL pipelines.

Shortcut transformations remove that complexity.

With Shortcut transformations, you can convert structured files referenced through OneLake shortcuts into Delta tables without building pipelines or writing code.

This currently works for CSV, Parquet, and JSON data and does cut out a very common step for raw-layer transformation.


Apache Airflow Jobs in Fabric Data Factory

Mark Kromer makes an announcement:

The world of data integration is rapidly evolving, and staying up to date with the latest technologies is crucial for organizations seeking to make the most of their data assets. Available now are the newest innovations in Fabric Data Factory pipelines and Apache Airflow job orchestration, designed to empower data engineers, architects, and analytics professionals with greater efficiency, flexibility, and scalability.

Read on to see what’s newly available, including some preview functionality.


Generating Excel Reports via Fabric Dataflows Gen2

Chris Webb builds a report:

So many cool Fabric features get announced at Fabcon that it’s easy to miss some of them. The fact that you can now not only generate Excel files from Fabric Dataflows Gen2, but that you have so much control over the format that you can use this feature to build simple reports rather than plain old data dumps, is a great example: it was only mentioned halfway through this blog post on new stuff in Dataflows Gen2. Nonetheless, it was the Fabcon feature announcement that got me most excited. This is because it shows how Fabric Dataflows Gen2 have gone beyond being just a way to bring data into Fabric and are now a proper self-service ETL tool where you can extract data from a lot of different sources, transform it using Power Query, and load it to a variety of destinations both inside Fabric and outside it (such as CSV files, Snowflake and, yes, Excel).

Click through for an example.


Batch versus Stream for Data Processing

Nikola Ilic answers a question and then the follow-up question:

If you’ve spent any time in the data engineering world, you’ve likely encountered this debate at least once. Maybe twice. OK, probably a dozen times: “Should we process our data in batches or in real time?” And if you’re anything like me, you’ve noticed that the answer usually starts with: “Well, it depends…”

Which is true. It does depend. But “it depends” is only useful if you actually know what it depends on. And that’s the gap I want to fill with this article. Not another theoretical comparison of batch vs. stream processing (I hope you already know the basics). Instead, I want to give you a practical framework for deciding which approach makes sense for your specific scenario, and then show you how both paths look when implemented in Microsoft Fabric.

Read on to learn why both are viable patterns and how you can work with both in Microsoft Fabric.


dbt and Microsoft Fabric

Pradeep Srikakolapu and Abhishek Narain dig into dbt:

Modern analytics teams are adopting open, SQL-first data transformation, robust CI/CD and governance, and seamless integration across lakehouse and warehouse platforms. dbt is now the standard for analytics engineering, while Microsoft Fabric unifies data engineering, science, warehousing, and BI in OneLake.

By investing in dbt + Microsoft Fabric integration, Microsoft empowers customers with a unified, enterprise-grade analytics platform that supports native dbt workflows—future-proofing analytics engineering on Fabric.

I’ll be interested to see if this retains corporate investment longer than some of their open-source collaborations. That’s been a consistent issue over the years: announce some neat integrations with a popular technology, release a couple of versions, and then quietly deprecate it a year or two later. This sounds like it’s less likely to end up in that boat, simply based on how the Fabric team is collaborating compared to, say, the various Spark on .NET efforts over the years.
