Category: ETL / ELT

Maestro Now 1.1.0

Will Hipson lays out an update:

maestro officially graduated to a stable release with version 1.0.0 back in January 2026, and its latest version is now 1.1.0. This marks a commitment to maintaining a stable API and increased reliance on using maestro in production. In our environment alone, maestro has orchestrated millions of pipeline executions over the course of a year, effectively making it the heartbeat of our entire data stack.

If you haven’t heard of maestro, it’s a pipeline orchestration package. You can learn more about it here.

Click through to see what’s changed between the 1.0.0 release and now. H/T R-Bloggers.

Shortcut Transformations now GA in Microsoft Fabric

Pernal Shah transforms some data:

Organizations today manage data across multiple storage systems, often in formats like CSV, Parquet, and JSON. While this data is readily available, turning it into analytics-ready tables typically requires building and maintaining complex ETL pipelines.

Shortcut transformations remove that complexity.

With Shortcut transformations, you can convert structured files referenced through OneLake shortcuts into Delta tables without building pipelines or writing code.

This currently works for CSV, Parquet, and JSON data and does cut out a very common step for raw-layer transformation.

Apache Airflow Jobs in Fabric Data Factory

Mark Kromer makes an announcement:

The world of data integration is rapidly evolving, and staying up to date with the latest technologies is crucial for organizations seeking to make the most of their data assets. Available now are the newest innovations in Fabric Data Factory pipelines and Apache Airflow job orchestration, designed to empower data engineers, architects, and analytics professionals with greater efficiency, flexibility, and scalability.

Read on to see what’s newly available, including some preview functionality.

Generating Excel Reports via Fabric Dataflows Gen2

Chris Webb builds a report:

So many cool Fabric features get announced at Fabcon that it’s easy to miss some of them. The fact that you can now not only generate Excel files from Fabric Dataflows Gen2, but that you have so much control over the format that you can use this feature to build simple reports rather than plain old data dumps, is a great example: it was only mentioned halfway through this blog post on new stuff in Dataflows Gen2. Nonetheless, it was the Fabcon feature announcement that got me most excited. This is because it shows how Fabric Dataflows Gen2 have gone beyond being just a way to bring data into Fabric and are now a proper self-service ETL tool where you can extract data from a lot of different sources, transform it using Power Query, and load it to a variety of destinations both inside Fabric and outside it (such as CSV files, Snowflake and yes, Excel).

Click through for an example.

Batch versus Stream for Data Processing

Nikola Ilic answers a question and then the follow-up question:

If you’ve spent any time in the data engineering world, you’ve likely encountered this debate at least once. Maybe twice. Ok, probably a dozen times: “Should we process our data in batches or in real-time?” And if you’re anything like me, you’ve noticed that the answer usually starts with: “Well, it depends…”

Which is true. It does depend. But “it depends” is only useful if you actually know what it depends on. And that’s the gap I want to fill with this article. Not another theoretical comparison of batch vs. stream processing (I hope you already know the basics). Instead, I want to give you a practical framework for deciding which approach makes sense for your specific scenario, and then show you how both paths look when implemented in Microsoft Fabric.

Read on to learn why both are viable patterns and how you can work with both in Microsoft Fabric.

dbt and Microsoft Fabric

Pradeep Srikakolapu and Abhishek Narain dig into dbt:

Modern analytics teams are adopting open, SQL-first data transformation, robust CI/CD and governance, and seamless integration across lakehouse and warehouse platforms. dbt is now the standard for analytics engineering, while Microsoft Fabric unifies data engineering, science, warehousing, and BI in OneLake.

By investing in dbt + Microsoft Fabric integration, Microsoft empowers customers with a unified, enterprise-grade analytics platform that supports native dbt workflows—future-proofing analytics engineering on Fabric.

I’ll be interested to see if this retains corporate investment longer than some of their open-source collaborations. That’s been a consistent issue over the years: announce some neat integrations with a popular technology, release a couple of versions, and then quietly deprecate it a year or two later. This sounds like it’s less likely to end up in that boat, simply based on how the Fabric team is collaborating compared to, say, the various Spark on .NET efforts over the years.

Query Folding and Staging in Fabric Dataflows Gen2

Chris Webb goes digging:

A few years ago I wrote this post on the subject of staging in Fabric Dataflows Gen2. In it I explained what staging is, how you can enable it for a query inside a Dataflow, and discussed the pros and cons of using it. However one thing I never got round to doing until this week is looking at how you can tell if query folding is happening on staged data inside a Dataflow – which turns out to be harder to do than you might think.

Read on to learn more, and also check out the comment describing an alternative approach to part of Chris’s solution.

Partitioned Compute and Fabric Dataflow Performance

Chris Webb performs a test:

Partitioned Compute is a new feature in Fabric Dataflows that allows you to run certain operations inside a Dataflow query in parallel and therefore improve performance. While UI support is limited at the moment it can be used in any Dataflow by adding a single line of fairly simple M code and checking a box in the Options dialog. But as with a lot of performance optimisation features (and this is particularly true of Dataflows) it can sometimes result in worse performance rather than better performance – you need to know how and when to use it. And so, in order to understand when this feature should and shouldn’t be used, I decided to do some tests and share the results here.

Click through for the test, the result, and an open door for subsequent analysis.

Microsoft Fabric Mirroring and SQL Server 2025

Meagan Longoria takes a peek at mirroring in Microsoft Fabric:

Mirroring of SQL Server databases in Microsoft Fabric was first released in public preview in March 2024. Mirrored databases promise near-real-time replication without the need to manage and orchestrate pipelines, copy jobs, or notebooks. John Sterrett blogged about them last year here. But since that initial release, the mechanism under the hood has evolved significantly.

Read on to see how this behaves for versions of SQL Server prior to 2025, and how it changes in 2025.
