Press "Enter" to skip to content

Category: ETL / ELT

Preview-Only Steps in Microsoft Fabric Dataflows

Chris Webb covers a new feature:

I have been spending a lot of time recently investigating the new performance-related features that have rolled out in Fabric Dataflows over the last few months, so expect a lot of blog posts on this subject in the near future. Probably my favourite of these features is Preview-Only steps: they make such a big difference to my quality of life as a Dataflows developer.

The basic idea (which you can read about in the very detailed docs here) is that you can add steps to a query inside a Dataflow that are only executed when you are editing the query and looking at data in the preview pane; when the Dataflow is refreshed these steps are ignored. This means you can do things like add filters, remove columns or summarise data while you’re editing the Dataflow in order to make the performance of the editor faster or debug data problems. It’s all very straightforward and works well.

First up, that feature is pretty interesting, though I could see things break if you only do your testing in the preview pane. Second, what Chris does with this is well worth a look.
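
To make the mechanics concrete, here is a minimal Python sketch of the pattern: steps flagged as preview-only run while you explore, but get skipped when the full load happens. This is only an illustration of the idea, not the actual Dataflows mechanism (in Fabric you toggle it per step in the Power Query editor), and all the names below are invented for the example.

```python
from dataclasses import dataclass
from typing import Callable

import pandas as pd


@dataclass
class Step:
    name: str
    transform: Callable[[pd.DataFrame], pd.DataFrame]
    preview_only: bool = False  # skipped when the "dataflow" refreshes


def run(steps: list[Step], df: pd.DataFrame, preview: bool) -> pd.DataFrame:
    for step in steps:
        if step.preview_only and not preview:
            continue  # at refresh time, preview-only steps are ignored
        df = step.transform(df)
    return df


steps = [
    Step("filter to one region", lambda d: d[d["region"] == "EMEA"], preview_only=True),
    Step("keep first 100 rows", lambda d: d.head(100), preview_only=True),
    Step("add gross sales", lambda d: d.assign(gross=d["net"] * 1.21)),
]

df = pd.DataFrame({"region": ["EMEA", "APAC"], "net": [100.0, 200.0]})
print(run(steps, df, preview=True))   # small, fast slice while editing
print(run(steps, df, preview=False))  # full data at refresh
```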


Troubleshooting Bad Request in ADF Pipelines

Koen Verbeeck said something bad:

A while ago I blogged about a use case where a pipeline fails during debugging with a BadRequest error, even though it validates successfully. If you’re wondering, this is the helpful error message that you get:

Click through for an image of the 400 Bad Request message, how Koen fixed it originally, and then a different scenario in which that 400 message popped up.

Ultimately, a 400 Bad Request comes down to “You sent me information that doesn’t make sense and I can’t fulfill your request, so fix it, dummy.” 400 status codes are very rude and insulting. Especially 418–that thing has a mouth like a sailor’s.
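
One practical corollary: because a 400 means the request itself is malformed, retrying the same payload will fail the same way, so the response body is usually your best clue. A minimal sketch against a placeholder endpoint and payload:

```python
import requests

# Placeholder URL and payload, purely for illustration.
resp = requests.post(
    "https://example.com/api/orders",
    json={"quantity": "lots"},  # bad payload: the API presumably wants an int
    timeout=30,
)

if resp.status_code == 400:
    # No point retrying; log the body, which often explains what was wrong.
    print("Bad Request:", resp.text)
else:
    resp.raise_for_status()
```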


Creating Fabric Linked Service Parameters for ADO Deployment

Koen Verbeeck glues together several technologies:

Quite the title, so let me set the stage first. You have an Azure Data Factory instance (or Azure Synapse Pipelines) and you have a couple of linked services that point to Fabric artifacts such as a lakehouse or a warehouse. You want to deploy your ADF instance with an Azure DevOps build/release pipeline to another environment (e.g. acceptance or production) and this means the linked services need to change as well because in those environments the lakehouse or warehouse are in a different workspace (and also have different object IDs).

When you want to deploy ADF, you typically use the ARM template that ADF automatically creates when you publish (when your instance is linked with a git repo). More information about this setup can be found in the documentation. To parameterize certain properties of a linked service, you can use custom parameterization of the ARM template. Anyway, long story short, I tried to parameterize the properties of the Fabric linked service. 

Read on to see how that went, as well as what you need to do to solve this issue.
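
Koen's post covers the supported route (custom parameterization of the ARM template). As a blunter fallback, you could post-process the exported template before deployment. A hedged sketch, assuming the default ARMTemplateForFactory.json export name and that the Fabric Lakehouse linked service exposes workspaceId and artifactId type properties; verify both against your own export:

```python
import json

# Target environment values; these GUIDs are placeholders.
TARGET = {
    "workspaceId": "11111111-1111-1111-1111-111111111111",  # acceptance workspace
    "artifactId": "22222222-2222-2222-2222-222222222222",   # acceptance lakehouse
}

with open("ARMTemplateForFactory.json", encoding="utf-8") as f:
    template = json.load(f)

for resource in template.get("resources", []):
    if not resource.get("type", "").endswith("/linkedServices"):
        continue
    props = resource.get("properties", {}).get("typeProperties", {})
    if "workspaceId" in props and "artifactId" in props:
        props.update(TARGET)  # only touch linked services that point at Fabric

with open("ARMTemplateForFactory.target.json", "w", encoding="utf-8") as f:
    json.dump(template, f, indent=2)
```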


Alerting People in Microsoft Teams from Data Factory Pipelines

Andy Brownsword sends a message:

Whether running Data Factory, Synapse, or Fabric pipelines, things go wrong – and the de facto response is to send an email. We’ve looked at sending emails from pipelines before, but at scale they can become noise and are easy to ignore.

A more effective option is to surface alerts where collaboration already exists, such as Teams.

In this post we’re going to start looking at using Teams and consolidating notifications into a channel. This functionality gives team members visibility, the ability to update in threads, and the option to tag people for a tighter response loop than typical emails bring.

Click through for the process.
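
At its core, the Teams side of this is just an HTTP POST to a channel webhook; inside the pipeline, a Web activity would send the same payload. A minimal sketch, with the webhook URL and message contents as placeholders:

```python
import requests

# Placeholder: create an incoming webhook (or Workflows trigger) on the
# channel and paste its URL here.
WEBHOOK_URL = "https://example.webhook.office.com/webhookb2/your-webhook-id"

payload = {
    "text": "Pipeline **LoadSales** failed in *Production*. Run ID: placeholder-run-id"
}

resp = requests.post(WEBHOOK_URL, json=payload, timeout=30)
resp.raise_for_status()  # non-2xx means Teams rejected the message
```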


Working with Recent Data in Dataflows Gen2

Penny Zhou sees recent datasets:

How much time do you spend navigating to the same data sources when building dataflows? Data preparation is an iterative process—you often return to the same sources as you refine your dataflows, add new transformations, or create similar workflows. If you find yourself repeatedly connecting to the same tables, files, or databases, the Recent data module in Dataflow Gen2 is designed for you. This feature reduces friction by providing quick access to your most frequently used data items, letting you focus on the transformation logic rather than navigation.

Click through to see how you can access the Recent data menu and what it includes.


Snapshot Reporting in Microsoft Fabric via Fabric Pipelines

Kenneth Omorodion builds a Microsoft Fabric pipeline:

In a previous tip, I described how we can implement snapshot reporting using Microsoft Fabric Dataflow Gen2. In this article, I will describe how to achieve the same using Microsoft Fabric Pipelines. I previously described how important snapshot reporting can be in Business Intelligence reporting. Some reasons why developers/engineers might prefer to leverage a Fabric pipeline instead of a Dataflow Gen 2 include considerations around cost efficiency and data volumes.

My strong preference is still to do this in code (notebooks, Spark jobs), but at least Dataflows Gen2 aren’t literally 100x slower than the alternatives anymore.
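
For the notebook route, the heart of a snapshot job really is small: stamp the current state of a table with a date and append it to a history table. A minimal PySpark sketch with placeholder table names:

```python
from datetime import date

from pyspark.sql import SparkSession, functions as F

# In a Fabric notebook a SparkSession is already provided; getOrCreate()
# makes the sketch runnable elsewhere too. Table names are placeholders.
spark = SparkSession.builder.getOrCreate()

current = spark.read.table("sales_summary")
snapshot = current.withColumn(
    "snapshot_date", F.lit(date.today().isoformat()).cast("date")
)

snapshot.write.mode("append").saveAsTable("sales_summary_snapshots")
```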


Using a Microsoft Fabric Variable Library in a Dataflow

Laura Graham-Brown shows another way to use variable libraries:

One of the popular low-code tools within Microsoft Fabric is the Gen2 Dataflow. Power BI report builders already know some Power Query, so armed with that knowledge, Dataflows are a popular starting point for loading data into Microsoft Fabric. Adding values from the Variable Library in a Dataflow is an obvious way to make it more future-proof and to work better with Deployment pipelines.

I will confess that the first time I tried these, I could not get them to work until I read the instructions correctly. So they do work; just understand the limitations!

To be fair, following instructions is one of the most challenging things to do, it seems.


Near-Real-Time Reporting on SQL Server Data with Microsoft Fabric

Rebecca Lewis continues a series on Microsoft Fabric:

You already know the options. Run heavy reporting queries against production. eewgh. Or stand up a reporting replica, build ETL to keep it current, maintain a refresh schedule, and hope nothing breaks on a holiday weekend. It works, but it’s expensive and has an awful lot of moving pieces.

Fabric gives you a third path: continuously replicate your SQL Server data into OneLake using Fabric Mirroring, and let Power BI read it using Direct Lake mode. Your SQL Server stays focused on OLTP and your reporting runs against a near real-time copy in Fabric. No pipelines. No refresh schedules. Nice.

Read on for the options available with Microsoft Fabric, as well as an endearing note that “real-time” isn’t.
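
And once the mirror is flowing, the replica is also queryable through the mirrored database’s SQL analytics endpoint, much like any other SQL Server-style source. A hedged sketch with placeholder server, database, and table names:

```python
import pyodbc

# Server and database come from the mirrored database's SQL analytics
# endpoint in the Fabric workspace; the table is a placeholder.
conn = pyodbc.connect(
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=your-workspace.datawarehouse.fabric.microsoft.com;"
    "Database=YourMirroredDb;"
    "Authentication=ActiveDirectoryInteractive;"
    "Encrypt=yes;"
)

cur = conn.cursor()
cur.execute("SELECT TOP 10 * FROM dbo.Orders ORDER BY OrderDate DESC")
for row in cur.fetchall():
    print(row)
```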
