Press "Enter" to skip to content

Category: ETL / ELT

Configuring Microsoft Fabric Data Mirroring for Snowflake

Koen Verbeeck copies some data:

We have a couple of Snowflake databases and would like to have that data available in Microsoft Fabric as well. Is there an easy solution to get the data quickly in Fabric? We don’t have many technical people on staff, so writing complex ETL is not an option.

Read on for more information on how it works. Mind you, you'll probably still be writing the T and some of the L of ETL even after using mirroring.
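
Mirroring takes care of the E; as a rough illustration of the remaining T and L, here is a minimal PySpark sketch of the kind of notebook cell you might run in Fabric, assuming the mirrored database is reachable from a lakehouse (for example via a shortcut). All database, table, and column names here are invented:

```python
# Hypothetical Fabric notebook cell. "snow_mirror" stands in for the mirrored
# Snowflake database, assumed reachable from the attached lakehouse.
df = spark.sql("""
    SELECT CustomerID, OrderDate, Amount
    FROM snow_mirror.dbo.Orders
    WHERE OrderDate >= '2024-01-01'
""")

# The "T": a shaping step that mirroring alone won't do for you.
totals = (
    df.groupBy("CustomerID")
      .sum("Amount")
      .withColumnRenamed("sum(Amount)", "TotalAmount")
)

# The "L": land the shaped result in a lakehouse table for reporting.
totals.write.mode("overwrite").saveAsTable("gold_customer_totals")
```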


Debugging Failed Function Calls in ADF

Andy Brownsword troubleshoots a problem:

I recently ran into an issue when trying to call a function from an ADF pipeline. The function returned a generic Internal Server Error with no details exposed. Here we’ll look at how to dig into the logs to identify the true cause of the failure.

In this instance the function was performing PGP encryption but this could apply to any function. Let’s start with the problem.

Click through for the very generic error message and how you can get the real details.
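
As a hedged sketch of what that log-digging can look like in code, the azure-monitor-query package lets you run the same kind of query from Python, assuming the Function App sends telemetry to a workspace-based Application Insights instance. The workspace ID and function name below are placeholders, and your table and column names may differ depending on your telemetry setup:

```python
from datetime import timedelta

from azure.identity import DefaultAzureCredential
from azure.monitor.query import LogsQueryClient

client = LogsQueryClient(DefaultAzureCredential())

# Surface the exceptions behind the generic "Internal Server Error".
# "PgpEncryptFunction" is a made-up operation name for illustration.
query = """
AppExceptions
| where TimeGenerated > ago(1h)
| where OperationName has "PgpEncryptFunction"
| project TimeGenerated, ProblemId, OuterMessage, InnermostMessage
| order by TimeGenerated desc
"""

response = client.query_workspace(
    workspace_id="<log-analytics-workspace-id>",
    query=query,
    timespan=timedelta(hours=1),
)

for row in response.tables[0].rows:
    print(row)
```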


Bringing SQL Server Data into Microsoft Fabric

Nikola Ilic shows us the current options:

Options, options, options…Having the possibility to perform a certain task in multiple different ways is usually a great “problem” to have, although the options are often not equally effective. And, Microsoft Fabric is all about “options”…You want to ingest the data? No problem, you can use notebooks, pipelines, Dataflows, or T-SQL. Data transformation needed? No worries at all – again, you may leverage notebooks, T-SQL, Dataflows…Data processing, you asked? Lakehouse (Spark), Warehouse (SQL), Real-Time Intelligence (KQL), Power BI…The choice is yours again.

In a nutshell, almost every single task in Microsoft Fabric can be completed in multiple ways, and there is no “right” or “wrong” tool, as long as it gets the job done (of course, as efficiently as possible).

Nikola lays out two prerequisites and then shows us two options we can currently use, plus three potential options we cannot use yet.
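
To give a flavor of just one of those options – and this sketch is mine, not Nikola's – here is a minimal notebook-based pull from SQL Server over JDBC into a lakehouse Delta table. The server, database, table, and credential values are placeholders:

```python
# Hypothetical Fabric notebook cell: read a SQL Server table over JDBC and
# land it as a Delta table. In practice the password should come from a
# secret store rather than sitting in the notebook.
df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:sqlserver://myserver.database.windows.net:1433;databaseName=SalesDb")
    .option("dbtable", "dbo.Customers")
    .option("user", "fabric_reader")
    .option("password", "<secret>")
    .load()
)

df.write.mode("overwrite").saveAsTable("bronze_customers")
```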


Dynamic Warehouse and Lakehouse Connections in Data Pipelines

Koen Verbeeck doesn’t want to hard-code the connection string:

When you develop data pipelines in Microsoft Fabric (the Azure Data Factory equivalent in Fabric, not to be confused with deployment pipelines), you will most likely have some activities with a connection to a warehouse, a lakehouse or a KQL database (for the remainder of the blog post I’ll talk about a warehouse, but it can be any of those three data stores). For example, in a Script, Lookup, or Copy activity. When you deploy your data pipeline to another workspace – using, you might’ve guessed it, deployment pipelines – the pipeline itself is copied to the other workspace. E.g., we deploy a pipeline from the development workspace to the test workspace.

Read on to see what this means for warehouse connections and how you can work around the existing messiness.
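
To make the idea concrete, here is an illustrative fragment – written as a Python dict rather than the exact Fabric pipeline schema, and with approximate property names – showing the gist of the workaround: hard-coded GUIDs in an activity's connection get replaced with pipeline parameters, so one deployed pipeline can point at a different warehouse per workspace:

```python
# Illustrative only; these property names approximate, not reproduce, the
# Fabric pipeline definition. The point is that each hard-coded GUID becomes
# an expression resolved at runtime from pipeline parameters.
script_activity_connection = {
    "type": "DataWarehouse",
    "workspaceId": "@pipeline().parameters.WorkspaceId",
    "artifactId": "@pipeline().parameters.WarehouseId",
    "endpoint": "@pipeline().parameters.WarehouseEndpoint",
}
```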


Dataverse and Microsoft Fabric Gotchas

Marc Lelijveld shares some advice:

Recently, I architected a solution for a client for their Microsoft Fabric data platform. The client works with Dynamics Finance & Operations as one of their main ERP systems. Fabric offers easy ways to bring data from various standard Microsoft services into the platform; however, it is not always as easy as it looks. In this blog I will elaborate on the gotchas encountered in architecting this solution.

Read on for the challenges that Marc ran into along the way.


Invoking a Fabric Data Factory Pipeline via REST API

Andy Leonard makes a call:

This post is current as of 30 May 2024. There are other posts by fantastic bloggers about how to use the Fabric REST API. Fabric development is progressing so fast, some of those posts are less up-to-date. Make no mistake, this post will most likely not age well, and for the very same reason. That’s ok. We bloggers live to serve. I, like all the rest, will endeavor to persevere – and we will all write more posts, Lord willing.

In this post, I share one way to invoke Fabric Data Factory pipelines using the REST API.
I will be using the web version of Postman to call REST API methods.
You can sign up for a free Postman account. Since it’s free, I encourage you to check the box to receive news and offers from them. As I mentioned in an earlier post, you can always unsubscribe if the messages are unhelpful or if they get too “chatty.”

Read on for that way.
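
If you would rather script the call than click through Postman, a minimal Python sketch might look like the following. It targets the run-on-demand job endpoint as it existed around the post's date, and the IDs and bearer token are placeholders:

```python
import requests

workspace_id = "<workspace-guid>"
pipeline_id = "<pipeline-item-guid>"
token = "<bearer-token>"  # e.g., acquired via azure-identity for the Fabric API

url = (
    f"https://api.fabric.microsoft.com/v1/workspaces/{workspace_id}"
    f"/items/{pipeline_id}/jobs/instances?jobType=Pipeline"
)

response = requests.post(url, headers={"Authorization": f"Bearer {token}"})
response.raise_for_status()

# A 202 means the run was accepted; the Location header points at the job
# instance you can poll for status.
print(response.status_code, response.headers.get("Location"))
```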


Recapping an Orchestration Framework

Martin Schoombee wraps up a series:

Frameworks are extremely useful when they are thoughtfully designed and implemented. I have seen both sides of the coin, but what I probably see the most of is a lack of any sort of framework. What I typically see are some naming conventions and coding standards, but many companies miss the opportunity to take it one step further and reduce the inefficiencies of repetitive tasks. There’s a ton of repetition in ETL processes, and in my opinion that gives us a really good opportunity to streamline the way in which we are doing things with a well-designed framework.

Read on for Martin’s notes to keep in mind, as well as where to go from here.


Orchestration Controllers in Azure Data Factory

Martin Schoombee gets to the top of the pyramid:

Controllers are pipelines that initiate the execution of a single process or task in a specific order and with constraints. Whereas everything else in this framework is pretty automated, this part is entirely manual.

Why? Well, when I started thinking about the design of this framework I knew I needed something at the “highest level” that would execute an entire daily ETL process, or a modified ETL process that only loads specific data during the day. I wanted to maximize the flexibility of the framework, and that either meant adding another level to the metadata structure or creating this layer of pipelines that sit at the top. I opted for the second, because I did not feel it was worth the complexity of adding another layer into the metadata structure. That being said, it doesn’t mean it cannot or shouldn’t be done…it was a personal choice I made to keep things as simple as I could.

Read on to learn more about what the controller should look like and how the other pieces fit in.
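
As a conceptual rendering only – not Martin's actual implementation – the controller idea boils down to something like this: a hand-maintained, ordered list of stages, each of which kicks off the metadata-driven orchestration layer, with the constraint that a failure halts the run:

```python
def run_orchestrator(stage: str) -> bool:
    """Stand-in for an Execute Pipeline activity; returns success or failure."""
    print(f"running orchestrator for stage: {stage}")
    return True

# The order of this list is the "specific order" a controller encodes;
# maintaining it is the manual part Martin describes.
DAILY_ETL_STAGES = ["extract", "transform", "load", "post_process"]

def daily_controller() -> None:
    for stage in DAILY_ETL_STAGES:
        if not run_orchestrator(stage):
            raise RuntimeError(f"stage '{stage}' failed; halting the run")

daily_controller()
```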


Modern Data Warehousing with Data Lake Storage and Azure Data Factory

Josephine Bush continues a series on modern data warehousing:

In today’s data-driven world, having the right tools to manage and process large datasets is crucial. That’s where Azure Data Lake Storage (ADLS) and Azure Data Factory (ADF) come in handy, making it easier than ever to store and transform your data. In this post, I’ll show you how to set up ADLS to store your Parquet files and configure ADF to manage your data flows efficiently.

Read on for an overview of both technologies.
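
As a small, hedged example of the ADLS half, here is one way to land a Parquet file from Python using the azure-storage-file-datalake package, assuming a storage account with hierarchical namespace enabled. The account, container, and path names are placeholders:

```python
import io

import pandas as pd
from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

service = DataLakeServiceClient(
    account_url="https://mydatalake.dfs.core.windows.net",
    credential=DefaultAzureCredential(),
)

# Serialize a small frame to Parquet in memory (requires pyarrow).
df = pd.DataFrame({"id": [1, 2], "amount": [9.99, 19.99]})
buffer = io.BytesIO()
df.to_parquet(buffer)

# Drop it into the raw zone that an ADF data flow would then pick up.
file_client = (
    service.get_file_system_client("raw")
    .get_file_client("sales/2024/05/sales.parquet")
)
file_client.upload_data(buffer.getvalue(), overwrite=True)
```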


Creating Orchestrators in Azure Data Factory

Martin Schoombee continues a series on building an orchestration framework in Azure Data Factory:

The orchestration layer of the framework is where all the magic happens. It facilitates the execution of processes and/or tasks as defined in the metadata, and needs to do it both seamlessly and efficiently. Ideally you would want to deploy this layer only once, and never have to touch it again. And it is really with that in mind that I designed this layer…to function independently and with minimal dependencies in both directions.

I would have loved for this layer to consist of only one pipeline, but there are some nuances in Data Factory that make it impossible – the primary nuance being that you cannot nest ForEach activities. As a result, this layer contains three pipelines that will be covered by the sections below in more detail.

Read on to see what those three pipelines are.
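
The no-nesting constraint is worth a quick illustration. Conceptually, the workaround moves the inner loop into a child pipeline that the outer ForEach invokes once per item – here rendered as hypothetical Python, since the real thing is an Execute Pipeline activity inside a ForEach:

```python
def child_pipeline(process: str, tasks: list[str]) -> None:
    # The inner ForEach lives here, in its own pipeline.
    for task in tasks:
        print(f"executing {process}.{task}")

def orchestrator_pipeline(metadata: dict[str, list[str]]) -> None:
    # The outer ForEach: one Execute Pipeline call per process.
    for process, tasks in metadata.items():
        child_pipeline(process, tasks)

orchestrator_pipeline({"staging": ["load_customers", "load_orders"]})
```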
