Category: ETL / ELT

Enabling Staging for Microsoft Fabric Dataflows

Chris Webb shares some thoughts:

If you read this post that was published on the Fabric blog back in July, you’ll know that each Power Query query in a Fabric Gen2 dataflow has a property that determines whether its output is staged or not – where “staged” means that the output is written to the (soon-to-be hidden) Lakehouse linked to the dataflow, regardless of whether you have set a destination for the query output to be written to. Turning this on or off can have a big impact on your refresh times, making them a lot faster or a lot slower.

Chris shares a simple example of a case where staging might not make sense, though that is likely to be the less common scenario.

Storing Log Analytics Data in the Microsoft Fabric Lakehouse

Gilbert Quevauvilliers needs a place to store this data:

Following on in my series, in this blog post I am going to use Dataflow Gen2 in Microsoft Fabric to load the data into a lakehouse table.

By doing this, it will allow me to store the data in a delta lake table.

In this series I am going to show you all the steps I did to have the successful outcome I had with my client.

Click through for links to the first two parts of the series, as well as a step-by-step guide for part 3.
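
Gilbert does the load through the Dataflow Gen2 UI rather than code, but to give a feel for the end state, here is a minimal, hypothetical PySpark sketch of writing rows to a Delta table from a Fabric notebook attached to a lakehouse. The table and column names are invented for illustration.

    # Hypothetical sketch: write rows to a Delta table from a Fabric
    # notebook attached to a lakehouse. Table and column names are invented.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    rows = [
        ("2023-09-01T10:00:00Z", "QueryEnd", 123),
        ("2023-09-01T10:05:00Z", "QueryEnd", 456),
    ]
    df = spark.createDataFrame(rows, ["TimeGenerated", "OperationName", "DurationMs"])

    # Delta is the default table format in a Fabric lakehouse, so
    # saveAsTable produces the kind of delta lake table Gilbert describes.
    df.write.format("delta").mode("append").saveAsTable("LogAnalyticsData")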

A Brief Overview of 21 ETL Tools in Python

Adron Hall makes a list:

Here are summaries of each of the tools you’ve mentioned along with examples of how to implement the ETL (Extract, Transform, Load) process using each tool within a Python workflow:

  1. Apache Spark: Apache Spark is a powerful open-source cluster-computing framework that provides an interface for programming entire clusters with implicit data parallelism and fault tolerance. It’s commonly used for processing large-scale data and running complex ETL pipelines. Example Implementation:
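
The excerpt cuts off before Adron's Spark sample. Purely to illustrate the pattern (this is not Adron's code), a minimal PySpark extract-transform-load could look like the following, with the file paths and column names invented for the example.

    # Illustrative PySpark ETL sketch; paths and columns are invented.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("etl-sketch").getOrCreate()

    # Extract: read the raw CSV file.
    raw = spark.read.option("header", True).csv("/data/raw/orders.csv")

    # Transform: keep valid rows and total the amounts per customer.
    amounts = raw.withColumn("amount", F.col("amount").cast("double"))
    summary = (
        amounts.filter(F.col("amount") > 0)
        .groupBy("customer_id")
        .agg(F.sum("amount").alias("total_amount"))
    )

    # Load: write the result out as Parquet.
    summary.write.mode("overwrite").parquet("/data/curated/customer_totals")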

Read on for summaries and samples for each of the 21 options.

Storing Log Analytics Queries in Azure Blob Storage

Gilbert Quevauvilliers wants some long-term storage:

Following on in my series, in this blog post I am going to demonstrate how to store Log Analytics Queries in Blob Storage.

This allows me to store the Power BI queries externally from Log Analytics and to have an easy way to get the data into my Fabric Lakehouse in later steps. To do this I am going to use a Logic App in Azure.

In this series I am going to show you all the steps I did to have the successful outcome I had with my client.

Read on to see what Gilbert used for the task.
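
Gilbert builds this with a Logic App, but to show the shape of the underlying operation, here is a hypothetical Python sketch using the azure-storage-blob SDK to drop query results into a container. The connection string, container, and blob names are placeholders.

    # Hypothetical sketch using the azure-storage-blob SDK; the Logic App
    # in Gilbert's post performs the equivalent upload. Names are placeholders.
    import json
    from azure.storage.blob import BlobServiceClient

    service = BlobServiceClient.from_connection_string("<storage-account-connection-string>")

    # Pretend these rows came back from a Log Analytics query.
    results = [{"TimeGenerated": "2023-09-01T10:00:00Z", "OperationName": "QueryEnd"}]

    blob = service.get_blob_client(container="loganalytics", blob="queries/2023-09-01.json")
    blob.upload_blob(json.dumps(results), overwrite=True)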

Lessons Learned from Azure Data Factory Integrating with DB/2 on Mainframe

Teo Lachev shares some thoughts:

I’ve done a few BI integration projects extracting data from ERPs running on IBM Db2. Most of the implementations would use a hybrid architecture where the ERP would be running on an on-prem mainframe while the data was loaded in Microsoft Azure. Here are a few tips if you’re facing this challenge:

Click through for five major points. Surprisingly, one of them isn’t “Avoid DB/2 like the plague.”
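
If you end up pulling data out of Db2 from Python rather than through Azure Data Factory, a minimal connection sketch with IBM's ibm_db driver looks roughly like this; every connection detail and object name below is a placeholder, not anything from Teo's post.

    # Rough sketch using IBM's ibm_db driver; connection details and
    # object names are placeholders.
    import ibm_db

    conn = ibm_db.connect(
        "DATABASE=ERPDB;HOSTNAME=mainframe.example.com;PORT=50000;"
        "PROTOCOL=TCPIP;UID=etl_user;PWD=<password>;",
        "",
        "",
    )

    stmt = ibm_db.exec_immediate(
        conn,
        "SELECT ORDER_ID, AMOUNT FROM ERP.ORDERS FETCH FIRST 10 ROWS ONLY",
    )
    row = ibm_db.fetch_assoc(stmt)
    while row:
        print(row["ORDER_ID"], row["AMOUNT"])
        row = ibm_db.fetch_assoc(stmt)

    ibm_db.close(conn)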

Diving into the Microsoft Fabric Copy Activity

Reza Rad does more than copies:

Copy Activity is one of the most commonly used activities in Microsoft Fabric’s Data Factory Pipeline. The Copy Activity copies data from a source to a destination. However, there is more to it than a simple copy. In this article, you will learn what Copy Activity is, its rationale, how it works, and its configuration options.

Reza has a video, as well as a demo-heavy full-length article on the topic.

Batch File Importation in SQL Server

Paul White loads things quickly:

All this can be achieved with client-side tools and programming. It can also be done server-side by importing the raw data into a staging table before processing using T-SQL procedures.

Other times, the need arises to ingest data without using client-side tools and without making a complete copy of the raw data on the server. This article describes one possible approach in that situation.

Read on for the process.
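
Paul works server-side in T-SQL; purely as a contrast, here is a hypothetical client-side sketch that batch-loads a file into a staging table with pyodbc. The server, database, table, and file names are invented.

    # Hypothetical client-side load using pyodbc, for contrast with the
    # server-side T-SQL approach Paul describes. All names are invented.
    import csv
    import pyodbc

    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 18 for SQL Server};"
        "SERVER=sqlserver.example.com;DATABASE=Staging;"
        "Trusted_Connection=yes;TrustServerCertificate=yes;"
    )
    cursor = conn.cursor()
    cursor.fast_executemany = True  # send parameter batches instead of row-by-row calls

    with open("raw_data.csv", newline="") as f:
        reader = csv.reader(f)
        next(reader)  # skip the header row
        rows = list(reader)

    cursor.executemany("INSERT INTO dbo.RawStaging (Col1, Col2) VALUES (?, ?)", rows)
    conn.commit()
    conn.close()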

Using the Microsoft Fabric Data Gateway

Reitse Eskens uploads some data:

In a blog from a few weeks ago, I wrote about getting data from your on-prem SQL Server into Fabric. At the time, the only option for a copy dataflow was using a direct connection over the internet. That is still an option, but now you can also use the Power BI Data Gateway to get data from your SQL Server into Fabric.

In this blog, I’ll take you through the steps needed and an issue I ran into.

Read on for Reitse’s instructions and how to avoid the issue he ran into.

Migrating Cosmos DB Tables API

Eitan Blumin handles a migration:

A few months ago, I was involved in an interesting project where a large customer (not to be named due to NDA) needed to migrate their entire Azure cloud subscription to another subscription. This was a difficult and arduous process that involved several PaaS technologies, besides SQL Server, that I didn’t have experience with before.

But it presented very interesting challenges and opportunities to learn new things.

One of these was the need to migrate an entire Azure Cosmos DB with Table Storage API account from one subscription to another.

Read on for the challenge, the intermediate solution using the Cosmos DB Data Migration Tool, and Eitan’s PowerShell script to automate the process. I know and work with most of the people working on the DMT and they’re good folks.
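
Eitan’s solution pairs the Data Migration Tool with PowerShell. Just to illustrate the shape of the copy itself, here is a hypothetical Python sketch using the azure-data-tables SDK, which speaks both Table Storage and the Cosmos DB Table API. The connection strings and table name are placeholders, and an entity-by-entity loop like this would be far too slow for a real migration, which is exactly why a dedicated tool makes sense.

    # Hypothetical sketch with the azure-data-tables SDK. This is not
    # Eitan's tooling; connection strings and the table name are placeholders.
    from azure.data.tables import TableServiceClient

    source = TableServiceClient.from_connection_string("<source-connection-string>")
    target = TableServiceClient.from_connection_string("<target-connection-string>")

    src_table = source.get_table_client("MyTable")
    dst_table = target.create_table_if_not_exists("MyTable")

    # Copy every entity one at a time; fine as a sketch, too slow at scale.
    for entity in src_table.list_entities():
        dst_table.upsert_entity(entity)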
