Press "Enter" to skip to content

Category: Microsoft Fabric

Microsoft Fabric: Lakehouse or Warehouse?

Koen Verbeeck helps us choose:

This doesn’t mean no code has to be written. On the contrary, in this article we’re going to focus on two services of Fabric: the lakehouse and the warehouse. The former is part of the Data Engineering experience in Fabric, while the latter is part of the Data Warehousing experience. Both require code to be written to create any sort of artefact. In the warehouse, we can use T-SQL to create tables, load data into them, and do any kind of transformation. In the lakehouse, we use notebooks to work with data, typically in languages such as PySpark or Spark SQL.

Read on for the comparison. I tend to go for the lakehouse experience rather than the warehouse, but Koen provides a lot of the information you’d need to make the right decision for yourself.
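To make the contrast concrete, here’s a rough sketch of the lakehouse side of the same job (table names and file paths are hypothetical). On the warehouse side, the equivalent would be plain T-SQL: a CREATE TABLE statement followed by INSERT ... SELECT against the warehouse’s SQL endpoint.

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Spark SQL in a lakehouse notebook: create a table (Delta by default in Fabric)
spark.sql("""
CREATE TABLE IF NOT EXISTS dim_customer (
    customer_key INT,
    customer_name STRING
)
""")

# PySpark DataFrame API: read raw files and append them to the table
df = spark.read.parquet("Files/raw/customers")  # hypothetical lakehouse path
df.select("customer_key", "customer_name").write.mode("append").saveAsTable("dim_customer")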


Dataverse and Microsoft Fabric Gotchas

Marc Lelijveld shares some advice:

Recently, I architected a solution for a client for their Microsoft Fabric data platform. The client works with Dynamics Finance & Operations as one of their main ERP systems. Fabric offers easy ways to bring data from various standard Microsoft services into the platform; however, it is not always as easy as it looks. In this blog I will elaborate on the gotchas encountered in architecting this solution.

Read on for the challenges that Marc ran into along the way.


Stop and Start Fabric via Power Automate

Gilbert Quevauvilliers saves some money:

Stop and start your Fabric Capacity using Power Automate

With the Fabric capacity trial coming to an end, you need to make sure to stop and start your Fabric capacities.

In my blog post below, I am going to show you how I can start or stop my Fabric Capacity by simply sending an email to myself with the details in the Subject Line to start or stop the capacity.

That’s a pretty neat method, especially if you want to run the capacity at odd hours.
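Behind the scenes, starting and stopping a capacity boils down to a call against the Azure Resource Manager API. If you’d rather script it than build a flow, a minimal Python sketch could look like this; the subscription, resource group, capacity name, and token are placeholders, and the api-version is an assumption that may need updating for your environment.

import requests

# Placeholders: fill in your own values
subscription_id = "<subscription-id>"
resource_group = "<resource-group>"
capacity_name = "<capacity-name>"
token = "<azure-ad-bearer-token>"  # e.g., via azure-identity's DefaultAzureCredential

# ARM resource path for a Fabric capacity
base = (
    f"https://management.azure.com/subscriptions/{subscription_id}"
    f"/resourceGroups/{resource_group}/providers/Microsoft.Fabric"
    f"/capacities/{capacity_name}"
)

def set_capacity_state(action: str) -> None:
    """action is 'suspend' (stop) or 'resume' (start)."""
    resp = requests.post(
        f"{base}/{action}",
        params={"api-version": "2023-11-01"},  # assumption; check the current ARM api-version
        headers={"Authorization": f"Bearer {token}"},
    )
    resp.raise_for_status()

set_capacity_state("suspend")  # stop the capacity to save money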


System Views and Distributed Processing in Microsoft Fabric

Koen Verbeeck runs into an annoying error:

I have a metadata-driven ELT framework that relies heavily on dynamic SQL to generate SQL statements that load data from views into the corresponding fact or dimension tables. Such a task is well suited to generation, since the pattern to load a type 1 SCD, a type 2 SCD, or a fact table is always the same.

To read the metadata of the views, I use a couple of system views, such as sys.views and sys.sql_modules. At some point, I join this metadata (containing info about the various columns and their data types) against metadata of my own (for example, what the business key of this dimension is). This all works fine in Azure SQL DB or SQL Server, but in my Fabric warehouse I was greeted with the following error:

The query references an object that is not supported in distributed processing mode.

Read on to learn more about why you get this error and one workaround for it.
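As an illustration of the pattern at play (my own sketch, not Koen’s code), this is the sort of metadata join that works fine against SQL Server or Azure SQL DB but can trip the error in a Fabric warehouse; the connection string and the dbo.etl_metadata table are hypothetical.

import pyodbc

# Hypothetical connection to the warehouse's SQL endpoint
conn = pyodbc.connect("<connection-string>")

# Joining system catalog views against a user-owned metadata table:
# fine in SQL Server / Azure SQL DB, but in a Fabric warehouse this
# kind of query can raise the distributed processing mode error.
sql = """
SELECT v.name AS view_name, m.definition, meta.business_key
FROM sys.views AS v
JOIN sys.sql_modules AS m
    ON m.object_id = v.object_id
JOIN dbo.etl_metadata AS meta  -- hypothetical metadata table
    ON meta.view_name = v.name
"""
for row in conn.cursor().execute(sql):
    print(row.view_name, row.business_key)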


Getting the Top N Results in a PySpark Notebook

Gilbert Quevauvilliers only needs the top 1:

How to get the TopN rows using Python in Fabric Notebooks

When working with data, there are sometimes weird and wonderful requirements which must be met in order to get to the desired solution.

In today’s blog post I had a situation where I wanted to get a single row with the highest duration.

Gilbert uses the Spark SQL version, specifically the Python function variant. You could also write the query in Spark SQL itself using the LIMIT operator.
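A quick sketch of both options, using a made-up duration dataset:

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Demo data; in Gilbert's scenario this would be the real dataset
df = spark.createDataFrame(
    [("job_a", 120), ("job_b", 340), ("job_c", 95)],
    ["JobName", "Duration"],
)

# DataFrame API (the Python function variant): sort descending, take the top row
df.orderBy(F.col("Duration").desc()).limit(1).show()

# Spark SQL alternative using LIMIT
df.createOrReplaceTempView("durations")
spark.sql("SELECT * FROM durations ORDER BY Duration DESC LIMIT 1").show()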


Comparing Microsoft Fabric Warehouse and Lakehouse Performance

Reitse Eskens busts out the stopwatch:

I just can’t seem to stop doing this, checking the limits of Microsoft Fabric. In this instalment I’ll try and find some limits on the data warehouse experience and compare them with the Lakehouse experience. The data warehouse is a bit different compared to the Lakehouse, so I’ll be digging into that one first. Then I’m going to load data into the warehouse with a copy data pipeline followed by some big queries to test performance. The Fabric Capacity App will be used to check out the capacity necessary (or used for that matter).

As usual, I’m using the F2 capacity as it’s the one that should break the easiest. It’s also the cheapest one to run tests against and, as the capacity calculation isn’t dependent on the SKU (Stock Keeping Unit), you can easily translate the results to find out which capacity SKU will fit the workload. Remember that your workload will differ from the one shown in this blog. These tests are a comparison between the different offerings, something you could do for yourself. These blogs are a bit of a happy place, as every option gets a fair chance. In your work, your skills (and those of your co-workers) will be a major driver toward an option, even if this offers the chance to learn something new!

Reitse focuses on ingesting and transforming data, and the results were quite interesting.


Reviewing the Microsoft Fabric Roadmap

Paul Turley has a report:

I created this report to summarize the release status of all Fabric features documented in the Microsoft Fabric Roadmap. The information in this report is collected and updated frequently from the Fabric Roadmap hosted by Microsoft and is displayed in a convenient, single-page Power BI report. Each feature area on the report has links to the detailed documentation on the official roadmap site. For convenience, I’ve shortened the report path to TinyURL.com/FabricRoadmapReport.

Check out that link to see the report.


Dynamic Historical Partition Refresh in Power BI

Marc Lelijveld digs into partition refreshing:

I’ve heard the question pretty often from customers: “You told me to use incremental refresh, but how can I regularly run a full load or refresh older partitions?” Well, there are perfectly good ways to do this using Tabular Editor or SQL Server Management Studio. But this often involves manual work to trigger the processing.

Today, this question came my way again, and I thought there should be a smarter way to do this. Since I recently explored more of the wonderful world of Fabric Notebooks and Python, I decided to dive a bit deeper into this world and see if it is possible to script something like this using Semantic Link. And obviously, the answer is “Yes!”

Read on to learn how to do it with a bit of Python and Microsoft Fabric’s Semantic Link library (sempy).
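The core of it ends up being quite compact. Here’s a minimal sketch (the dataset, table, and partition names are placeholders; check the sempy documentation for refresh_dataset’s full signature), meant to run inside a Fabric notebook:

import sempy.fabric as fabric

# Trigger a full refresh of one specific partition in a semantic model.
# Dataset, table, and partition names here are placeholders.
fabric.refresh_dataset(
    dataset="Sales Model",
    refresh_type="full",
    objects=[{"table": "FactSales", "partition": "FactSales-2021"}],
)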


Looping through Data in Microsoft Fabric PySpark Notebooks

Gilbert Quevauvilliers builds a loop:

Continuing with my existing blog series on what I’m learning with notebooks and PySpark.

Today, I’m going to explain to you how I found a way to loop through data in a notebook.

In this example, I’m going to show you how I loop through a range of dates, which can then be used in a subsequent query to extract data by passing each date into a DAX query.

Click through for Gilbert’s example. Here’s an alternative using something called a list comprehension. First, build a function that does what you want to do—that’d be the innards of Gilbert’s Python code, lines 31-54.

def perform_dax_query(row):
    # Pull this row's date; the body of the function (Gilbert's lines 31-54)
    # builds and runs the DAX query for that date.
    var_Date = row["Date"]
    ...
    display(df_DAX_QueryResult)

Then, call that function for each row:

[perform_dax_query(row) for row in data_collect]

In this particular scenario, I’d personally stick with Gilbert’s composition, but in cases where you’re transforming a list of elements into a new list—for example, if you’re performing some data cleanup for each row in a list and you want the output to be a new list with cleaned-up data—then the list comprehension works really well.


An Introduction to the Native Execution Engine in Microsoft Fabric

Sandeep Pawar gives us the gentle version:

At MS Build, Microsoft announced the availability of the Native Execution Engine in Fabric. The product team will have detailed documentation and technical details, but I will attempt to provide an ELI5 version of what it means and how it works at a 30,000-foot level.

Read on to learn what the Native Execution Engine is and why Microsoft would use it versus running everything in the Java Virtual Machine like other services in the broader Hadoop ecosystem have historically done (at least at the beginning, before they built their own native execution engines!).
