Press "Enter" to skip to content

Category: Microsoft Fabric

An Overview of Spark in Microsoft Fabric

Reza Rad gives people a primer on Apache Spark:

Microsoft Fabric runs some workloads under the Spark engine, but what is it really? In this article, I’ll take you through what Spark is, what benefits it has, how it is associated with Fabric, what configuration options you have, and other things you need to know about it.

Reza talks a bit about history, interaction with languages, etc. As a quick addition to the languages list, you can use .NET languages like F# and C# with Spark, though that involves setting up dotnet/spark, and there are open questions about the project’s future. I’m not even sure you could get it to work with Microsoft Fabric.
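For a taste of what working with Spark looks like in practice, here is a minimal PySpark sketch of a grouped aggregation, the bread-and-butter distributed operation. The table and column names are made up for illustration; in a Fabric notebook, the Spark session is already created for you.

```python
from pyspark.sql import SparkSession, functions as F

# In a Fabric notebook, `spark` already exists; this line makes the sketch standalone.
spark = SparkSession.builder.getOrCreate()

# Hypothetical lakehouse table with `region` and `amount` columns.
sales = spark.read.table("sales")

# Spark distributes this aggregation across the cluster's executors.
(sales.groupBy("region")
      .agg(F.sum("amount").alias("total_amount"))
      .orderBy(F.desc("total_amount"))
      .show())
```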


Bringing SQL Server Data into Microsoft Fabric

Nikola Ilic shows us the current options:

Options, options, options… Having the possibility to perform a certain task in multiple different ways is usually a great “problem” to have, although very often not each option is equally effective. And, Microsoft Fabric is all about “options”… You want to ingest the data? No problem, you can use notebooks, pipelines, Dataflows, or T-SQL. Data transformation needed? No worries at all – again, you may leverage notebooks, T-SQL, Dataflows… Data processing, you asked? Lakehouse (Spark), Warehouse (SQL), Real-Time Intelligence (KQL), Power BI… The choice is yours again.

In a nutshell, almost every single task in Microsoft Fabric can be completed in multiple ways, and there is no “right” or “wrong” tool, as long as it gets the job done (of course, as efficiently as possible).

Nikola lays out two prerequisites, then shows us two options we can currently use and three potential options we cannot use yet.
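If you go the notebook route, a read over JDBC is one way to pull SQL Server data into a lakehouse table. Here is a rough sketch; the server, database, table, and credentials are placeholders, and it assumes the SQL Server JDBC driver is available on the Spark runtime and that the workspace can reach the server.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # provided automatically in a Fabric notebook

# Placeholder connection details; swap in your own server, database, and credentials.
jdbc_url = "jdbc:sqlserver://yourserver.database.windows.net:1433;database=YourDb"

df = (spark.read.format("jdbc")
      .option("url", jdbc_url)
      .option("dbtable", "dbo.Customers")   # hypothetical source table
      .option("user", "your_user")
      .option("password", "your_password")
      .load())

# Land the data as a Delta table in the lakehouse.
df.write.mode("overwrite").saveAsTable("customers_raw")
```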


Dynamic Warehouse and Lakehouse Connections in Data Pipelines

Koen Verbeeck doesn’t want to hard-code the connection string:

When you develop data pipelines in Microsoft Fabric (the Azure Data Factory equivalent in Fabric, not to be confused with deployment pipelines), you will most likely have some activities with a connection to a warehouse, a lakehouse, or a KQL database (for the remainder of the blog post I’ll talk about a warehouse, but it can be any of those three data stores). For example, in a Script, Lookup, or Copy activity. When you deploy your data pipeline to another workspace – using, you might’ve guessed it, deployment pipelines – the pipeline itself is copied to the other workspace. For example, we deploy a pipeline from the development workspace to the test workspace.

Read on to see what this means for warehouse connections and how you can work around the existing messiness.


Build a Custom Semantic Model for Microsoft Fabric

Reza Rad offers up some advice:

The Lakehouse or Warehouse comes with a default Power BI Semantic model, which can be used for reporting and analytics. However, you can also build and use a customized semantic model. There are significant differences when using the semantic model in real-world analytics projects. In this article, I’ll explain the difference between these two, which one is recommended, and why.

Click through for the video, as well as the article.


Microsoft Fabric: Lakehouse or Warehouse?

Koen Verbeeck helps us choose:

This doesn’t mean no code has to be written. On the contrary, in this article we’re going to focus on two services of Fabric: the lakehouse and the warehouse. The first one is part of the Data Engineering experience in Fabric, while the latter is part of the Data Warehousing experience. Both require code to be written to create any sort of artefact. In the warehouse we can use T-SQL to create tables, load data into them and do any kind of transformation. In the lakehouse, we use notebooks to work with data, typically in languages such as PySpark or Spark SQL.

Read on for the comparison. I tend to go more for the lakehouse experience rather than warehouse, but Koen provides a lot of the info you’d need in order to make the right decision for yourself.
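To make the contrast concrete, here is a hedged sketch of the lakehouse side of that comparison: a notebook that builds a small dimension table with PySpark and persists it as Delta. The table and column names are invented; in the warehouse, the equivalent would be a T-SQL CREATE TABLE plus an INSERT…SELECT.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # supplied by the Fabric notebook runtime

# Hypothetical raw table in the lakehouse.
raw = spark.read.table("raw_products")

# Deduplicate into a small product dimension.
dim = (raw.select("product_id", "product_name", "category")
          .dropDuplicates(["product_id"]))

# Persist as a Delta table, queryable from SQL as well.
dim.write.mode("overwrite").saveAsTable("dim_product")
```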


Dataverse and Microsoft Fabric Gotchas

Marc Lelijveld shares some advice:

Recently, I architected a solution for a client for their Microsoft Fabric data platform. The client works with Dynamics Finance & Operations as one of their main ERP systems. Fabric offers easy ways to bring data from various standard Microsoft services into the platform; however, it is not always as easy as it looks. In this blog I will elaborate on the gotchas encountered in architecting this solution.

Read on for the challenges that Marc ran into along the way.


Stop and Start Fabric via Power Automate

Gilbert Quevauvilliers saves some money:

Stop and start your Fabric Capacity using Power Automate

With Fabric Capacity trials coming to an end, you need to make sure to stop and start your Fabric Capacities.

In my blog post below, I am going to show you how I can start or stop my Fabric Capacity by simply sending an email to myself with the details in the Subject Line to start or stop the capacity.

That’s a pretty neat method, especially if you have odd hours you want to run the capacity.
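Behind a flow like this, the actual stop and start calls typically go to the Azure Resource Manager suspend and resume endpoints for the capacity. As a rough Python illustration of the same idea outside Power Automate, here is a sketch using the requests library; the resource path and api-version are my assumptions (verify against current Azure documentation), and acquiring the bearer token is left out.

```python
import requests

# All of these values are placeholders.
SUBSCRIPTION_ID = "your-subscription-id"
RESOURCE_GROUP = "your-resource-group"
CAPACITY_NAME = "yourfabriccapacity"
API_VERSION = "2023-11-01"  # assumption; check the current Microsoft.Fabric API version

def set_capacity_state(action: str, token: str) -> None:
    """Suspend or resume a Fabric capacity via Azure Resource Manager.

    `action` is "suspend" or "resume"; `token` is an Azure AD bearer token
    with rights on the capacity (acquisition not shown here).
    """
    url = (
        f"https://management.azure.com/subscriptions/{SUBSCRIPTION_ID}"
        f"/resourceGroups/{RESOURCE_GROUP}/providers/Microsoft.Fabric"
        f"/capacities/{CAPACITY_NAME}/{action}?api-version={API_VERSION}"
    )
    resp = requests.post(url, headers={"Authorization": f"Bearer {token}"})
    resp.raise_for_status()

# set_capacity_state("suspend", token)  # stop the capacity
# set_capacity_state("resume", token)   # start it back up
```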


System Views and Distributed Processing in Microsoft Fabric

Koen Verbeeck runs into an annoying error:

I have a metadata-driven ELT framework that heavily relies on dynamic SQL to generate SQL statements that load data from views into the corresponding fact or dimension table. Such a task is well suited for generation, since the pattern to load a type 1 SCD, type 2 SCD, or a fact table is always the same.

To read the metadata of the views, I use a couple of system views, such as sys.views and sys.sql_modules. At some point, I join this metadata (containing info about the various columns and their data types) against metadata of my own (for example, what is the business key of this dimension). This all works fine in Azure SQL DB or SQL Server, but in my Fabric warehouse I was greeted with the following error:

The query references an object that is not supported in distributed processing mode.

Read on to learn more about why you get this error and one workaround for it.
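Koen covers his workaround in the post. A different way to sidestep the limitation, if you can live with pulling the metadata client-side, is to query each system view separately over the warehouse’s SQL endpoint and do the join outside the engine. Here is a sketch with pyodbc and pandas; the connection string, driver, and the dbo.etl_metadata table are all assumptions for illustration.

```python
import pandas as pd
import pyodbc

# Placeholder connection string; Fabric warehouses expose a TDS endpoint,
# so an ODBC Driver 18 connection along these lines should work, but verify.
conn = pyodbc.connect(
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=your-endpoint.datawarehouse.fabric.microsoft.com;"
    "Database=YourWarehouse;"
    "Authentication=ActiveDirectoryInteractive;"
)

# Query each source separately; it's combining system views with user tables
# in a single statement that can hit distributed processing mode limits.
views = pd.read_sql("SELECT object_id, name FROM sys.views", conn)
modules = pd.read_sql("SELECT object_id, definition FROM sys.sql_modules", conn)
my_meta = pd.read_sql("SELECT view_name, business_key FROM dbo.etl_metadata", conn)  # hypothetical

# Do the join client-side instead of in the warehouse.
merged = (views.merge(modules, on="object_id")
               .merge(my_meta, left_on="name", right_on="view_name"))
```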


Getting the Top N Results in a PySpark Notebook

Gilbert Quevauvilliers only needs the top 1:

How to get the TopN rows using Python in Fabric Notebooks

When working with data, there are sometimes weird and wonderful requirements which must be met in order to get to the desired solution.

In today’s blog post I had a situation where I wanted to get a single row with the highest duration.

Gilbert uses the Spark SQL approach, specifically its Python function variant. You could also write the query directly in Spark SQL using the LIMIT clause.
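For reference, a minimal sketch of both routes; the table and column names are stand-ins.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()  # provided in a Fabric notebook

# Hypothetical table with a `duration` column.
df = spark.read.table("activity_log")

# DataFrame API: sort descending, keep the first row.
df.orderBy(F.col("duration").desc()).limit(1).show()

# The same thing in Spark SQL with LIMIT.
df.createOrReplaceTempView("activity_log_v")
spark.sql("SELECT * FROM activity_log_v ORDER BY duration DESC LIMIT 1").show()
```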
