Press "Enter" to skip to content

Category: Microsoft Fabric

ML Models and Data Warehouses in Microsoft Fabric

Tomaz Kastrun continues a series on Microsoft Fabric. First up is creating ML models:

Protip: Both experiments and the ML model version look similar, and you can intuitively switch between both of them. But do not get confused, as the ML Model version applies the best-selected model from the experiment and can be used for inference.
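To make the distinction concrete, here is a minimal sketch (not Tomaz's code) of promoting the best run from an experiment to an ML model version and loading that version back for inference. It assumes MLflow, which Fabric notebooks use under the hood, plus placeholder experiment, metric, and model names:

```python
import mlflow

experiment_name = "my_experiment"        # placeholder experiment name
registered_model_name = "my_model"       # placeholder ML model name

# Find the best run in the experiment (here: lowest RMSE, purely illustrative)
runs = mlflow.search_runs(experiment_names=[experiment_name],
                          order_by=["metrics.rmse ASC"], max_results=1)
best_run_id = runs.iloc[0]["run_id"]

# Registering that run's logged model creates a new ML model version
version = mlflow.register_model(f"runs:/{best_run_id}/model", registered_model_name)

# The registered version is what you load back for inference
model = mlflow.pyfunc.load_model(f"models:/{registered_model_name}/{version.version}")
```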

Then we switch context to data warehousing:

Today we will start exploring the Fabric Data Warehouse.

With its data lake-centric logic, the data warehouse in Fabric is built on a distributed processing engine that enables automated scaling. The SaaS experience creates a segue to easier analysis and reporting, and at the same time gives you the ability to run heavy workloads against open data formats simply by using Transact-SQL (T-SQL). Microsoft OneLake provides the services to hold a single copy of the data, which can be consumed in a data warehouse, a data lake, or SQL Analytics.


Creating Charts in Microsoft Fabric Notebooks using Vega

Phil Seamark tries out Vega in a Microsoft Fabric notebook:

I recently needed to generate a quick visual inside a Microsoft Fabric notebook. After a little internet searching, I found there are many good-quality charting libraries in Python; however, it was going to take too long to figure out how to create a very specific type of chart.

This is where Vega came to the rescue. The purpose of this short article is to share a very simple implementation of generating a Vega chart using a Microsoft Fabric notebook.

Click through for the example code.
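If you just want a feel for the shape of such a solution before reading Phil's post, here is a rough sketch (not Phil's implementation) that renders a hypothetical Vega-Lite bar chart by handing a spec to vega-embed via displayHTML, which Synapse-style Fabric notebooks expose in PySpark cells:

```python
import json

# A hypothetical Vega-Lite spec for a simple bar chart
spec = {
    "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
    "data": {"values": [{"x": "A", "y": 28}, {"x": "B", "y": 55}, {"x": "C", "y": 43}]},
    "mark": "bar",
    "encoding": {
        "x": {"field": "x", "type": "nominal"},
        "y": {"field": "y", "type": "quantitative"},
    },
}

html = f"""
<div id="vis"></div>
<script src="https://cdn.jsdelivr.net/npm/vega@5"></script>
<script src="https://cdn.jsdelivr.net/npm/vega-lite@5"></script>
<script src="https://cdn.jsdelivr.net/npm/vega-embed@6"></script>
<script>vegaEmbed("#vis", {json.dumps(spec)});</script>
"""

# displayHTML is assumed here as the notebook's built-in HTML renderer
displayHTML(html)
```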


Continuing the Advent of Fabric

Tomaz Kastrun has been busy. On day 9, we build a custom environment:

Microsoft Fabric provides you with the capability to create a new environment, where you can select different Spark runtimes, configure your compute resources, and create a list of Python libraries (public or custom; from Conda or PyPI) to be installed. Custom environments behave the same way as any other environment and can be attached to your notebook or used on a workspace. Custom environments can also be attached to Spark job definitions.

On day 10, we have Spark job definitions:

An Apache Spark job definition is a single computational action that is normally scheduled and triggered. In Microsoft Fabric (the same as in Synapse), you can submit batch/streaming jobs to Spark clusters.

By uploading a binary file or libraries in any of the supported languages (Java/Scala, R, Python), you can run any kind of logic (transformation, cleaning, ingest, ingress, …) against the data that is hosted and served to your lakehouse.
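As a rough idea of what the uploaded main definition file can look like, here is a bare-bones PySpark sketch (paths and table names are placeholders, not Tomaz's example) that reads raw files from the attached lakehouse, applies a trivial transformation, and writes a delta table:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("advent-fabric-job").getOrCreate()

# Read raw files from the attached lakehouse's Files area (placeholder path)
raw = spark.read.option("header", True).csv("Files/raw/sales/*.csv")

# A trivial cleaning/transformation step
cleaned = (raw
           .dropna(subset=["OrderId"])
           .withColumn("OrderDate", F.to_date("OrderDate")))

# Write the result to a managed delta table in the lakehouse
cleaned.write.mode("overwrite").format("delta").saveAsTable("sales_clean")
```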

Day 11 introduces us to data science in Fabric:

We have looked into creating the lakehouse, checked the delta lake and delta tables, got some data into the lakehouse, and created a custom environment and a Spark job definition. And now we need to see how to start working with the data.

Day 12 builds an experiment:

We have started working with the data, and now we would like to create and submit the experiment. In this case, MLflow will be used.

Create a new experiment and give it a name. I have named mine “Advent2023_Experiment_v3”.
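A minimal sketch of what the experiment run can look like in a notebook cell, with a throwaway scikit-learn model and metric standing in for the real training code:

```python
import mlflow
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Point MLflow at the experiment created in the Fabric workspace
mlflow.set_experiment("Advent2023_Experiment_v3")

# Placeholder data and model, just to have something to log
X, y = make_regression(n_samples=200, n_features=5, noise=0.3, random_state=42)

with mlflow.start_run(run_name="baseline"):
    model = LinearRegression().fit(X, y)
    mlflow.log_param("model_type", "LinearRegression")
    mlflow.log_metric("rmse", mean_squared_error(y, model.predict(X)) ** 0.5)
    mlflow.sklearn.log_model(model, "model")
```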

Click through to catch up with Tomaz.


Microsoft Fabric and Dataverse Link

Teo Lachev sees some potential:

I might have identified at last a good case for Microsoft Fabric, but I’ll be in a better position to confirm after a POC with larger datasets. Dynamics Online, aka Dynamics 365, epitomizes the customer’s struggle to export their data hosted in SaaS cloud offerings for analytics or other purposes. Since unfortunately Microsoft doesn’t provide direct access to the Dynamics native storage (Azure SQL Database), which often could be the simplest and fastest solution, Dynamics has “compensated” throughout the years by introducing and sunsetting various options:

Click through for that table, as well as Teo’s thoughts on the possibility of Dataverse Link to Fabric as a viable solution.


Microsoft Fabric SQL Endpoints and REST API

Tomaz Kastrun continues a series on Microsoft Fabric. Day 6 covers the SQL Analytics endpoint:

The SQL analytics endpoint in a lakehouse is a SQL-based experience for lakehouse delta tables. By using standard T-SQL language, you can write queries to analyze data in delta tables, create functions, procedures, and views, and even apply security over the objects. Some of the functionality of standard T-SQL is missing, but the experience is the same.

Besides the SQL experience, you can always use the corresponding items in the workspace view of the Lakehouse explorer, use SQL in notebooks, or simply use the SQL analytics endpoint.
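As an illustration of the T-SQL side, here is one way to query the endpoint from Python with pyodbc; the server name, database, authentication method, and table are all placeholders you would swap for the values in your lakehouse's SQL connection settings:

```python
import pyodbc

# Placeholders: copy the real connection details from the lakehouse's
# SQL analytics endpoint settings in the Fabric portal.
server = "<your-endpoint>.datawarehouse.fabric.microsoft.com"
database = "<your-lakehouse>"

conn = pyodbc.connect(
    "Driver={ODBC Driver 18 for SQL Server};"
    f"Server={server};Database={database};"
    "Authentication=ActiveDirectoryInteractive;Encrypt=yes;"
)

# Standard T-SQL over the lakehouse delta tables
for row in conn.execute("SELECT TOP 10 * FROM dbo.sales_clean;"):
    print(row)
```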

Day 7 looks at what subset of T-SQL syntax you can use against SQL Analytics endpoints:

You get the gist, and there are some other limitations: computed columns, indexed views, any kind of indexes, partitioned tables, triggers, user-defined types, sparse columns, surrogate keys, temporary tables, and many more. Essentially, all the commands that are not supported in distributed processing mode.

The biggest annoyance (!) is case sensitivity! Ughh… This proves that the SQL operates like an API on top of the delta tables, one that is translated either into PySpark commands or at least not directly into Spark, since Spark is not case-sensitive. So, the first statement will work and the second will be gracefully terminated.
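To make the case-sensitivity gotcha concrete, here is a hypothetical pair of queries against a table created as sales_clean (connection details are placeholders); the endpoint's case-sensitive, binary default collation means only the first one resolves the object name:

```python
import pyodbc

conn = pyodbc.connect(
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=<your-endpoint>.datawarehouse.fabric.microsoft.com;"
    "Database=<your-lakehouse>;"
    "Authentication=ActiveDirectoryInteractive;Encrypt=yes;"
)

# The table was created as sales_clean, so this works ...
conn.execute("SELECT COUNT(*) FROM dbo.sales_clean;")

# ... while this raises an 'Invalid object name' error, because the endpoint's
# case-sensitive collation treats Sales_Clean as a different identifier.
conn.execute("SELECT COUNT(*) FROM dbo.Sales_Clean;")
```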

Day 8 covers the Lakehouse REST API:

Now that we have explored the lakehouse through the interface and workspaces, let's check today how we can use the REST API. The Microsoft Fabric REST API defines a unified endpoint for operations.
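Here is a small sketch of that pattern with the requests library, listing the lakehouses in a workspace; the access token and workspace GUID are placeholders, and the endpoint shown is my reading of the Fabric REST API rather than anything from Tomaz's post:

```python
import requests

# Placeholders: a valid Microsoft Entra ID access token for the Fabric API scope
# (https://api.fabric.microsoft.com/.default) and your workspace's GUID.
token = "<access-token>"
workspace_id = "<workspace-guid>"

resp = requests.get(
    f"https://api.fabric.microsoft.com/v1/workspaces/{workspace_id}/lakehouses",
    headers={"Authorization": f"Bearer {token}"},
)
resp.raise_for_status()

# Each item carries metadata such as its display name and id
for lakehouse in resp.json().get("value", []):
    print(lakehouse["displayName"], lakehouse["id"])
```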


Security in Microsoft Fabric

Alex Lisboa-Wright talks security:

In Fabric, the basic logical structure for data services is the Workspace, in which users can create Items, which are the various resources that perform all the data operations available in Fabric, such as Lakehouses, pipelines, machine learning models and so on. Each workspace is a self-contained data storage and development environment, whose user access is controlled by both workspace admins and member users. User access controls include options to manage users’ workspace roles, which determine the permissions assigned to each user. Security permissions can be managed on the workspace and item levels in the Fabric UI. Microsoft Entra ID (MEID) authentication can also be employed within Fabric, as connecting Fabric items to other Azure resources requires MEID. MEID’s Conditional Access feature can also be configured for use in Fabric (see this documentation for best practices for Fabric resources linking to other Azure services).

Read on to learn more. Fabric is a broad set of tools and technologies, so security is both important and definitely non-trivial, even when you consider that it is a software-as-a-service offering and therefore doesn’t have much going on with user-facing networking or infrastructure security.


An Overview of Lakehouses in Microsoft Fabric

Kevin Chant invites you to a swank lakehouse:

By the end of this post, you will have a good overview of Microsoft Fabric Data Lakehouses, including CI/CD options. In addition, you will see where your SQL Server background can prove to be useful and where some Power BI knowledge can come in handy.

Plus, I share plenty of links in this post. For instance, there are a couple of useful resources to help you get started towards the bottom of this post.

Click through for the article.


Refreshing Individual Tables and Partitions using Semantic Link

Sandeep Pawar doesn’t have time to refresh everything:

The latest version of Semantic Link (0.4.0) has many methods that provide a convenient abstraction for calling Fabric/Power BI REST APIs. You can see them here. In this blog, I will show how to use the .refresh_dataset() method, which uses the Enhanced Refresh API to refresh Power BI semantic models, tables, and partitions.

Read on for two ways to do it.
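For a flavour of the call, here is a rough sketch using semantic-link's fabric module; the semantic model, table, and partition names are placeholders, and the shape of the objects parameter follows the Enhanced Refresh API rather than being a definitive signature:

```python
import sempy.fabric as fabric

# Refresh one whole table and one specific partition of a semantic model
fabric.refresh_dataset(
    dataset="Sales Model",       # placeholder semantic model name
    refresh_type="full",
    objects=[
        {"table": "DimDate"},                                    # whole table
        {"table": "FactSales", "partition": "FactSales-2023"},   # single partition
    ],
)
```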


All about Lakehouses in Microsoft Fabric

Tomaz Kastrun gives us the skinny with multiple posts in his Advent of Microsoft Fabric. Day 3 introduces the lakehouse:

The lakehouse is cost-effective and optimised storage, supporting all types of data and file formats, structured and unstructured, and it helps you govern the data, giving you better data governance. With optimised and concurrent reads and writes, it delivers outstanding performance while also reducing data movement and minimising redundant copy operations. Furthermore, it gives you a user-friendly multitasking experience in the UI that retains your context, so you can work on multiple things without losing running operations or accidentally stopping others.

Day 4 covers Delta format:

Yesterday we looked into the lakehouse and learned that Delta tables are the storage format. So, let's explore how we can understand and work with delta tables. But first, we must understand the delta lake.
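If you want to poke at delta tables directly from a notebook, here is a small sketch (table name is a placeholder) that writes a managed delta table into the lakehouse and then lists its transaction history:

```python
from pyspark.sql import SparkSession
from delta.tables import DeltaTable

spark = SparkSession.builder.getOrCreate()

# Write a tiny DataFrame as a managed delta table in the lakehouse
df = spark.createDataFrame([(1, "red"), (2, "blue")], ["id", "colour"])
df.write.mode("overwrite").format("delta").saveAsTable("advent_colours")

# Every write is a versioned commit in the delta log; history lists them
DeltaTable.forName(spark, "advent_colours").history().show(truncate=False)
```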

Day 5 covers data ingest:

We have learned about the delta lake and delta tables. But since we uploaded the file directly, let's explore how else we can get data into the lakehouse.
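One simple way to do that from a notebook, assuming a CSV file already uploaded to the lakehouse Files area (the path and table name are placeholders):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Read a CSV that was uploaded to the lakehouse Files area
orders = (spark.read
          .option("header", True)
          .option("inferSchema", True)
          .csv("Files/uploads/orders.csv"))

# Persist it as a delta table so it shows up under Tables in the lakehouse
orders.write.mode("overwrite").format("delta").saveAsTable("orders")
```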

Click through for all three posts.
