Category: Microsoft Fabric

A Review of Fabric Lakehouse

Teo Lachev talks lakehouses:

Microsoft’s Lakehouse definition is less ambitious and exclusive. “Microsoft Fabric Lakehouse is a data architecture platform for storing, managing, and analyzing structured and unstructured data in a single location. It is a flexible and scalable solution that allows organizations to handle large volumes of data using a variety of tools and frameworks to process and analyze that data. It integrates with other data management and analytics tools to provide a comprehensive solution for data engineering and analytics”. In other words, a lakehouse is whatever you want it to be if you want something better than a data lake.

Read on for Teo’s classic The Good, The Bad, and The Ugly format.

A Primer on Microsoft Fabric Notebooks

Leila Etaati provides an explanation:

In Fabric, there are tools for different user personas to work with. For example, for a citizen data analyst, Dataflows and Power BI Datasets are the tools with which the analyst can build the data model. For Data Engineers and Scientists, one of the tools is the Notebook.

The Notebook is a place to write and run code in languages such as PySpark (Python), Spark (Scala), Spark SQL, and SparkR (R). These languages are usually familiar to data engineers and data scientists. The Notebook provides an editor to write code in these languages, run it in the same place, and see the results. Consider this as the coding tool for the data engineer and scientist.

Click through for a video, as well as a regular blog post.

Verti-Parquet and DirectLake in Fabric

Jordan Witcombe provides an explanation:

The VertiPaq engine cleverly uses columnar storage for efficient querying and processing. It employs multiple compression techniques, including Run-Length Encoding (RLE) and Dictionary Encoding, to minimise storage space. Through finding optimal sort orders and value encoding, it achieves maximum space efficiency and performance. VertiPaq also utilises ‘In-Memory Column Store’ for fast query performance, ‘Predicate Pushdown’ to eliminate unnecessary data at query time, and ‘Block Decompression’ to only decompress relevant data blocks, making it a powerhouse for data management and retrieval.

Now, because of these ingenious tricks, we wave goodbye to traditional file formats like JSON or CSV. Instead, all data stored within the managed area of Fabric and OneLake uses either Parquet or Delta. It’s time to embrace these efficient, high-performing formats that bring the best out of VertiPaq’s compressive power. Let’s explore these further in the next section.

Read on for some comparisons in file size between Fabric and Databricks, as well as how they perform in Power BI.
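To make the two compression techniques named in the excerpt concrete, here is a minimal Python sketch of dictionary encoding followed by run-length encoding on a single column. It is a toy illustration, not VertiPaq's or Parquet's actual implementation; the function names and the sample column are hypothetical and chosen only for this example.

```python
from itertools import groupby


def dictionary_encode(values):
    """Map each distinct value to a small integer id (illustrative only)."""
    dictionary = {}
    ids = []
    for value in values:
        if value not in dictionary:
            dictionary[value] = len(dictionary)  # next unused id
        ids.append(dictionary[value])
    # Dict keys keep insertion order, so position i holds the value for id i.
    return list(dictionary), ids


def run_length_encode(ids):
    """Collapse consecutive repeats into (id, run_length) pairs."""
    return [(key, sum(1 for _ in group)) for key, group in groupby(ids)]


def decode(dictionary, runs):
    """Expand the runs and look each id up in the dictionary."""
    return [dictionary[key] for key, length in runs for _ in range(length)]


# Hypothetical sample column with few distinct values.
column = ["UK", "UK", "UK", "US", "US", "UK", "FR", "FR", "FR", "FR"]

dictionary, ids = dictionary_encode(column)
runs = run_length_encode(ids)

# Sorting first groups equal ids together, which shortens the run list.
sorted_runs = run_length_encode(sorted(ids))

print(dictionary)
print(runs)
print(sorted_runs)
```

The sorted variant produces fewer runs than the unsorted one, which is the intuition behind the excerpt's remark about finding good sort orders. A real engine would also need to keep rows aligned across columns when it reorders them, which this sketch ignores.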

Connecting to a Fabric Warehouse via SSMS

Reitse Eskens does some digging:

Whilst working on a blogpost on Fabric Data Warehouse, I started wondering if I could work around the SQL web interface and connect to my OneLake with SSMS and/or ADS. As it turns out, you can!

Specifically, you can connect to see things in a warehouse or the Tables view of a lakehouse, not the Files view. There is a built-in web viewer, but Microsoft Fabric definitely is intended to work with normal SQL tools, not just its web interface and Power BI.

Creating a Microsoft Fabric Environment

Kevin Chant gets at it:

In reality, there are a few different ways to join the Microsoft Fabric (Preview) trial.

For example, you might be lucky enough to have it enabled in the workplace already. However, there are ways that you can create your own Microsoft Fabric environment as well.

Click through for the process, and note that the trial is 60 days, though Microsoft will let you renew the trial until the product goes GA.

Contrasting Lakehouse, Warehouse, and Datamart in Fabric & Power BI

Reza Rad disambiguates three terms:

Three types of objects in Microsoft Fabric have similarities in what they can do for an analytics system. These three are: Lakehouse, Data Warehouse, and Power BI Datamart. All three objects provide storage for the data, which can be loaded into them using an ETL process and read using something like a Power BI report. In this article and video, I’ll explain the actual differences and how to choose the best option for your implementation and architecture.

Reza does a good job explaining where each of the three fits in and even has a nice chart to work out which one you might want to use.

Roles and Domains in Microsoft Fabric

Marc Lelijveld explains two key concepts:

Microsoft Fabric has been out for a few weeks now. With the release of Fabric, a new concept in line with data-mesh architectures became available in Fabric, or Power BI if you will. With the introduction of Domains, we have a new level of controls added next to existing roles. In this blog I will further elaborate on the levels of control that are available today and provide a clear overview of these different levels.

There’s going to be a bit of nomenclature adjustment for people who have spent most of their time in Synapse or other platforms moving to Fabric. If you’ve already spent most of your time in Power BI, this shift is probably a little easier.

Licensing for Microsoft Fabric

Reza Rad explains how licensing of Microsoft Fabric will work:

To understand the licensing for Microsoft Fabric, you first need to understand the Capacity structure. In Fabric, there are three important levels into which content can be organized: Tenant, Capacity, and Workspace.

Tenant is the most fundamental part of the structure of Fabric. Each domain can have one or multiple tenants.

The capacity is the substructure under the tenant. You can have one or multiple capacities in each tenant. Each capacity is a pool of resources that can be used for Microsoft Fabric services. There are different SKUs for different levels of resources. I’ll explain the pricing and SKUs shortly after.

Inside capacities, you will have workspaces. Workspaces are sharing units used by developers and users. For example, you will create Lakehouse, Data Pipeline, and Dataflow inside a workspace, and you can share them with the rest of the developer team. A workspace is assigned to a capacity. However, you can have more than one workspace associated with one capacity. The screenshot below shows how Tenant, Capacity, and Workspace work together.

Read on to understand at what level billing occurs, what the options are, and what it means. My gut is saying that F8 is probably the lowest acceptable tier for a real company’s production environment and F2 is more for dev environments or people trying things out. But we’ll know more, I think, in the next few months as people try things out.
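The tenant, capacity, and workspace hierarchy from the excerpt can be pictured with a small Python sketch. This is only a mental model, not a Fabric API: every class, method, and sample name below is hypothetical, and it assumes each workspace sits on exactly one capacity at a time while a capacity can host many workspaces.

```python
from dataclasses import dataclass, field


@dataclass
class Workspace:
    name: str
    items: list = field(default_factory=list)  # e.g. "Lakehouse", "Dataflow"


@dataclass
class Capacity:
    name: str
    sku: str  # resource tier, such as "F2" or "F8"
    workspaces: dict = field(default_factory=dict)


@dataclass
class Tenant:
    name: str
    capacities: dict = field(default_factory=dict)

    def add_capacity(self, name, sku):
        self.capacities[name] = Capacity(name, sku)
        return self.capacities[name]

    def assign(self, workspace, capacity_name):
        """Place a workspace on one capacity, detaching it from any other."""
        target = self.capacities[capacity_name]  # KeyError if unknown
        for capacity in self.capacities.values():
            capacity.workspaces.pop(workspace.name, None)
        target.workspaces[workspace.name] = workspace

    def capacity_of(self, workspace_name):
        """Return the name of the capacity hosting the workspace, or None."""
        for capacity in self.capacities.values():
            if workspace_name in capacity.workspaces:
                return capacity.name
        return None


# Hypothetical tenant with a small dev capacity and a larger prod one.
tenant = Tenant("example-tenant")
tenant.add_capacity("dev", sku="F2")
tenant.add_capacity("prod", sku="F8")

sales = Workspace("Sales", items=["Lakehouse", "Data Pipeline", "Dataflow"])
finance = Workspace("Finance")

tenant.assign(sales, "dev")
tenant.assign(finance, "prod")
tenant.assign(sales, "prod")  # moving a workspace replaces its old assignment

print(tenant.capacity_of("Sales"))
```

Modelling the assignment as a move rather than an addition reflects the assumption stated above; the linked post is the place to check where billing actually attaches.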

Configuring Compliance in Microsoft Fabric

Kevin Chant checks a box:

Compliance is a very important aspect when working with data, especially when you must work to standards like PCI-DSS. With this in mind I looked into the compliance story for Microsoft Fabric.

By the end of this post, you will have a better idea of how to test configuring compliance for Microsoft Fabric. Along the way I share plenty of links.

Read on for step-by-step instructions, as well as those links.

Thoughts on Fabric OneLake

Teo Lachev shares some thoughts:

In a previous post, I shared my overall impression of Fabric. In this post, I’ll continue exploring Fabric, this time sharing my thoughts on OneLake. If you need a quick intro to Fabric OneLake, Josh Caplan’s “Build 2023: Eliminate data silos with OneLake, the OneDrive for Data” presentation provides a great overview of OneLake, its capabilities, and the vision behind it from a Microsoft perspective. If you prefer a shorter narrative, you can find it in the “Microsoft OneLake in Fabric, the OneDrive for data” post. As always, we are all learning and constructive criticism would be appreciated if I missed or misinterpreted something.

I think some of Teo’s criticism comes from the idea that OneLake should also mean one lakehouse or one data lake, but the abstraction is one level higher than that. I would like to see some of Teo’s ideas make it into GA, though.
