Press "Enter" to skip to content

Category: Data Lake

Lakehouse Management in Fabric via mssparkutils

Sandeep Pawar scripts out some lakehouse work:

At MS Ignite, Microsoft unveiled a variety of new APIs designed for working with Fabric items, such as workspaces, Spark jobs, lakehouses, warehouses, ML items, and more. You can find detailed information about these APIs here. These APIs will be critical in the automation and CI/CD of Fabric workloads.

With the release of these APIs, a new method has been added to the mssparkutils library to simplify working with lakehouses. In this blog, I will explore the available options and provide examples. Please note that at the time of writing this blog, the information has not been published on the official documentation page, so keep an eye on the documentation for changes.

This looks to be quite useful for CI/CD work.
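To give a flavor of what Sandeep covers: the helper lives alongside the other mssparkutils modules in a Fabric notebook. A minimal sketch, assuming the method names described in the post (the API was undocumented at the time, so names and signatures may have shifted):

from notebookutils import mssparkutils

# Create a lakehouse in the notebook's current workspace, then inspect it.
# Method names follow the post; treat signatures as illustrative only.
mssparkutils.lakehouse.create("DemoLakehouse", "scratch lakehouse for testing")
print(mssparkutils.lakehouse.get("DemoLakehouse"))

# Enumerate the lakehouses in the current workspace.
for lh in mssparkutils.lakehouse.list():
    print(lh)

# Clean up.
mssparkutils.lakehouse.delete("DemoLakehouse")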

Fun with Tables in the Microsoft Fabric Lakehouse

Nikola Ilic dives into tables:

Probably the biggest confusion is: should I use a lakehouse or warehouse in Fabric? Or, what is the difference between Direct Lake and DirectQuery mode for Power BI reports?

And, while these two points mentioned above are of paramount importance to clarify, in this article I’ll focus on explaining another potential caveat, which is relevant when working with the lakehouse in Microsoft Fabric.

If only Nikola dove onto tables, I could make him an honorary Buffalo Bills fan.

Exporting Dynamics 365 Data into Delta Lake via Synapse Link

Jose Mendes performs a data migration:

It’s fair to say there have been some considerable changes in the Azure landscape over recent years.

This blog will show you how to configure Synapse Link to export D365 data in the Delta Lake format – an open-source data and transaction storage file format used in Lakehouse implementations.

Before you start considering using this approach, you will need to ensure you meet the following prerequisites (Microsoft documentation).

Read on for those prerequisites as well as a step-by-step guide on how to do it.
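One note on what you get at the end: the exported D365 tables are ordinary Delta tables in the linked ADLS Gen2 container, so any Delta-capable engine can read them. A minimal PySpark sketch, with the storage account, container, and table folder as placeholders for whatever your own Synapse Link profile produces:

# Read a D365 table that Synapse Link exported in Delta format.
# Runs in a Synapse notebook, where the `spark` session is predefined.
delta_path = (
    "abfss://dataverse-env@yourstorageaccount.dfs.core.windows.net/"
    "deltalake/account"  # hypothetical folder for the D365 'account' table
)

df = spark.read.format("delta").load(delta_path)
df.printSchema()
df.show(5)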

Storing Log Analytics Data in the Microsoft Fabric Lakehouse

Gilbert Quevauvilliers needs a place to store this data:

Following on in my series, in this blog post I am going to use Dataflow Gen2 in Microsoft Fabric to load the data into a lakehouse table.

Doing this allows me to store the data in a Delta Lake table.

In this series I am going to show you all the steps I took to reach the successful outcome I had with my client.

Click through for links to the first two parts of the series, as well as a step-by-step guide for part 3.
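One nice side effect of this approach: because the Dataflow Gen2 destination is a Delta table in the lakehouse, the loaded data is immediately queryable from a notebook as well. A minimal sketch, where the table name LogAnalyticsData and the TimeGenerated column are placeholders for whatever your dataflow actually lands:

# Query the lakehouse table the dataflow loaded; runs in a Fabric notebook
# attached to the lakehouse. Table and column names are placeholders.
recent = spark.sql("""
    SELECT *
    FROM LogAnalyticsData
    ORDER BY TimeGenerated DESC
    LIMIT 10
""")
recent.show(truncate=False)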

Accessing OneLake Files from Power BI Desktop

Marc Lelijveld reads a file:

Fabric content is all over the place by now. In Fabric, as a SaaS platform, most (if not all) services have interconnectivity. In a few clicks you connect your web-developed Power BI dataset to a lakehouse or warehouse to fetch data from OneLake. But what about Power BI Desktop? You might have uploaded some files to OneLake which you cannot access from Power BI Desktop.

In this blog I’ll explain how you can connect to OneLake data using Power BI Desktop!

This turns out to be a bit trickier than I would have expected. Hopefully the experience gets better over time.
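Part of why this works at all is that OneLake exposes an ADLS Gen2-compatible endpoint, which also means you can reach the same files programmatically. A minimal Python sketch using the Azure Storage SDK, with workspace, lakehouse, and file names as placeholders:

# Read a OneLake file through the ADLS Gen2-compatible endpoint.
# Requires: pip install azure-identity azure-storage-file-datalake
from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

service = DataLakeServiceClient(
    account_url="https://onelake.dfs.fabric.microsoft.com",
    credential=DefaultAzureCredential(),
)

# In OneLake, the workspace plays the role of the container/filesystem.
fs = service.get_file_system_client("MyWorkspace")
file_client = fs.get_file_client("MyLakehouse.Lakehouse/Files/sales.csv")

print(file_client.download_file().readall()[:200])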

Data Lake Serving Layers

James Serra has layers, like an onion:

Data lakes typically have three layers: raw, cleaned, and presentation (also called bronze, silver, and gold if using the medallion architecture popularized by Databricks). I talk about this in my prior blog post on data lake architecture. Many times, companies will create a fourth layer outside of the data lake that I call the relational serving layer. I’ve been having conversations recently with companies about the need for another type of fourth layer, which I will call the physical serving layer. In this blog post I’ll discuss the relational serving layer and the physical serving layer.

Read on to learn more about these.
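For orientation, the three in-lake layers James mentions usually show up as successive Delta writes; a sketch of the bronze-to-gold flow in PySpark, with all paths and column names invented for illustration (the relational or physical serving layers he describes would then be additional copies pushed outside the lake):

from pyspark.sql import functions as F

# Bronze: raw data, ingested as-is (path is a placeholder).
raw = spark.read.json("abfss://lake@account.dfs.core.windows.net/bronze/orders/")

# Silver: deduplicated and typed.
cleaned = (
    raw.dropDuplicates(["order_id"])
       .withColumn("order_date", F.to_date("order_date"))
)
cleaned.write.format("delta").mode("overwrite").save(
    "abfss://lake@account.dfs.core.windows.net/silver/orders/"
)

# Gold: aggregated for presentation.
(cleaned.groupBy("order_date")
    .agg(F.sum("amount").alias("daily_sales"))
    .write.format("delta").mode("overwrite")
    .save("abfss://lake@account.dfs.core.windows.net/gold/daily_sales/"))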

The Lakehouse is (Still) Not Enough

Nikola Ilic needs more than a lakehouse:

In the previous parts of the Data Modeling for mere mortals series, we examined traditional approaches to data modeling, with focus on dimensional modeling and Star schema importance for business intelligence scenarios. Now, it’s time to introduce the concept of the modern data platform.

As usual, let’s take a more tool-agnostic approach and learn about some of the key characteristics of the modern data estate. Please, don’t mind if I use some of the latest buzzwords related to this topic, but I promise to reduce their usage as much as possible. 

Lakehouses are getting closer to being good enough, but the performance needs to be there, especially if you eventually end up with virtual data warehouses sitting on top of lakehouse data to satisfy reporting tools' need for structured, fact-dimensional data.
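One way to approximate that virtual warehouse today is a set of star-schema views defined over the lakehouse tables, so reporting tools see facts and dimensions without another physical copy. A minimal sketch in Spark SQL, with every table and column name invented for illustration:

# Star-schema views over lakehouse Delta tables; all names are invented.
spark.sql("""
    CREATE OR REPLACE VIEW dim_customer AS
    SELECT customer_id, customer_name, country
    FROM customers
""")

spark.sql("""
    CREATE OR REPLACE VIEW fact_sales AS
    SELECT sale_id, customer_id, sale_date, amount
    FROM sales
""")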
