Data Lake – Page 4 – Curated SQL

Microsoft Fabric Direct Lake and Reframing Operations

Published 2024-09-10 by Kevin Feasel

Power BI offers a new type of connection to Microsoft Fabric Lakehouse or Warehouse, called Direct Lake. The Direct Lake connection acts like DirectQuery and won’t need the data to be refreshed. However, the Power BI semantic model has refresh settings that can be turned on or off. In this article and video, you will learn about the Refresh settings for the Power BI semantic model that is connected using a Direct Lake connection, what that is, and why it is called Reframe.

Read on to learn more, or to check out the video.

Comments closed

Finding Columns in Memory in Power BI Direct Lake Mode

Published 2024-08-20 by Kevin Feasel

Chris Webb goes searching:

As you probably know, in Power BI Direct Lake mode column data is only loaded into memory when it is needed by a query. I gave a few examples of this – and how to monitor it using DMVs – in this blog post from last year. But which columns are loaded into memory in which circumstances? I was thinking about this recently and realised I didn’t know for sure, so I decided to do some tests. Some of the results were obvious, some were a surprise.

Read on for the answer.

Comments closed

Building Real-Time Dashboards from Lakehouse Data in Microsoft Fabric

Published 2024-08-09 by Kevin Feasel

Dennes Torres gets around a limitation:

Real-Time dashboards are a great feature in Real Time Intelligence experience to monitor our data. However, by default it’s made to work only with Kusto Databases. The options to create a real time dashboard or to define its data source only accept Kusto Databases.

What if we would like to see in real time the information we have in a lakehouse as well? Let’s discover a solution for this.

Read on for the solution.

Comments closed

Microsoft Fabric Lakehouse Access Control

Published 2024-08-07 by Kevin Feasel

Koen Verbeeck lets us into the lakehouse:

We’re doing a proof-of-concept with Microsoft Fabric, building our data model in a lakehouse. We’d like to give people access to the data inside it so they can build their own reports with whatever tool they want. Is there an easy way to share access to a lakehouse (preferably not by giving access to the entire workspace)?

Read on to learn how.

Comments closed

Reading a Lakehouse Table from another Microsoft Fabric Workspace

Published 2024-08-07 by Kevin Feasel

Gilbert Quevauvilliers spans the gap:

I was doing some work recently for a customer and they had data stored in different Lakehouse’s which was in a different App Workspace.

I was pleasantly surprised that this can be quite easy to do.

In my example below I am going to show you how in my notebook I can read a table in a Lakehouse table when it is not attached to any Lakehouse.

It’s good that this is so easy to do, considering that current advice leans toward having multiple workspaces and not cramming everything into one.

Comments closed

Defining the Default Lakehouse for a Fabric Notebook

Published 2024-07-26 by Kevin Feasel

Sandeep Pawar sets up a default lakehouse:

I wrote a blog post a while ago on mounting a lakehouse (or generally speaking a storage location) to all nodes in a Fabric spark notebook. This allows you to use the File API file path from the mounted lakehouse.

Mounting a lakehouse using mssparkutils.fs.mount() doesn’t define the default lakehouse of a notebook. To do so, you can use the configure magic as below:

Read on for that command, as well as some notes around using it.

Comments closed

Microsoft Fabric: Lakehouse or Warehouse?

Published 2024-06-12 by Kevin Feasel

Koen Verbeeck helps us choose:

This doesn’t mean no code has to be written. On the contrary, in this article we’re going to focus on two services of Fabric: the lakehouse and the warehouse. The first one is part of the Data Engineering experience in Fabric, while the latter is part of the Data Warehousing experience. Both require code to be written to create any sort of artefact. In the warehouse we can use T-SQL to create tables, load data into them and do any kind of transformation. In the lakehouse, we use notebooks to work with data, typically in languages such as PySpark or Spark SQL.

Read on for the comparison. I tend to go more for the lakehouse experience rather than warehouse, but Koen provides a lot of the info you’d need in order to make the right decision for yourself.

Comments closed

Comparing Microsoft Fabric Warehouse and Lakehouse Performance

Published 2024-06-03 by Kevin Feasel

Reitse Eskens busts out the stopwatch:

I just can’t seem to stop doing this, checking the limits of Microsoft Fabric. In this instalment I’ll try and find some limits on the data warehouse experience and compare them with the Lakehouse experience. The data warehouse is a bit different compared to the Lakehouse, so I’ll be digging into that one first. Then I’m going to load data into the warehouse with a copy data pipeline followed by some big queries to test performance. The Fabric Capacity App will be used to check out the capacity necessary (or used for that matter).

As usual, I’m using the F2 capacity as it’s the one that should break the easiest. It’s also the cheapest one to run tests against and, as the capacity calculation isn’t dependent on the SKU (Stock Keeping Unit), you can easily translate to find out which capacity SKU will fit the workload. Remember that your workload will differ from the one shown in this blog. These tests are a comparison between the different offerings, something you could do for yourself. These blogs are a bit of a happy place as every option will get a good chance. In your work, your skills (and those of your co-workers) will be a major driver towards an option. Even if this offers the chance to learn something new!

Reitse focuses on ingesting and transforming data and the results were quite interesting.

Comments closed

Modern Data Warehousing with Data Lake Storage and Azure Data Factory

Published 2024-05-21 by Kevin Feasel

Josephine Bush continues a series on modern data warehousing:

In today’s data-driven world, having the right tools to manage and process large datasets is crucial. That’s where Azure Data Lake Storage (ADLS) and Azure Data Factory (ADF) come in handy, making it easier than ever to store and transform your data. In this post, I’ll show you how to set up ADLS to store your Parquet files and configure ADF to manage your data flows efficiently.

Read on for an overview of both technologies.

Comments closed

Understanding the Delta Lake Format

Published 2024-04-30 by Kevin Feasel

Reza Rad has a new post and video combo:

Please don’t get lost in the terminology pit regarding analytics. You have probably heard of Lake Structure, Data Lake, Lakehouse, Delta Tables, and Delta Lake. They all sound the same! Of course, I am not here to talk about all of them; I am here to explain what Delta Lake is.

Delta Lake is an open-source standard for Apache Spark workloads (and a few others). It is not specific to Microsoft; other vendors are using it, too. This open-source standard format stores table data in a way that can be beneficial for many purposes.

In other words, when you create a table in a Lakehouse in Fabric, the underlying structure of files and folders for that table is stored in a structure (or we can call it format) called Delta Lake.

Read on to learn more about this open standard and how it all fits together with Microsoft Fabric.

Comments closed

M	T	W	T	F	S	S
1	2	3	4	5	6	7
8	9	10	11	12	13	14
15	16	17	18	19	20	21
22	23	24	25	26	27	28
29	30

Category: Data Lake