Press "Enter" to skip to content

Category: Data Lake

Building Real-Time Dashboards from Lakehouse Data in Microsoft Fabric

Dennes Torres gets around a limitation:

Real-time dashboards are a great feature of the Real-Time Intelligence experience for monitoring our data. However, by default they are made to work only with Kusto databases: the options to create a real-time dashboard, or to define its data source, only accept Kusto databases.

What if we would like to see the information we have in a lakehouse in real time as well? Let’s discover a solution for this.

Read on for the solution.


Reading a Lakehouse Table from another Microsoft Fabric Workspace

Gilbert Quevauvilliers spans the gap:

I was doing some work recently for a customer, and they had data stored in different Lakehouses, which were in a different App Workspace.

I was pleasantly surprised that this can be quite easy to do.

In my example below, I am going to show you how, in my notebook, I can read a Lakehouse table when the notebook is not attached to any Lakehouse.

It’s good that this is so easy to do, considering that current advice leans toward having multiple workspaces and not cramming everything into one.
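The general pattern for this kind of cross-workspace read is to address the table by its full OneLake (ABFS) path rather than a relative path. Here is a minimal sketch in PySpark; the workspace, lakehouse, and table names are placeholders, not Gilbert’s actual example:

    # Read a Delta table from a lakehouse in another workspace by its full
    # OneLake (ABFS) path, so the notebook needs no attached default lakehouse.
    # Workspace, lakehouse, and table names are placeholders.
    path = (
        "abfss://SalesWorkspace@onelake.dfs.fabric.microsoft.com"
        "/SalesLakehouse.Lakehouse/Tables/dim_customer"
    )
    df = spark.read.format("delta").load(path)
    df.show(5)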


Defining the Default Lakehouse for a Fabric Notebook

Sandeep Pawar sets up a default lakehouse:

I wrote a blog post a while ago on mounting a lakehouse (or, generally speaking, a storage location) to all nodes in a Fabric Spark notebook. This allows you to use the File API file path from the mounted lakehouse.

Mounting a lakehouse using mssparkutils.fs.mount() doesn’t define the default lakehouse of a notebook. To do so, you can use the configure magic as below:

Read on for that command, as well as some notes around using it.
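For reference, the %%configure magic takes a JSON body; here is a sketch of what defining a default lakehouse might look like (the name and GUIDs are placeholders you would replace with your own):

    %%configure
    {
        "defaultLakehouse": {
            "name": "MyLakehouse",
            "id": "<lakehouse-guid>",
            "workspaceId": "<workspace-guid>"
        }
    }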


Microsoft Fabric: Lakehouse or Warehouse?

Koen Verbeeck helps us choose:

This doesn’t mean no code has to be written. On the contrary, in this article we’re going to focus on two services of Fabric: the lakehouse and the warehouse. The first one is part of the Data Engineering experience in Fabric, while the latter is part of the Data Warehousing experience. Both require code to be written to create any sort of artefact. In the warehouse we can use T-SQL to create tables, load data into them and do any kind of transformation. In the lakehouse, we use notebooks to work with data, typically in languages such as PySpark or Spark SQL.

Read on for the comparison. I tend to go more for the lakehouse experience than the warehouse, but Koen provides a lot of the info you’d need in order to make the right decision for yourself.
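To make the contrast concrete, here is a sketch of the lakehouse side: a notebook cell that creates and loads a table with PySpark (the file path and table name are made up for illustration). The warehouse equivalent would be a T-SQL CREATE TABLE followed by an INSERT or COPY INTO.

    # Lakehouse side: create and load a table from a Fabric notebook.
    # The source file path and table name are hypothetical.
    df = (spark.read
          .option("header", "true")
          .csv("Files/raw/sales.csv"))  # a file uploaded to the lakehouse
    df.write.format("delta").mode("overwrite").saveAsTable("sales")

    # Spark SQL works against the same table:
    spark.sql("SELECT COUNT(*) FROM sales").show()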


Comparing Microsoft Fabric Warehouse and Lakehouse Performance

Reitse Eskens busts out the stopwatch:

I just can’t seem to stop doing this, checking the limits of Microsoft Fabric. In this instalment I’ll try and find some limits on the data warehouse experience and compare them with the Lakehouse experience. The data warehouse is a bit different compared to the Lakehouse, so I’ll be digging into that one first. Then I’m going to load data into the warehouse with a copy data pipeline followed by some big queries to test performance. The Fabric Capacity App will be used to check out the capacity necessary (or used for that matter).

As usual, I’m using the F2 capacity, as it’s the one that should break the easiest. It’s also the cheapest one to run tests against, and because the capacity calculation isn’t dependent on the SKU (Stock Keeping Unit), you can easily translate the results to find out which capacity SKU will fit your workload. Remember that your workload will differ from the one shown in this blog: these tests are a comparison between the different offerings, something you could do for yourself. These blogs are a bit of a happy place, as every option gets a fair chance. In your work, your skills (and those of your co-workers) will be a major driver toward an option, even if this offers the chance to learn something new!

Reitse focuses on ingesting and transforming data, and the results are quite interesting.


Modern Data Warehousing with Data Lake Storage and Azure Data Factory

Josephine Bush continues a series on modern data warehousing:

In today’s data-driven world, having the right tools to manage and process large datasets is crucial. That’s where Azure Data Lake Storage (ADLS) and Azure Data Factory (ADF) come in handy, making it easier than ever to store and transform your data. In this post, I’ll show you how to set up ADLS to store your Parquet files and configure ADF to manage your data flows efficiently.

Read on for an overview of both technologies.
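As a small illustration of the consumption side, once ADF has landed Parquet files in ADLS Gen2, a Spark session can read them straight from the abfss path. The account, container, and folder names here are placeholders, and authentication setup is omitted since it depends on your environment:

    # Read Parquet files that a pipeline landed in ADLS Gen2.
    # Storage account, container, and folder names are hypothetical;
    # authentication configuration is omitted.
    path = "abfss://curated@mystorageaccount.dfs.core.windows.net/sales/"
    df = spark.read.parquet(path)
    df.printSchema()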


Understanding the Delta Lake Format

Reza Rad has a new post and video combo:

Please don’t get lost in the terminology pit regarding analytics. You have probably heard of Lake Structure, Data Lake, Lakehouse, Delta Tables, and Delta Lake. They all sound the same! Of course, I am not here to talk about all of them; I am here to explain what Delta Lake is.

Delta Lake is an open-source standard for Apache Spark workloads (and a few others). It is not specific to Microsoft; other vendors are using it, too. This open-source standard format stores table data in a way that can be beneficial for many purposes.

In other words, when you create a table in a Lakehouse in Fabric, the underlying structure of files and folders for that table is stored in a structure (or we can call it format) called Delta Lake.

Read on to learn more about this open standard and how it all fits together with Microsoft Fabric.
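As a rough sketch of what that format means on disk: writing a Spark DataFrame as a lakehouse table produces Parquet data files plus a _delta_log folder of JSON commit files. The table name and file names below are illustrative:

    # Writing a table in a Fabric lakehouse produces a Delta Lake folder.
    df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "val"])
    df.write.format("delta").saveAsTable("demo_delta")

    # The resulting layout under the lakehouse looks roughly like:
    #   Tables/demo_delta/_delta_log/00000000000000000000.json   <- transaction log
    #   Tables/demo_delta/part-00000-<uuid>.snappy.parquet       <- data file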


Mirroring Azure SQL DB into Microsoft Fabric

Dennes Torres holds up a mirror:

You need to read data from production to build a single source of truth. If you create pipelines that read directly from production, you add load to the production environment. A mirror allows you to do much of the production reporting from the mirror, leaving the production environment free to serve other users. Keep in mind: this covers production reports, but not analytics reports.

Mirroring a production database to Fabric is one method to ensure the load on production stays as low as possible: the data is transferred to Fabric, and the transformations are completed from that point onward.

Is that all? Does this mean we avoid creating pipelines? Not really; you still need to create pipelines, as I will explain ahead.

Click through for the demo and explanation. This is an important thing for people to note: mirroring doesn’t eliminate ELT. You still have the data lake process to work through, as your transactional system does not (and should not) look like your reporting system.
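To illustrate that point with a hypothetical downstream step: even with the mirror in place, you would still transform the mirrored data into reporting shape. The table names below are made up, and this assumes the mirrored table is reachable from a notebook (for example, through a shortcut):

    # Mirroring gets the data into Fabric; transformation still happens afterward.
    # 'sales_orders' is a hypothetical mirrored table exposed to the lakehouse.
    orders = spark.read.table("sales_orders")
    daily = (orders
             .groupBy("order_date")
             .sum("order_amount")
             .withColumnRenamed("sum(order_amount)", "daily_total"))
    daily.write.format("delta").mode("overwrite").saveAsTable("daily_sales")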


Comparing Direct Lake and Import Mode in Power BI

Marco Russo compares and contrasts:

What is the right choice between Direct Lake and Import mode in Power BI?

At SQLBI, we do not publish content until we have had enough time to experiment with and collect data about new features. We usually don’t break the news unless we have had enough time to test a feature in preview and consider the released service solid enough. This did not happen with Direct Lake, and we have not published any Direct Lake content yet, but it seemed not urgent for reasons we will see soon. However, the feedback collected from many attendees of SQLBits 2024 and the first Microsoft Fabric Conference raised the alarm: too many people have an incorrect perception of Direct Lake, which should be addressed as soon as possible to avoid architectural mistakes and unnecessary additional implementation costs.

Click through for the tl;dr version, followed by the explanations.
