Press "Enter" to skip to content

Author: Kevin Feasel

On-Demand Loading and Direct Lake in Power BI

Chris Webb gives us the beginnings of an origin story:

For any Power BI person, Direct Lake mode is the killer feature of Fabric. Import mode report performance (or near enough) direct on data from the lake, with none of the waiting around for data to refresh! It seems too good to be true. How can it be possible?

The full answer, going into detail about how data from Delta tables is transcoded to Power BI’s in-memory format, is too long for one blog post. But in part it is possible through something that existed before Fabric but which didn’t gain much attention: on-demand loading. 

Click through for another blog post on the topic and an idea of how these tie together.

Comments closed

An Overview of Data Modeling

Nikola Ilic provides an overview of data modeling:

In recent years, I’ve done dozens of training on various data platform topics, for all kinds of audiences. When teaching various data platform concepts and techniques, I find one of the concepts particularly intimidating for many business analysts, especially those who are just starting their journey. And, that is the concept of data modeling.

This is a good introduction and does a particularly good job of explaining why we have logical and physical data models. I have one medium-sized quibble with an otherwise-great article: 3rd Normal Form is nowhere near sufficient for a logical data model, and I’d make the strong case (in fact, I do make that case) that 5th Normal Form should be the standard and that 3NF is an anachronism which you should entirely replace with Boyce-Codd Normal Form.

Comments closed

Loading Data from On-Premises SQL Server into Microsoft Fabric

Reitse Eskens spends an hour or so:

In my previous blogs, I’ve written about Fabric and all the cool things it can do. Thing is, my load tests were based on files. Either CSV or Delta. But in reality, a lot of data comes from an on-premises database server. In reality, you might connect to a SQL 2008 instance or maybe even older. Truth be told, I haven’t got an instance in that version/edition around anymore. So I had to use SQL Server 2019, a version I’m encountering more often nowadays.

For this blog, it won’t make much sense to create a humongous database and try to get all the data in. Fabric will cope, the major issue (in my experience) is the internet connection between my local database and the Fabric environment. One thing I’m really curious about is if Fabric will have the Link capability that was introduced for Synapse Analytics and SQL Server 2022.

There’s no Link capability currently available, so Reitse does the next-best thing and uses Fabric pipelines.

Comments closed

Events in Apache Kafka

Lucia Cerchie explains the key metaphor in an event-driven system:

We can speak broadly, maybe even a little philosophically, about what events are. Events are “things that happen,” or sometimes, they are otherwise defined as representations of facts. All data is, in a way, a result of humans trying to grok events. At the same time, I honestly don’t find the definition helpful if we leave it at this level. Do we ever design apps around things that don’t happen? (Don’t think about that too hard.) 

Let’s get concrete: Events that might affect real-time data pipelines and applications, including things like Pinterest saves, USPS address changes, ship coordinate updates, and credit card transactions. 

Read on for more about events and how they drive the design of Kafka topics and the applications which use them.

Comments closed

Rolling Correlation in R

Steven Sanderson tries out a function:

In the world of data analysis, time-series data is a common sight. Whether it’s stock prices, weather patterns, or website traffic, understanding the relationship between variables over time is crucial. One valuable technique in this domain is calculating rolling correlation, which allows us to examine the evolving correlation between two variables as our data moves through time. In this blog post, we will delve into the rollapply function and its capabilities, exploring its applications through a series of practical examples. So, let’s get started!

Click through for an example of how it works.

Comments closed

Thoughts on Fabric Data Wrangler

Gilbert Quevauvilliers tries out a tool:

I was going through my twitter feed and I came across this tweet where they spoke about the Data Wrangler Calling all #Python users! Have you tried Data Wrangler in #MicrosoftFabric?

I thought I would give this a try and that was the idea for my blog post. I honestly had no idea that firstly was this possible, but second that it is so easy for data wrangler to do all the hard work for me

I am going to demonstrate 2 transformations in this blog post, the first will be changing the d_date from date to datetime and then using the columns from examples I am going to create a new column where it concatenates 2 columns delimited with a double pipe command.

Read on for Gilbert’s thoughts.

Comments closed

Protecting Kubernetes Services

Boemo Mmopelwa gives us an idea of Kubernetes service types and how to secure them:

A Kubernetes service is a logical abstraction that enables communication between different components in Kubernetes. Services provide a consistent way to access and communicate with the application’s underlying components, regardless of where those components are located.

In Kubernetes the default type is ClusterIP. Services abstract a group of pods with the same functions. Services expose pods and clusters. Services are crucial for connecting the backend and front-end of your applications.

This is different from your containerized applications that you can deploy on Kubernetes

Comments closed

Monitoring Query Store State Changes

Jose Manuel Jurado Diaz wants to know when the Query Store state changes:

This morning, I have been working on a support case where our client was not able to see certain queries when querying the Query Data Store. We have observed that the cause is due to the Query Data Store changing to a read-only mode due to the volume of data and the limitation our client had on the QDS database space. Therefore, I would like to share the following PowerShell script that can be executed at regular intervals to check and retrieve when the state of QDS has changed. Unfortunately, in Azure SQL Database, we cannot use the Extended Event ‘qds.query_store_db_diagnostics’.

Click through to see Jose’s alternative solution.

Comments closed

Apache Doris and Data Colocation

Frank Z takes us through a use case for Apache Doris:

In data analytics, fast query performance is more of a result than a guarantee. What’s more important than the result itself is the architectural design and mechanism that enables quick performance. This is exactly what this post is about. I will put you into context with a typical use case of Apache Doris, an open-source MPP-based analytic database.

The user, in this case, is an all-category Q&A website. As a billion-dollar listed company, they have their own data management platform. What Doris does is support the data filtering, packaging, analyzing, and monitoring workloads of that platform. Based on their huge data size, the user demands quick data loading and quick response to queries. 

This sounds a lot like sharding of the data, where you segregate data for a particular customer/entity into its own database (and possibly instance), with the exception that queries are expected to go over a number of shards rather than focus on a single one.

Comments closed

Bring Fabric to the Data Lakehouse

Ust Oldfield ties together Databricks and Microsoft Fabric:

We’ve built countless Lakehouses for our customers and influenced the design of many more. With the advent of Fabric, many organisations with existing lakehouse implementations in Azure are wondering what changes Fabric will herald for them. Do they continue with their existing lakehouse implementation and design, or do they migrate entirely to Fabric?

For many, the answer will be to continue as-is. They’ve invested a lot of time and money in establishing a Lakehouse – to migrate now to a slightly different technology stack would be a very costly exercise! There also isn’t a need to migrate from a lakehouse implementation in Databricks to one in Fabric as there aren’t concrete benefits to be realised.

For those using Power BI as their semantic and reporting layers, as well as using Databricks SQL or Synapse Serverless as the serving layer, Fabric provides a perfect opportunity to rationalise the architecture and to bring about substantial performance gains through the Direct Lake connectivity and V-Order compression in Fabric.

Read on to see what Ust means, using a couple of architecture diagrams along the way.

Comments closed