Press "Enter" to skip to content

Author: Kevin Feasel

Databricks SQL Performance Tuning

Katie Cummiskey provides some tips for us:

We previously discussed how to use Power BI on top of Databricks Lakehouse efficiently. However, the well-designed and efficient Lakehouse itself is the basement for overall performance and good user experience.  We will discuss recommendations for physical layout of Delta tables, data modeling, as well as recommendations for Databricks SQL Warehouses.

These tips and techniques proved to be efficient based on our field experience. We hope you will find them relevant for your Lakehouse implementations too.

Read on for these tips.

Comments closed

Filling Gaps in T-SQL

Itzik Ben-Gan has a new challenge:

The task involves developing a stored procedure called GetBalances that accepts a parameter called @accountid representing an account ID. The stored procedure should return a result set with all existing dates and balances for the input account, but also gap-filled with the dates of the missing workdays between the existing minimum and maximum dates for the account, along with the last known balance up to that point. The result should be ordered by the date.

My first thought was last observation carried forward, which is now available in SQL Server 2022 (Itzik’s solution 2). I kind of thought of solution 3, though did not think through the mechanics of how it’d work and so I get no credit there.

Comments closed

First Look at Loading Data into Fabric

Reitse Eskens digs into Microsoft Fabric:

In my previous blog, I wrote about some first impression working with Fabric and mostly following the Lakehouse tutorial provided by Microsoft. Well structured as this one may be, the sizes are not like the sizes I’m seeing in the wild. So I decided to give Fabric a bit more of a challenge by letting it loose on my TPC-H dataset.

Click through for Reitse’s early analysis.

Comments closed

Plan Analysis of Graph Tables

Hugo Kornelis is back and looking at graphs:

SQL Graph is the name for a set of features, introduced in SQL Server 2017 and extended in SQL Server 2019, that bring graph database functionality into SQL Server. See here for the full documentation as provided by Microsoft. In this first post about SQL Graph, I’ll look at what execution plans reveal about the internal structure of graph tables. I’ll then use that knowledge in later parts, where I’ll discuss more advanced queries on graph tables.

Click through for the primer.

Comments closed

Bugs in SQL Server 2022 CU4

Brent Ozar facepalms:

<sigh> Okay, so, the last few Cumulative Updates have had known issues around broken remote queries using the generic ODBC connector and errors with contained availability groups, but I couldn’t really care less about those. If you use those features, I give you bombastic side eye anyway.

However, in the last few days, two more known issues have surfaced.

And those two issues are definitely bad. Click through to see what they are and what the workarounds are until fixes come in.

Comments closed

Data Pipelines and Dataflows in Fabric Data Factory

Reza Rad has two videos and posts for us. First up is a primer on data pipelines in Microsoft Fabric Data Factory:

The Pipeline comes from Azure Data Factory. A Pipeline is a group of activities bundled together into a workflow. For example, a Pipeline can generate a process around the Dataflow. For example, you may want to run a Dataflow in a loop until something happens, and with the failure or success of each execution, you want to perform a task such as sending out an email, copying data somewhere, running a stored procedure, etc.

Reza then gets into Dataflows:

Through the years, the Data Transformation engines evolved. In the past, much coding was involved, and the user interface was not the best experience. These days, most actions can be done through pre-built transformations; less coding is needed, and a hardcore developer is not needed for preliminary tasks. This enables citizen data engineers to work with these tools.

Power Query is the data transformation engine of the new generation of Microsoft Data Integration tools and services. Power Query is the data transformation engine used in Power BI. However, Power Query can be used as a standalone cloud-based data transformation service when it is used as Dataflow. Dataflow is the ETL in the cloud offered by Microsoft, which uses the Power Query engine.

Comments closed