
Author: Kevin Feasel

Fixing Key Lookups

Erik Darling has some advice:

Key lookups represent a choice made by the optimizer between:

  • Clustered index scan
  • Nonclustered index seek/scan + Clustered index lookup

Lookups can be quite useful to avoid creating quite wide nonclustered indexes, but the optimizer is heavily biased against them. It does not like random I/O.

Read on for Erik’s thoughts on the matter and how you may be able to improve the performance of a query by removing a key lookup—or sometimes, how you may safely ignore a key lookup because hey, it’s a 1 millisecond difference on a query that runs once a day.
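
To make the trade-off concrete, here is a minimal sketch using SQLite through Python's standard sqlite3 module rather than SQL Server. Treat it as an analogy only: SQLite has no "Key Lookup" operator, but its plan text does distinguish a plain index search (which has to go back to the table for the remaining columns) from a covering index search. The orders table, its columns, and the index names are hypothetical.

```python
import sqlite3


def plan_details(conn, sql):
    """Return the detail text of EXPLAIN QUERY PLAN as a single string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The human-readable description is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL, notes TEXT)"
)
query = "SELECT customer_id, total FROM orders WHERE customer_id = 42"

# Narrow index (hypothetical name): it finds matching rows, but `total`
# is not in the index, so each match needs a trip back to the table.
conn.execute("CREATE INDEX ix_narrow ON orders (customer_id)")
narrow_plan = plan_details(conn, query)

# Wider index (hypothetical name): both columns the query needs are in
# the index, so the extra trip to the table disappears.
conn.execute("DROP INDEX ix_narrow")
conn.execute("CREATE INDEX ix_wide ON orders (customer_id, total)")
wide_plan = plan_details(conn, query)

print(narrow_plan)
print(wide_plan)
```

The second plan should report a covering index while the first should not. The cost of getting there is a wider index to store and maintain, which is the trade-off Erik describes.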


Blocking and Waiting in Power BI Import Mode Refreshes

Chris Webb has some ’splainin to do:

Following on from my previous post showing how you can visualise the job graph for a Power BI Import mode semantic model refresh, in this post I will look at how you can interpret what the job graph tells you – specifically, explaining the concepts of blocking and waiting. As always, simple examples are the best way of doing this.

Click through for the explanation using a job graph.


An Overview of Clustering Techniques in R

Peter Laurinec gives us an overview:

Clustering is a very popular technique in data science because of its unsupervised characteristic – we don’t need true labels of groups in data. In this blog post, I will give you a “quick” survey of various clustering methods applied to synthetic but also real datasets.

Read on for a quick description of what clustering is and a few use cases. Then, Peter dives into a variety of techniques and important things you should know about them. H/T R-Bloggers.


Canceling a Power BI Dataflow Gen2 Refresh

Sandeep Pawar has a script for us:

At the time of writing this blog, it is not possible to cancel a Dataflow Gen2 (DFg2) refresh using the UI. This is a temporary limitation that I expect will be resolved soon. DFg2 can be resource intensive, and if the refresh takes longer than expected, it may consume a significant amount of CUs. Thankfully, you can use the Power BI REST API to cancel it. My friend Alex Powers already has a PowerShell script that you can use. You can also use the Power BI VS Code extension by Gerhard Brueckl.

But I would like to show you how you can do this using the PowerBIRestClient in the latest version of Semantic-Link (v0.5.0).

Read on to see what this Python script does and how you can use it.


Benchmarking Cumulative Function Speed in TidyDensity

Steven Sanderson charts performance:

Statistical analysis often involves calculating various measures on large datasets. Speed and efficiency are crucial, especially when dealing with real-time analytics or massive data volumes. The TidyDensity package in R provides a set of fast cumulative functions for common statistical measures like mean, standard deviation, skewness, and kurtosis. But just how fast are these cumulative functions compared to doing the computations directly? In this post, I benchmark the cumulative functions against the base R implementations using the rbenchmark package.

Click through for the functions under test and how they fare.
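
As a rough illustration of why a dedicated cumulative function can beat recomputing each statistic directly, here is a minimal Python sketch. It is not TidyDensity's implementation (that package is written in R) and it does not reproduce Steven's benchmark; the function names and the test data are hypothetical.

```python
from itertools import accumulate
from timeit import timeit


def cumulative_mean_naive(values):
    """Recompute the mean of every prefix from scratch: O(n^2) additions."""
    return [sum(values[: i + 1]) / (i + 1) for i in range(len(values))]


def cumulative_mean_running(values):
    """Carry a running total forward so each prefix mean costs O(1)."""
    return [
        total / count
        for count, total in enumerate(accumulate(values), start=1)
    ]


# Hypothetical data: small enough to run quickly, large enough to show a gap.
data = list(range(2000))

naive_seconds = timeit(lambda: cumulative_mean_naive(data), number=5)
running_seconds = timeit(lambda: cumulative_mean_running(data), number=5)

print(f"naive:   {naive_seconds:.4f} s")
print(f"running: {running_seconds:.4f} s")
```

Both functions return the same values. The difference is only in how much work is repeated, which is the kind of gap a benchmark like Steven's is meant to measure.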


Create and Connect to a Fabric Data Warehouse

Olivier Van Steenlandt builds a warehouse:

In this data recipe series, Microsoft Fabric – Data Warehouse will be explored. As a starting point, a blank Fabric workspace is used. You can sign up for a free Fabric trial by using the following URL: Data Analytics | Microsoft Fabric

In this data recipe, we will create a brand-new Data Warehouse in Fabric. Once created, we will connect to our Data Warehouse using Azure Data Studio.

Click through for the step-by-step process.


Heaps, Clustered Indexes, and Non-Clustered Indexes

Erik Darling starts a new series:

Some of the best questions I get from clients, conference attendees, and random email, are about how to design indexes.

A lot of developers out there have a rather foggy picture of exactly how indexes work. They’ve all seen phone books, and drawings of B-Tree indexes, but some common things still escape them.

In this post, I’m going to talk about a few things like I’m speaking to someone who has never created a table before.

The problem with the phone book analogy is that there’s an entire generation of people who haven’t used phone books.

Also, Erik has his own spin on the classic NUSE (narrow, unique, static, ever-increasing) guidance for clustered indexes.


The Triangular Distribution in TidyDensity

Steven Sanderson unleashes the power of the triangle:

Welcome back, fellow data enthusiasts! Today, we embark on an exciting journey into the world of statistical distributions with a special focus on the latest addition to the TidyDensity package – the triangular distribution. Tightly packed and versatile, this distribution brings a unique flavor to your data simulations and analyses. In this blog post, we’ll delve into the functions provided, understand their arguments, and explore the wonders of the triangular distribution.

Read on to learn what the triangular distribution is and how you can work with it in TidyDensity.
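
For a quick feel for the distribution itself, here is a minimal Python sketch using only the standard library. It does not use TidyDensity or mirror its function names. The parameter values, the seed, and the triangular_pdf helper are hypothetical, and the helper assumes low < high.

```python
import random


def triangular_pdf(x, low, high, mode):
    """Density of a triangular distribution on [low, high] peaking at mode."""
    if x < low or x > high:
        return 0.0
    if x < mode:
        # Rising edge of the triangle.
        return 2 * (x - low) / ((high - low) * (mode - low))
    if x > mode:
        # Falling edge of the triangle.
        return 2 * (high - x) / ((high - low) * (high - mode))
    return 2 / (high - low)  # x == mode: the peak of the triangle


# Hypothetical parameters: minimum, maximum, and most likely value.
low, high, mode = 0.0, 10.0, 3.0

rng = random.Random(123)  # fixed seed so repeated runs draw the same sample
samples = [rng.triangular(low, high, mode) for _ in range(10_000)]

sample_mean = sum(samples) / len(samples)
theoretical_mean = (low + high + mode) / 3

print(f"sample mean:      {sample_mean:.3f}")
print(f"theoretical mean: {theoretical_mean:.3f}")
```

The three parameters are all the distribution needs, which is why it is often used when only a minimum, a maximum, and a best guess are known.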


Shortcuts in Microsoft Fabric

Koen Verbeeck takes a shortcut:

A while ago I had a little blog post series about cool stuff in Snowflake. I’m doing a similar series now, but this time for Microsoft Fabric. I’m not going to cover the basics of Fabric, hundreds of bloggers have already done that. I’m going to cover little bits & pieces that I find interesting, that are similar to Snowflake features or something that is an improvement over the “regular” SQL Server or related products.

In this blog post I’m going to talk about shortcuts.

Read on to learn more about this feature.


Implicit Join Elimination in jOOQ

Lukas Eder talks about implicit join elimination:

One of jOOQ’s key features so far has always been to render pretty much exactly the SQL that users expect, without any surprises – unless some emulation is required to make a query work, of course. This means that while join elimination is a powerful feature of many RDBMS, it isn’t part of jOOQ’s feature set, so far.

As Lukas mentions, many relational database products already do this, SQL Server among them. Not all do, though, so it’s nice to have that option available in the data access layer.
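
To show when a join can be dropped safely, here is a minimal sketch using SQLite through Python's standard sqlite3 module. It does not use jOOQ and says nothing about how jOOQ implements the feature. It only illustrates the precondition: a non-nullable foreign key to a unique parent key, with no parent columns referenced in the query. The author and book tables are hypothetical, and SQLite does not enforce foreign keys by default, so the sample rows are chosen to satisfy the constraint.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE book (
        id INTEGER PRIMARY KEY,
        title TEXT NOT NULL,
        author_id INTEGER NOT NULL REFERENCES author (id)
    );
    INSERT INTO author VALUES (1, 'A'), (2, 'B');
    INSERT INTO book VALUES (10, 'X', 1), (11, 'Y', 1), (12, 'Z', 2);
    """
)

# The join to author contributes no columns. Every book has exactly one
# matching author, so the join neither removes nor duplicates rows.
with_join = conn.execute(
    "SELECT b.title FROM book b "
    "JOIN author a ON a.id = b.author_id "
    "ORDER BY b.id"
).fetchall()

# The same query with the redundant join removed.
without_join = conn.execute(
    "SELECT b.title FROM book b ORDER BY b.id"
).fetchall()

print(with_join == without_join)
```

If author_id were nullable, or if the sample rows violated the foreign key, the inner join could filter out books and the two queries would no longer be equivalent.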
