Press "Enter" to skip to content

Curated SQL Posts

One Problem with Scrollbars in Power BI

Chris Webb focuses on the performance aspect of scrollbars on tables:

Concluding my series of blog posts on seemingly harmless things you can do in a Power BI report that can lead to performance problems and the “This visual has exceeded the available resources” error (see also “Calculate(), Filter() and DAX memory usage” and “DAX measures that never return blank”), in this post I’ll show how table visuals with vertical scrollbars that potentially show thousands of rows can be a Bad Thing.

I’d also note the aesthetic problem: the intent of a dashboard is to be glanceable, meaning that a user can gain sufficient understanding of what is happening without needing to click, drag, sort, filter, or otherwise manipulate the dashboard. This means that if you intend to create a dashboard (versus a report), vertical scrollbars pose a second sort of issue: you’re obscuring data that you consider important enough to show to the end user.


Statistics TIME and IO Analysis and Statistics Reporter

Rod Edwards does a bit of analysis:

Again, this is another one of those little bits of functionality that has been present since year dot. Unfortunately, it seems to be forgotten by many in the great game of query investigation.

Being able to see the IO patterns of a query that we’re running to help us see where the heavy lifting is occurring is really useful in allowing us to get to the bottom of problems quickly, or see the benefits of our tuning efforts (or otherwise!).

One neat thing I did not know about before was a product called Statistics Reporter, an extension for SQL Server Management Studio that parses time and I/O statistics output. I like Richie Rump’s Statistics Parser website, but the idea of having something built into Management Studio is pretty nice.
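If you haven’t used these options before, here is a minimal sketch of capturing the measurements (the Sales.Orders table is hypothetical); this is the raw output that Statistics Reporter and Statistics Parser turn into something readable:

SET STATISTICS TIME ON;
SET STATISTICS IO ON;

-- Any query you want to measure; this one is a placeholder.
SELECT o.CustomerID, COUNT(*) AS OrderCount
FROM Sales.Orders AS o
GROUP BY o.CustomerID;

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;

The Messages tab in Management Studio then shows parse/compile and execution times (CPU and elapsed) from STATISTICS TIME, plus logical, physical, and read-ahead reads per table from STATISTICS IO.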


Variable Evaluation Time in DAX

Marco Russo and Alberto Ferrari evaluate some variables:

Variables are an important element of DAX to improve readability and performance. Variables are created through the VAR keyword, which can be repeated multiple times for multiple variables, followed by RETURN, the keyword that defines the result of the expression.

Read on for several examples of how you can use variables, as well as whether it’s possible for the engine to skip evaluating a variable that never sees use.


Tips to Identify Poorly-Performing Code

Rich Benner shares a few tricks:

So, you have a database or block of code. You’ve been told to “make it fast” but you’re not sure where to start. I’ve got you. We’re going to create a process to follow to ensure we can tune effectively and prove that we’ve made things faster. It’s all about having a structure when performance tuning.

Read on for Rich’s recommendations on how to identify poorly-performing code.
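This isn’t necessarily part of Rich’s process, but one common starting point is to ask the plan cache which cached queries have consumed the most CPU (the query below uses standard dynamic management views):

-- Top 10 cached queries by total CPU; logical reads included to spot I/O-heavy offenders.
SELECT TOP (10)
    qs.total_worker_time / 1000 AS total_cpu_ms,
    qs.execution_count,
    qs.total_logical_reads,
    SUBSTRING(st.text, (qs.statement_start_offset / 2) + 1,
        ((CASE qs.statement_end_offset
              WHEN -1 THEN DATALENGTH(st.text)
              ELSE qs.statement_end_offset
          END - qs.statement_start_offset) / 2) + 1) AS query_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY qs.total_worker_time DESC;

Keep in mind this only covers what is still in the plan cache, so the numbers reset on restart and evicted plans disappear from view.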


Keyset Pagination and Descending Order

Laurenz Albe digs into keyset pagination:

Keyset pagination is the most performant way to retrieve a large result set page by page. However, the neat trick with composite type comparison doesn’t always work. This article explains why and how you can work around that shortcoming.

All of the examples are for Postgres, though in my read-through of the post, I’d say it applies almost as well to SQL Server.
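To make the shortcoming concrete, here’s a sketch against a hypothetical orders table. With every key sorting in the same direction, the composite-comparison trick works; once the directions mix, you have to expand the condition by hand (which is also the form SQL Server needs, since it doesn’t support row-value comparisons at all):

-- All keys ascending: the neat composite comparison works (Postgres).
SELECT order_id, order_date
FROM orders
WHERE (order_date, order_id) > (DATE '2025-06-01', 12345)
ORDER BY order_date, order_id
LIMIT 50;

-- Mixed directions (order_date DESC, order_id ASC): expand manually.
SELECT order_id, order_date
FROM orders
WHERE order_date < DATE '2025-06-01'
   OR (order_date = DATE '2025-06-01' AND order_id > 12345)
ORDER BY order_date DESC, order_id
LIMIT 50;

The literals stand in for the key values from the last row of the previous page.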


Translating Numbers to Words in T-SQL

Sebastiao Pereira needs no number:

Sometimes, it is necessary to have numbers spelled out in words, like when writing a sentence (“Two hundred sixty-one victims were hospitalized.”). How do you express numbers in words with T-SQL code?

I could see a bit more recursion in the solution making it tidier, but in fairness, a bit more recursion would make it a lot slower in T-SQL.
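To give a flavor of the approach, here’s my own minimal, loop-free sketch covering only 0 through 999 (Sebastiao’s solution goes much further; extending this means applying the same three-digit logic to each group of thousands):

CREATE OR ALTER FUNCTION dbo.NumberToWords (@n int)
RETURNS varchar(200)
AS
BEGIN
    -- Sketch only: 0-999.
    IF @n < 0 OR @n > 999 RETURN NULL;
    IF @n = 0 RETURN 'zero';

    DECLARE @words varchar(200) = '';

    IF @n >= 100
    BEGIN
        SET @words = CHOOSE(@n / 100, 'one','two','three','four','five','six','seven','eight','nine') + ' hundred';
        SET @n = @n % 100;
        IF @n > 0 SET @words = @words + ' ';
    END;

    -- One through nineteen are irregular, so handle them with a single lookup.
    IF @n BETWEEN 1 AND 19
        SET @words = @words + CHOOSE(@n, 'one','two','three','four','five','six','seven','eight','nine','ten',
            'eleven','twelve','thirteen','fourteen','fifteen','sixteen','seventeen','eighteen','nineteen');
    ELSE IF @n >= 20
    BEGIN
        SET @words = @words + CHOOSE(@n / 10 - 1, 'twenty','thirty','forty','fifty','sixty','seventy','eighty','ninety');
        IF @n % 10 > 0
            SET @words = @words + '-' + CHOOSE(@n % 10, 'one','two','three','four','five','six','seven','eight','nine');
    END;

    RETURN @words;
END;

SELECT dbo.NumberToWords(261);  -- two hundred sixty-one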

For moderate difficulty mode, try it in German. For hard mode, switch to French.


Transferring Linear Model Coefficients

Nina Zumel performs a swap:

A quick glance through the scikit-learn documentation on linear models, or the CRAN task view on Mixed, Multilevel, and Hierarchical Models in R, reveals a number of different procedures for fitting models with linear structure. Each of these procedures meets different needs and constraints, and some of them can be computationally intensive. But in the end, they all have the same underlying structure: outcome is modelled as a linear combination of input features.

But the existence of so many different algorithms, and their associated software, can obscure the fact that just because two models were fit differently, they don’t have to be run differently. The fitting implementation and the deployment implementation can be distinct. In this note, we’ll talk about transferring the coefficients of a linear model to a fresh model, without a full retraining.

I had a similar problem about 18 months ago, though mine was much easier than the one Nina describes, as I did have access to the original data and simply needed to build a linear regression in Python that exactly matched the one they had developed in R. It turns out that’s not as easy to do as you might think: the two languages have different default assumptions that make the results similar but not the same, and piecing all of this together took a bit of sleuthing.


Generating a Schedule in R

Tomaz Kastrun builds timetables:

Each meeting slot is represented as a block (lasting an arbitrary number of hours, mostly from 1 to 4). Conducting every block requires a pair of departments, a room, and a time-slot. It is also known in advance which groups attend which class, and all rooms are the same size.

Input data: all department names, room names, and time-slots.
Output data: rooms and time-slots for each pair of departments in a time-schedule.

Click through for the code and explanation.


The Cost of Maintaining Extended Statistics in Postgres

Andrew Lepikhov breaks out the stopwatch:

In the previous post, I passionately advocated for integrating extended statistics and, moreover, creating them automatically. But what if it is too computationally demanding to keep statistics fresh?

This time, I will roll up my sleeves, get into the nitty-gritty and shed light on the burden extended statistics put on the digital shoulders of the database instance. Let’s set aside the cost of using this type of statistics during planning and focus on one aspect – how much time we will spend in an ANALYZE command execution.

Read the whole thing if you’re a Postgres admin or developer.
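If you want a feel for the scale on your own system, a quick experiment along these lines in psql works (the table and statistics names are just placeholders):

-- Build a test table, time a plain ANALYZE, then add extended statistics and time it again.
CREATE TABLE orders_test AS
SELECT i % 1000 AS customer_id, i % 50 AS region_id
FROM generate_series(1, 1000000) AS s(i);

\timing on
ANALYZE orders_test;   -- baseline cost

CREATE STATISTICS orders_test_ext (ndistinct, dependencies, mcv)
    ON customer_id, region_id FROM orders_test;

ANALYZE orders_test;   -- now also builds the extended statistics

The difference between the two ANALYZE timings is the maintenance cost Andrew is measuring.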


An Overview of Normal Forms

Daniel Calbimonte talks normalization:

Various levels of normalization in SQL can be used to reduce data redundancy and have a better-structured relational data model. This tutorial looks at these various levels with explanations and examples in Microsoft SQL Server for beginners.

I disagree with part of Daniel’s explanation of 1NF: I believe that the idea of atomicity, as Daniel defines it, is not part of 1NF. I’m basing this off of CJ Date’s definition of first normal form:

Given relvar R with heading H containing attributes A1…An of types T1…Tn, all tuples follow heading H and have one value of type Ti for attribute Ai.

All this says is that we have a single value per attribute in a tuple. “LeBron James, Lakers” and “Stephen Curry, Warriors” are perfectly reasonable values for attributes in first normal form. In Database Design and Relational Theory, Date spends a few pages covering the idea of atomicity and how there’s no good explanation for what, exactly, “atomic” means. Even in Daniel’s example, you could break down player and coach names further, not only into first and last names, but also subsets of characters within those names, like syllables. The closest thing I have to a definition of atomicity is the idea that something is atomic when it is at the lowest level of granularity that a particular set of data requirements demands. But that’s not a mathematical rule like the rules of normalization; it’s a business rule, and necessarily fuzzier and more subjective.
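To put that in concrete terms (with a hypothetical table): under Date’s definition, the following is in first normal form, because every tuple has exactly one value per attribute; whether that value is atomic enough is a requirements question:

CREATE TABLE PlayerTeam
(
    PlayerAndTeam varchar(200) NOT NULL PRIMARY KEY
);

INSERT INTO PlayerTeam (PlayerAndTeam)
VALUES ('LeBron James, Lakers'),
       ('Stephen Curry, Warriors');

-- Most designs would still split the attribute, but that is a decision
-- driven by data requirements, not by the mathematical definition of 1NF.
CREATE TABLE PlayerTeamSplit
(
    PlayerName varchar(100) NOT NULL PRIMARY KEY,
    TeamName   varchar(100) NOT NULL
);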

That said, I like the rest of Daniel’s list and appreciate that it goes all the way to 5th normal form.
