Curated SQL – Page 297 – A Fine Slice Of SQL Server

Unit Testing a Database

Published 2024-07-17 by Kevin Feasel

Olivier Van Steenlandt builds some tests:

In the past few years, I learned much about collaborative data warehouse development and deployment automatization by using Database Projects (SSDT) and Azure DevOps (and other tools).

I had my fair share of learning curves, making mistakes, and having great learning opportunities. Lately, I started my next journey to learn about Unit Testing for data warehousing/database development.

In this data cookbook (blog post series), we will discover the wonderful world and different flavors of unit testing from a data perspective. In the coming weeks/months, new data recipes (blog posts) will be released bi-weekly.

This first post provides an overview of the topic and includes links to three tools, though SQL Test is an implementation of tSQLt. Of the three, Visual Studio tests are the best of the bunch, though they’re more integration tests than unit tests.


Script Tasks in Azure DevOps Pipelines

Published 2024-07-17 by Kevin Feasel

Justin Bird builds a script task:

There are several dedicated script tasks that can be used for different purposes. Some of the tasks have shortcut tasks, which require much less input and are arguably easier to read. In the examples below, I have included the required inputs needed to run the scripts, but in most cases, other optional inputs are available.

This looks pretty similar to the way you’d do it in GitHub Actions workflows as well—no surprise there, considering that the same group of people who built pipelines in ADO handled Actions in GitHub.


One Problem with Scrollbars in Power BI

Published 2024-07-16 by Kevin Feasel

Chris Webb focuses on the performance aspect of scrollbars on tables:

Concluding my series of blog posts on seemingly harmless things you can do in a Power BI report that can lead to performance problems and the “This visual has exceeded the available resources” error (see also “Calculate(), Filter() and DAX memory usage” and “DAX measures that never return blank”), in this post I’ll show how table visuals with vertical scrollbars that potentially show thousands of rows can be a Bad Thing.

I’d also note the aesthetic problem: the intent of a dashboard is to be glanceable, meaning that a user can gain sufficient understanding of what is happening without needing to click, drag, sort, filter, or otherwise manipulate the dashboard. This means, if you are intending to create a dashboard (versus a report), vertical scrollbars form a second sort of issue: you’re obscuring data that you consider important enough to show to the end user.


Statistics TIME and IO Analysis and Statistics Reporter

Published 2024-07-16 by Kevin Feasel

Rod Edwards does a bit of analysis:

Again, this is another one of those little bits of functionality that has been present since year dot. Unfortunately, it seems to be forgotten by many in the great game of query investigation.

Being able to see the IO patterns of a query that we’re running to help us see where the heavy lifting is occurring is really useful in allowing us to get to the bottom of problems quickly, or see the benefits of our tuning efforts (or otherwise!).

One neat thing I did not know about before was a product called Statistics Reporter, an extension for SQL Server Management Studio that parses the output of time and I/O statistics. I like Richie Rump’s Statistics Parser website, but the idea of having something built-in is pretty nice.
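To make the idea of parsing this output concrete, here is a minimal sketch in Python. It is not how Statistics Reporter or Statistics Parser work internally; it only assumes the usual one-line-per-table message shape that SET STATISTICS IO prints. The table names and numbers in the sample are made up for illustration.

```python
import re

# One message line per table, e.g.:
# Table 'Orders'. Scan count 1, logical reads 1520, physical reads 3, ...
TABLE_LINE = re.compile(r"^Table '([^']+)'\.\s*(.*)$")


def parse_statistics_io(text):
    """Turn STATISTICS IO message text into a list of per-table dicts."""
    rows = []
    for line in text.splitlines():
        match = TABLE_LINE.match(line.strip())
        if not match:
            continue  # skip timing lines, row counts, and anything else
        row = {"table": match.group(1)}
        for part in match.group(2).rstrip(".").split(","):
            # Each counter reads "<name> <integer>", e.g. "logical reads 1520"
            name, _, value = part.strip().rpartition(" ")
            if value.isdigit():
                row[name] = int(value)
        rows.append(row)
    return rows


# Hypothetical sample output, invented for this example
sample = """
Table 'Orders'. Scan count 1, logical reads 1520, physical reads 3, read-ahead reads 1500, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Customers'. Scan count 0, logical reads 12, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
"""

rows = parse_statistics_io(sample)
# The table with the most logical reads is usually the first place to look
heaviest = max(rows, key=lambda r: r.get("logical reads", 0))
print(heaviest["table"], heaviest["logical reads"])
```

Because the counters are split generically on commas, any extra counters that a particular SQL Server version emits should land in the dictionary without code changes.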


Variable Evaluation Time in DAX

Published 2024-07-16 by Kevin Feasel

Marco Russo and Alberto Ferrari evaluate some variables:

Variables are an important element of DAX to improve readability and performance. Variables are created through the VAR keyword, which can be repeated multiple times for multiple variables, followed by RETURN, the keyword that defines the result of the expression.

Read on for several examples of how you can use variables, and to learn whether the engine can ignore a variable declaration when the variable in question never sees use.


Tips to Identify Poorly-Performing Code

Published 2024-07-16 by Kevin Feasel

Rich Benner shares a few tricks:

So, you have a database or block of code. You’ve been told to “make it fast” but you’re not sure where to start. I’ve got you. We’re going to create a process to follow to ensure we can tune effectively and prove that we’ve made things faster. It’s all about having a structure when performance tuning.

Read on for Rich’s recommendations on how to identify poorly-performing code.


Keyset Pagination and Descending Order

Published 2024-07-16 by Kevin Feasel

Laurenz Albe digs into keyset pagination:

Keyset pagination is the most performant way to retrieve a large result set page by page. However, the neat trick with composite type comparison doesn’t always work. This article explains why and how you can work around that shortcoming.

All of the examples are for Postgres, though in my read-through of the post, I’d say it applies almost as well to SQL Server.
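As a rough illustration of the mixed-order case, the sketch below pages through a table ordered by created ascending and id descending. A single row-value comparison cannot express that ordering, so the predicate is spelled out column by column. This is not Laurenz's code: it uses Python with an in-memory SQLite database purely so that it is self-contained, and the events table and its columns are hypothetical. The expanded predicate is also the form you would need on SQL Server, which has no row-value comparison.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (created INTEGER NOT NULL, id INTEGER PRIMARY KEY)"
)
conn.executemany(
    "INSERT INTO events (created, id) VALUES (?, ?)",
    [(1, 1), (1, 2), (2, 3), (2, 4), (3, 5)],
)

PAGE_SIZE = 2


def fetch_page(last=None):
    """Fetch the next page ordered by created ASC, id DESC.

    `last` is the (created, id) pair of the final row on the previous page.
    """
    if last is None:
        sql = (
            "SELECT created, id FROM events "
            "ORDER BY created ASC, id DESC LIMIT ?"
        )
        params = (PAGE_SIZE,)
    else:
        # "Comes after `last`" in a mixed ordering: a later created value,
        # or the same created value with a smaller id.
        sql = (
            "SELECT created, id FROM events "
            "WHERE created > ? OR (created = ? AND id < ?) "
            "ORDER BY created ASC, id DESC LIMIT ?"
        )
        params = (last[0], last[0], last[1], PAGE_SIZE)
    return conn.execute(sql, params).fetchall()


pages = []
last = None
while True:
    page = fetch_page(last)
    if not page:
        break
    pages.append(page)
    last = page[-1]

print(pages)
```

Whether the expanded predicate can use an index efficiently depends on the engine and the index definition, which is the part the linked article covers.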


Translating Numbers to Words in T-SQL

Published 2024-07-16 by Kevin Feasel

Sebastiao Pereira needs no number:

Sometimes, it is necessary to have numbers spelled out in words, like when writing a sentence (“Two hundred sixty-one victims were hospitalized.”). How do you express numbers in words with T-SQL code?

I could see a bit more recursion in the solution making it tidier, but in fairness, a bit more recursion would make it a lot slower in T-SQL.

For moderate difficulty mode, try it in German. For hard mode, switch to French.
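For a sense of the shape of the problem, here is a small sketch in Python rather than T-SQL. It is not Sebastiao's solution; it is one common approach that handles each group of three digits with a helper and walks the scale words in a loop. It assumes English wording without "and" (matching the "Two hundred sixty-one" example) and covers non-negative integers below one trillion.

```python
ONES = [
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
    "sixteen", "seventeen", "eighteen", "nineteen",
]
TENS = [
    "", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
    "eighty", "ninety",
]
SCALES = [
    (1_000_000_000, "billion"),
    (1_000_000, "million"),
    (1_000, "thousand"),
]


def below_thousand(n):
    """Spell out an integer from 1 to 999."""
    words = []
    if n >= 100:
        words.append(ONES[n // 100] + " hundred")
        n %= 100
    if n >= 20:
        tens = TENS[n // 10]
        words.append(tens if n % 10 == 0 else tens + "-" + ONES[n % 10])
    elif n > 0:
        words.append(ONES[n])
    return " ".join(words)


def number_to_words(n):
    """Spell out a non-negative integer below one trillion."""
    if not 0 <= n < 1_000_000_000_000:
        raise ValueError("supported range is 0 to 999,999,999,999")
    if n == 0:
        return "zero"
    words = []
    for value, name in SCALES:
        if n >= value:
            words.append(below_thousand(n // value) + " " + name)
            n %= value
    if n > 0:
        words.append(below_thousand(n))
    return " ".join(words)


print(number_to_words(261).capitalize(), "victims were hospitalized.")
```

The loop over scale words is what a recursive version would replace. In a language where recursion is cheap that is mostly a matter of taste.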


Transferring Linear Model Coefficients

Published 2024-07-15 by Kevin Feasel

Nina Zumel performs a swap:

A quick glance through the scikit-learn documentation on linear models, or the CRAN task view on Mixed, Multilevel, and Hierarchical Models in R reveals a number of different procedures for fitting models with linear structure. Each of these procedures meets different needs and constraints, and some of them can be computationally intensive. But in the end, they all have the same underlying structure: outcome is modelled as a linear combination of input features.

But the existence of so many different algorithms, and their associated software, can obscure the fact that just because two models were fit differently, they don’t have to be run differently. The fitting implementation and the deployment implementation can be distinct. In this note, we’ll talk about transferring the coefficients of a linear model to a fresh model, without a full retraining.

I had a similar problem about 18 months ago, though much easier than the one Nina describes, as I did have access to the original data and simply needed to build a linear regression in Python that matched exactly the one they developed in R. Turns out that’s not as easy to do as you might think: the different languages have different default assumptions that make the results similar but not the same, and piecing all of this together took a bit of sleuthing.
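The core point, that a linear model is fully described by its intercept and coefficients, can be sketched in a few lines of plain Python. This is not Nina's code and it uses no fitting library. The LinearModel class, the feature names, and the coefficient values are all hypothetical, standing in for numbers exported from whatever tool did the fitting.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinearModel:
    """A deployment-side model: just an intercept and per-feature weights."""

    intercept: float
    coefficients: dict  # feature name -> weight

    def predict(self, row):
        # Linear combination of the input features plus the intercept
        return self.intercept + sum(
            weight * row[name] for name, weight in self.coefficients.items()
        )


# Hypothetical values, as if exported from a model fit elsewhere
exported = {"intercept": 1.5, "coefficients": {"x1": 2.0, "x2": -0.5}}

# "Transfer" means building a fresh model from the numbers alone,
# with no access to the training data and no refit.
model = LinearModel(exported["intercept"], dict(exported["coefficients"]))
prediction = model.predict({"x1": 3.0, "x2": 4.0})
print(prediction)
```

The sleuthing tends to live outside this sketch: making sure the fresh model applies the same feature encoding, scaling, and intercept handling that the fitting tool assumed.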


Generating a Schedule in R

Published 2024-07-15 by Kevin Feasel

Tomaz Kastrun builds timetables:

Each meeting slot is represented as a block (lasting an arbitrary number of hours, mostly from 1 to 4). Conducting every block requires a pair of departments, a room, and a time slot. It is also known in advance which groups attend which class, and all rooms are the same size.

Input data are all department names, room names, and time slots.
Output data are rooms and time slots for each pair of departments in a time schedule.

Click through for the code and explanation.
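To show what the inputs and outputs look like, here is a simplified greedy sketch in Python rather than R. It is not Tomaz's algorithm. It assumes every block fits in a single time slot, gives each pair of departments the first room and slot where the room is free and neither department is already booked, and raises an error if nothing fits. The department, room, and slot names are invented.

```python
from itertools import combinations, product


def build_schedule(departments, rooms, slots):
    """Greedily assign each pair of departments a (room, slot)."""
    schedule = {}             # (dept_a, dept_b) -> (room, slot)
    busy_rooms = set()        # (room, slot) pairs already taken
    busy_departments = set()  # (department, slot) pairs already taken
    for pair in combinations(departments, 2):
        for slot, room in product(slots, rooms):
            if (room, slot) in busy_rooms:
                continue
            if any((dept, slot) in busy_departments for dept in pair):
                continue
            schedule[pair] = (room, slot)
            busy_rooms.add((room, slot))
            busy_departments.update((dept, slot) for dept in pair)
            break
        else:
            raise ValueError(f"no free room and slot for {pair}")
    return schedule


# Hypothetical inputs
schedule = build_schedule(
    departments=["A", "B", "C", "D"],
    rooms=["R1", "R2"],
    slots=["Mon 09:00", "Mon 10:00", "Mon 11:00"],
)
for pair, (room, slot) in schedule.items():
    print(pair, room, slot)
```

A greedy pass like this can fail on inputs where a valid timetable exists, so treat it as a way to see the constraints rather than as a solver.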

