Press "Enter" to skip to content

Category: T-SQL

Tuning a Range Query

Grant Fritchey gets forum-sniped:

Recently I was looking through DBA.StackExchange when I saw a pretty simple question that I decided to answer. I went off, set up a test database and some test tables and quickly wrote a query to answer the question. I got it all formatted pretty and was on my way to post it when I saw that another answer was already there.

Yeah. Identical to mine. Almost line for line.

Well, nuts.

I know. I’ll write a blog post.

In thinking about the problem, the thing that caught my mind was Grant’s comment about poor design. This got me thinking about one of my favorite topics: orthogonal design for relational excellence. The idea of a BETWEEN table of [ MinValue : MaxValue ] is the first thing people think of but is also the worst because you have two big problems: gaps and overlap.

The second solution is to use MinValue and calculate MaxValue (if we actually need it) as LEAD(MinValue) OVER (ORDER BY MinValue) - e, where e represents the smallest reasonable increment we’d need. Queries would find, for each Value in the main table, the largest MinValue below Value. That removes gaps and overlap but might be a performance concern as the main table’s data size grows.

The big-brain solution, which generally works best when you have a discrete number of options, could be a tally table. In Grant’s example, we see values from 1 to 1000, with a rank for each. If it’s really as simple as that, we would create a new lookup table with Value + RankDesc and simply join the main table’s Value to the lookup table’s Value to get the appropriate RankDesc. Yeah, you have 1000 rows instead of 3 but queries are trivial at that point. The downside is that this approach doesn’t work for continuous variables (e.g., give me the exact amount of your household income for the prior tax year) and the utility of this solution probably breaks down once you get past tens of thousands of rows.

In the case of a continuous variable or an enormous discrete variable, we have the simplest option of all: ignore something. If you care about the range, use the table from the second solution and use that ID on the main table. If you care about the value but not the range, just have the value and no lookup table.

Comments closed

Strategies for De-Functionizing Your T-SQL

Tom Zika continues the 12-step program for scalar function abusers:

In the previous posts, we have learned why Scalar Functions (UDFs) are bad for parallelism and performance and what the options are for their removal.

The only remaining question is where to start. You’ve probably guessed there isn’t one true way to approach this. So instead, I offer several strategies that you can combine.

Plus Tom has an offer for you at the bottom of the post: conversion from Scalar UDFs to Inline Table-Valued Functions, which tend to perform much better.

Comments closed

Creating Repeatable Test Data

Louis Davidson repeats himself:

In order to test graph structures, I needed a large set of random data. In some ways, this data will resemble the IMDB database I will include later in this chapter, but to make it one, controllable in size and two, random, I created this random dataset. I loaded a set of values for an account into a table and a set of interests. I wanted then to be able to load a set of random data into edge, related one account to another (one follows the other), and then an account to a random set of interests.

In this article, I will discuss a few techniques I used, starting with the simplest method using ordering by NEWID(), then using RAND() to allow the code to generate a fixed set of data.

There’s a lot of code needed to do it but if it’s something you’ve got to do, that’s the cost of doing business.

Comments closed

Non-Updating Updates

Michael J. Swart has an update for us:

According to Microsoft’s documentation, an UPDATE statement “changes existing data in a table or view”. But what if the values don’t actually change? What if affected rows are “updated” with the original values? Some call these updates non-updating. This leads to a philosophical question: “If an UPDATE statement doesn’t change any column to a different value, has the row been updated?”

I answer yes to that question. I consider all affected rows as “updated” regardless of whether the values are different. I think of the UPDATE statement as more of an OVERWRITE statement. I also think of “affected rows” rather than “changed rows”. In most cases SQL Server thinks along the same lines.

I list some features and areas of SQL Server and whether non-updating updates are treated the same or differently than other updates:

Click through for multiple scenarios and to see whether SQL Server is smart enough to ignore non-updating records in an UPDATE statement.

Comments closed

MERGE in Dedicated SQL Pools

Emily Tehrani notes an addition to Azure Synapse Analytics:

We are thrilled to announce that the MERGE T-SQL command for Azure Synapse Dedicated SQL pools is now Generally Available! MERGE has been a highly requested addition to the Synapse T-SQL library that encapsulates INSERTs/UPDATEs/DELETEs into a single statement, drastically easing migrations and making table synchronization a breeze.

If you do decide to use this, I’d expect it to have the same bugs which make its use on-premises a mess. As always, MERGE responsibly.

Comments closed

Pagination in SQL Server

Chad Callihan is paging us:

Pagination (or paging) is the process used to return a set of results in pages. When it comes to pagination in SQL Server, you have a couple of different options. Today, we’ll look at three of them. The first being pagination with the use of FETCH/OFFSET, the second using a CTE, and the third with the use of ROW_NUMBER. Let’s look at a brief example of how each works.

For each example, I’m using @PageSize to determine how many results to return and @PageNumber to determine what page of results to return from the StackOverflow2013 database.

Chad has three techniques listed here. Pagination is one of those things which should be simple but all of the simple options have major performance drawbacks once you get to datasets whose sizes require pagination.

Comments closed

Modulus Computations on Large Numbers

Daniel Hutmacher does the math:

The modulus is the remainder of a division of two integers*. Suppose you divide 12 by 4, the result is 3. But divide 11 by 4, and the result is 2.75. This could also be expressed by saying that 11/4 is 2 with a remainder of 3. Computing that 3 is the work of the modulo operator, which in T-SQL is represented by the % operator.

Let’s explore how to compute the modulus of large numbers in SQL Server, and how this is useful in the real world.

Daniel’s example includes IBAN validation, though I think he’s secretly working on breaking asymmetric encryption…

Comments closed

Careful Batching

Michael J. Swart follows up on an older post:

When I wrote Take Care When Scripting Batches, I wanted to guard against a common pitfall when implementing a batching solution (n-squared performance). I suggested a way to be careful. But I knew that my solution was not going to be universally applicable to everyone else’s situation. So I wrote that post with a focus on how to evaluate candidate solutions.

But we developers love recipes for problem solving. I wish it was the case that for whatever kind of problem you got, you just stick the right formula in and problem solved. But unfortunately everyone’s situation is different and the majority of questions I get are of the form “What about my situation?” I’m afraid that without extra details, the best advice remains to do the work to set up the tests and find out for yourself.

Definitely read the original article first. My normal approach is the naive + index method, so I’ll have to try out Michael’s method as well next time I need to delete a chunk of records.

Comments closed

Deleting Rows in Order

David Fowler understand the order of things:

This was an interesting question that I was asked yesterday and something that I’d never really thought of before. Can you delete the top x number of rows based on an ORDER BY?

Why would you want to do that? Well let’s just assume that we have a ‘people’ table and some strange bug in the application has cause the need to delete the top 10 oldest females for whatever reason (I know, it’s very contrived example and not the situation that my colleague was facing but it’ll do to illustrate the point).

Click through for one way which doesn’t work and two ways which do.

Comments closed