Press "Enter" to skip to content

Category: T-SQL

Another Number Series Generator Solution

Itzik Ben-Gan reviews Paul White’s solution to the number series generator challenge:

I love Paul White’s work. I keep being shocked by his discoveries, and wonder how the heck he figures out what he does. I also love his efficient and eloquent writing style. Often while reading his articles or posts, I shake my head and tell my wife, Lilach, that when I grow up, I want to be just like Paul.

When I originally posted the challenge, I was secretly hoping that Paul would post a solution. I knew that if he did, it would be a very special one. Well, he did, and it’s fascinating! It has excellent performance, and there’s quite a lot that you can learn from it. This article is dedicated to Paul’s solution.

This is an elite pair-up and well worth your time to review in detail.

Comments closed

Trickiness Around CASE Expressions

Edwin Sanchez takes us through a few issues you might run into with CASE expressions:

If you are trying to scan the subheadings to see what the issues are, I can’t blame you. Readers, including me, are impatient.

I trust that you already know the basics of SQL CASE, so, I won’t bore you with long introductions. Let’s dig into a deeper understanding of what’s happening under the hood.

Click through for the list.

Comments closed

Building a Better Random

David Fowler wants to generate a random number:

You see the problem? Well in the post that I mentioned, I adressed this issue and offered a solution using CHECKSUM() and NEWID().

That’s ok but can be a bit cumbersome. I recently stumbled across a function that I hadn’t realised was there before (and of course this is where you all tell me, we’ve known about that all the time), CRYPT_GEN_RANDOM().

My expectation with this was that CRYPT_GEN_RANDOM() would generate according to a uniform distribution. Performing a simple experiment with about 4 million entries against random numbers 0-255 (1 byte), I can confirm that this does approximate a uniform distribution.

Comments closed

Avoid the MERGE Operator in T-SQL

Cyndi Johnson has a warning for us:

The bugs alone are reasons enough to avoid using it, and also to refactor any existing MERGE statements you have in your code base. Those bugs include ones that have performance implications as well as ones that could lead to incorrect results. Long story short, Microsoft refuses to fix several of them, and some of the other major issues are “by design”!

If you still feel the desire to use them, please keep reading, and I hope by the end of this blog you will be convinced that

1. Less statements does not always lead to better code or better performance.
2. MERGE is like a VAMPIRE and once it is in your codebase, it’s just a matter of time before it drains your servers of their lifeblood.

The worst part about the second point is all the sparkling.

The worst part about the first point is that I was really excited with MERGE came out in 2008, as I wanted UPSERT for SQL Server. It, uh, didn’t work out so well.

Comments closed

COUNT(*) vs COUNT(1)

David Alcock tries an experiment:

The question whether to use COUNT(*) or COUNT(1) has been doing the rounds on the internet again, the question this time is whether or not COUNT(*) forces the optimiser to scan all rows or not.

I’m going to query Sales.Orders table of the WideWorldImporters sample database and use a where clause that should take advantage of a non-clustered index on CustomerID. 

David looks at COUNT(*) versus COUNT(1). The story’s slightly different with COUNT(ColumnName) when there are NULL values afoot.

Comments closed

More Number Series Generator Solutions

Itzik Ben-Gan continues a series on generating rows quickly:

This is the third part in a series about solutions to the number series generator challenge. In Part 1 I covered solutions that generate the rows on the fly. In Part 2 I covered solutions that query a physical base table that you prepopulate with rows. This month I’m going to focus on a fascinating technique that can be used to handle our challenge, but that also has interesting applications well beyond it. I’m not aware of an official name for the technique, but it is somewhat similar in concept to horizontal partition elimination, so I’ll refer to it informally as the horizontal unit elimination technique. The technique can have interesting positive performance benefits, but there are also caveats that you need to be aware of, where under certain conditions it can incur a performance penalty.

This is a great post on a rather complex topic.

Comments closed

Avoiding Division by Zero

Chad Callihan has a few methods for us to avoid dividing by zero:

In the real world, everyone knows that if you divide by zero a wormhole will open up and swallow the universe. In SQL Server, it’s not good, but it’s not nearly as dramatic. I encountered the following error this week and thought it would make a good topic:

Msg 8134, Level 16, State 1, Line 6
Divide by zero error encountered.

There are multiple ways to handle this error message in SQL Server and some are better than others. Let’s take a look at a few.

Click through for those methods and try to keep this universe existent—it’s where I keep all my stuff.

Comments closed

From SQL to Functional

Shel Burkow has the start of an interesting series:

Neither loops nor branches were used in any of the T-SQL work, but that doesn’t mean they aren’t there. They are – visible underneath in the query plans, which describe the process steps (operators and data flow) the SQL Server database engine follows to produce results. A salient example of looping is the Nested Loops operator used in the T-SQL rewrite query plan to match each outer row to an inner row, i.e. perform the INTERSECT. This style of coding, in which the logic of the computation is expressed rather than the lower-level control flow detail, is declarative programming, and is a major theme of this series. (The latter being imperative programming.)

This is a concept I’m really big on: I think functional programming languages are easier for data platform developers to pick up than object-oriented or imperative languages for the reason that both are declarative, so many of the programming metaphors for one apply to the other.

Comments closed

Fun with Multi-Column Unique Constraints

Aaron Bertrand has an interesting use case:

A problem that comes up occasionally with constraints in SQL Server is when a unique constraint applies to multiple columns, but the values in those columns can be populated in any order. For example, if a table holds interactions between two users, and has columns User1 and User2, there should only be one row allowed for users 100 and 200, regardless of whether the data is entered as 100, 200 or 200, 100.

Click through for one solution. Another solution would be to normalize this down further with a dbo.ConversationParticipants table.

Comments closed

Disorderly Queries

Chad Callihan wants you to think about that ORDER BY clause:

I recently came across a scenario where an application process was not performing correctly on one database but was working fine on others. The process should have been completing in seconds but was taking minutes with no indication of activity. After some investigation, I found that the process was stuck waiting on a SELECT statement to complete. Even worse, it was holding an exclusive lock on a table which was then blocking new information from processing.

One part of the SELECT query that stood out was that it was ordering by a date field. Considering what the process was doing, there was no need to have the data ordered. Generally, it’s better to have the data sorted on the application side instead of SQL Server but in this case not even that was necessary.

There are definitely good cases where you need to use ORDER BY in a database—such as paging scenarios. But if you don’t need things in a particular order, Chad shows that you can potentially save a good deal on performance without an explicit ordering.

Comments closed