Press "Enter" to skip to content

Category: T-SQL

Fun with Function Rewrites

Erik Darling reminds me why I hate user-defined functions in SQL Server:

At 23 seconds, this is probably unacceptable. And this is on SQL Server 2019, too. The function inlining thing doesn’t quite help us, here.

One feature restriction is this, so we uh… Yeah.

The UDF does not contain aggregate functions being passed as parameters to a scalar UDF

But we’re probably good query tuners, and we know we can write inline functions.
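The usual escape hatch, sketched here with hypothetical object names, is to rewrite the scalar UDF as an inline table-valued function and invoke it with CROSS APPLY, so the optimizer can expand it into the calling query:

-- Inline TVF standing in for a scalar UDF (hypothetical names throughout)
CREATE OR ALTER FUNCTION dbo.days_since_inline (@d datetime2)
RETURNS TABLE
AS
RETURN
    SELECT DATEDIFF(DAY, @d, SYSDATETIME()) AS days_since;
GO

-- Aggregate first in a derived table, then apply the function to the result
SELECT o.customer_id, f.days_since
FROM (
    SELECT customer_id, MAX(order_date) AS max_order_date
    FROM dbo.orders
    GROUP BY customer_id
) AS o
CROSS APPLY dbo.days_since_inline(o.max_order_date) AS f;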

Read the whole thing, as this is not always straightforward.

Optimizing Multiple CTEs

Itzik Ben-Gan continues a series on table expressions:

Last month I explained and demonstrated that CTEs get unnested, whereas temporary tables and table variables actually persist data. I provided recommendations in terms of when it makes sense to use CTEs versus when it makes sense to use temporary objects from a query performance standpoint. But there’s another important aspect of CTE optimization, or physical processing, to consider beyond the solution’s performance—how multiple references to the CTE from an outer query are handled. It’s important to realize that if you have an outer query with multiple references to the same CTE, each gets unnested separately. If you have nondeterministic calculations in the CTE’s inner query, those calculations can have different results in the different references.

Say for instance that you invoke the SYSDATETIME function in a CTE’s inner query, creating a result column called dt. Generally, assuming no change in the inputs, a built-in function is evaluated once per query and reference, irrespective of the number of rows involved. If you refer to the CTE only once from an outer query, but interact with the dt column multiple times, all references are supposed to represent the same function evaluation and return the same values. However, if you refer to the CTE multiple times in the outer query, be it with multiple subqueries referring to the CTE or a join between multiple instances of the same CTE (say aliased as C1 and C2), the references to C1.dt and C2.dt represent different evaluations of the underlying expression and could result in different values.
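A minimal repro of the effect Itzik describes (my sketch, not his code):

WITH C AS
(
    SELECT SYSDATETIME() AS dt
)
SELECT C1.dt AS dt1, C2.dt AS dt2
FROM C AS C1
    CROSS JOIN C AS C2;
-- Each reference to C is unnested separately, so dt1 and dt2 represent
-- separate evaluations of SYSDATETIME() and can come back different.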

Definitely worth the read.

String Concatenation in SQL Server

Guy Glantser hits on a pain point in SQL Server when dealing with long strings:

Now, let’s talk about concatenation. What do you think would be the data type of the following expression?

N’ABCD’ + N’EFG’

Correct! It’s NVARCHAR(7). Everything is making sense. Isn’t it great?

Now, let’s complicate things just a little bit. Suppose you have an expression that is a concatenation of two string literals – one of them contains 3,000 characters and the other contains 2,000 characters.
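Spoiler, in sketch form: when neither operand is a MAX type, the result is capped at 4,000 characters, so the concatenation silently truncates unless you convert one side to NVARCHAR(MAX) first:

DECLARE @s nvarchar(max) = REPLICATE(N'A', 3000) + REPLICATE(N'B', 2000);
SELECT LEN(@s);  -- 4000, not 5000: the expression is typed before the assignment

DECLARE @s2 nvarchar(max) = CONVERT(nvarchar(max), REPLICATE(N'A', 3000)) + REPLICATE(N'B', 2000);
SELECT LEN(@s2); -- 5000: one NVARCHAR(MAX) operand promotes the whole expression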

Guy also has a function to print beyond 4000 Unicode characters:

Sometimes, you want to print a long string. For example, you might want to print the definition of a long stored procedure. Or you might have a very long dynamic batch that you are going to execute, but you want to print it first for debug purposes.

The problem with the PRINT statement is not only that it prints up to the first 8,000 bytes. It also truncates your text without even generating a warning.

This is a long-standing frustration of mine, especially when writing out complicated dynamic SQL. I think PRINT should have been changed 15 years ago to handle MAX types.
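The usual workaround, in crude sketch form (Guy's function at the link is more careful), is to loop and PRINT in chunks that stay under the limit:

DECLARE @sql nvarchar(max) = REPLICATE(CONVERT(nvarchar(max), N'X'), 10000);
DECLARE @pos int = 1;

WHILE @pos <= LEN(@sql)
BEGIN
    PRINT SUBSTRING(@sql, @pos, 4000);  -- each PRINT stays under the truncation point
    SET @pos += 4000;
END;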

From JSON to SQL Server

Phil Factor has some helper functions for us when working with JSON data:

If you know the structure and contents of a JSON document, then it is possible to turn this into one or more relational tables, but even then I dare you to claim that it is easy to tap in a good OpenJSON SELECT statement to do it. If you don’t know what’s in that JSON file, then you’re faced with sweating over a text editor trying to work it all out. You long to just get the contents into a relational table and take it on from there. Even then, you’ve got several struggles before that table appears in the result pane. You must get the path to the tabular data correct, you have to work out the SQL datatypes, and you need to list the full panoply of keys. Let’s face it: it is a chore. Hopefully, all that is in the past with these helper functions.
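For a sense of the chore Phil means, this is the bare OPENJSON pattern you hand-write once you already know the structure (hypothetical JSON and column names):

DECLARE @json nvarchar(max) = N'{"people":[{"name":"Ann","age":32},{"name":"Bob","age":41}]}';

SELECT p.name, p.age
FROM OPENJSON(@json, '$.people')
    WITH (
        name nvarchar(50) '$.name',
        age  int          '$.age'
    ) AS p;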

Click through for those functions.

Retrieving Text Between Delimiters

Erik Darling takes us through the seedy underbelly of T-SQL:

I have to do this a fair amount, and I always go look at this little cheat sheet that I have.

Then it occurred to me that it might be worth sharing the details here, in case anyone else runs into the same need.

The way I learned to do it is with SUBSTRING and CHARINDEX, which is a pretty common method.
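The pattern, in a self-contained sketch (my sample string, not Erik's):

DECLARE @s nvarchar(100) = N'server01[db_name]instance';

SELECT SUBSTRING(
           @s,
           CHARINDEX(N'[', @s) + 1,
           CHARINDEX(N']', @s) - CHARINDEX(N'[', @s) - 1
       ) AS between_delimiters;  -- returns db_name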

That this is possible is great, but it’d be nice to have an easier approach. Thinking through that easier approach is outside the scope of this post…

More Fun with NULL

Chris Johnson troubleshoots an issue in code:

The poster had a CASE statement and was wondering why it didn’t work as expected. Essentially they were doing something like:

CASE WHEN @a = @b OR (@a IS NULL AND @b IS NULL) THEN 1 ELSE 0 END
CASE WHEN NOT (@a = @b OR (@a IS NULL AND @b IS NULL)) THEN 1 ELSE 0 END

And they wanted to know why both were returning 0 when @a or @b was set to NULL. The issue here is that any normal predicate involving NULL evaluates to UNKNOWN. They had tried to compensate with the OR, which got them the result they wanted in the first statement, but they didn’t understand why it worked.
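A quick demonstration of the three-valued logic at work (my sketch of the repro):

DECLARE @a int = NULL, @b int = 1;

SELECT
    -- UNKNOWN OR FALSE is UNKNOWN, so this falls through to ELSE: 0, as desired
    CASE WHEN @a = @b OR (@a IS NULL AND @b IS NULL) THEN 1 ELSE 0 END AS equal_check,
    -- NOT (UNKNOWN) is still UNKNOWN, so this also returns 0 rather than 1
    CASE WHEN NOT (@a = @b OR (@a IS NULL AND @b IS NULL)) THEN 1 ELSE 0 END AS not_equal_check;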

Click through for the explanation.

Altering User-Defined Table Types

Michael J. Swart has a clever solution to the inability to alter user-defined table types:

Last year, Aaron Bertrand tackled the question, How To Alter User Defined Table Types. Aaron points out that “There is no ALTER TYPE, and you can’t drop and re-create a type that is in use”. Aaron’s suggestion was to create a new type and then update all procedures to use the new type.

I think I’ve got a bit of an improvement based on sp_rename and sp_refreshsqlmodule.

This is a clever solution. Prior to it, my workflow was:

  1. Create a new user-defined table type
  2. Create new stored procedures which reference the new user-defined table type
  3. Alter and deploy code to call these new stored procedures
  4. Drop the old procedure and user-defined table type

If the changes are such that they don’t require immediate app changes to use (for example, adding a nullable column to the UDTT), this can save a lot of effort.
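Roughly, the rename approach looks like this (a sketch with hypothetical names; see Michael's post for the careful version, including transaction handling):

-- 1. Create the new version of the type under a temporary name
CREATE TYPE dbo.OrderIdList_new AS TABLE
(
    order_id int NOT NULL,
    note     nvarchar(100) NULL  -- the new, nullable column
);
GO

-- 2. Swap the names, then rebind the modules that reference the type
EXEC sp_rename 'dbo.OrderIdList', 'OrderIdList_old', 'USERDATATYPE';
EXEC sp_rename 'dbo.OrderIdList_new', 'OrderIdList', 'USERDATATYPE';
EXEC sp_refreshsqlmodule 'dbo.ProcedureThatUsesTheType';

-- 3. Once nothing references the old type, drop it
DROP TYPE dbo.OrderIdList_old;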

Optimizing Common Table Expressions

Itzik Ben-Gan continues a series on common table expressions:

If you’re wondering why not use a much simpler solution with a grouped query and a HAVING filter, it has to do with the density of the shipperid column. The Orders table has 1,000,000 orders, and the shipments of those orders were handled by five shippers, meaning that on average, each shipper handled 20% of the orders. The plan for a grouped query computing the maximum order date per shipper would scan all 1,000,000 rows, resulting in thousands of page reads. Indeed, if you highlight just the CTE’s inner query (we’ll call it Query 3) computing the maximum order date per shipper and check its execution plan, you will get the plan shown in Figure 3.
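The contrast, roughly sketched against his Orders/Shippers setup (my abbreviation, assuming a supporting index on Orders(shipperid, orderdate)):

-- Grouped query: scans all 1,000,000 Orders rows
SELECT shipperid, MAX(orderdate) AS maxod
FROM dbo.Orders
GROUP BY shipperid;

-- Seek-per-shipper alternative: one index seek for each of the five shippers
SELECT S.shipperid, A.maxod
FROM dbo.Shippers AS S
    CROSS APPLY (
        SELECT MAX(O.orderdate) AS maxod
        FROM dbo.Orders AS O
        WHERE O.shipperid = S.shipperid
    ) AS A;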

Read on for classic Itzik.

ANSI String Comparison

Greg Dodd takes us through one of the oddities of ANSI string comparison:

So let’s play a game: what will the output be if I pass in the following? I’ve included my guesses:

EXEC AreStringsTheSame N'Greg', N'Greg' --Yes
EXEC AreStringsTheSame N'Greg', N'Dodd' --No
EXEC AreStringsTheSame N'', N'' --Yes
EXEC AreStringsTheSame N' ', N' ' --Yes
EXEC AreStringsTheSame N' ', N'' --No
EXEC AreStringsTheSame N' Greg', 'Greg' --No, leading space
EXEC AreStringsTheSame N'Greg ', 'Greg' --No, trailing space

Read on to see what Greg ended up guessing wrong and why.
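The rule in play, for anyone who wants the punch line early: per the ANSI standard, SQL Server pads the shorter operand with spaces before comparing, so trailing spaces never affect equality while leading spaces do:

SELECT CASE WHEN N'Greg ' = N'Greg' THEN 'same' ELSE 'different' END;  -- same: trailing space padded away
SELECT CASE WHEN N' Greg' = N'Greg' THEN 'same' ELSE 'different' END;  -- different: leading space counts
SELECT CASE WHEN N' ' = N'' THEN 'same' ELSE 'different' END;          -- same: both pad out equal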

Efficient Polling of Remote Data Sources

Ed Pollack looks at ways of optimizing linked server connections to remote SQL Server and Postgres instances:

Data rarely resides in one place. Oftentimes, there is a need to collect data from many different data sources and combine them locally into meaningful reports, analytics, or tables. The process for accessing, collecting, validating, and using data from remote data sources requires the same level of design and architectural considerations as building data structures for a software application.

In this article, methods of accessing remote data will be introduced with the goal of presenting best practices and ways to optimize load processes. This will lead into a discussion of performance and how to avoid the latency often associated with loading data from remote locations.
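One classic instance of what Ed is optimizing, as a general sketch (hypothetical linked server and table names): a four-part name can drag remote rows across the wire before filtering locally, while OPENQUERY pushes the whole statement to the remote server:

-- Four-part name: the filter may be applied locally, after pulling remote rows
SELECT order_id, order_date
FROM LINKEDSRV.RemoteDb.dbo.Orders
WHERE order_date >= '20250101';

-- OPENQUERY: the statement executes remotely; only matching rows come back
SELECT order_id, order_date
FROM OPENQUERY(LINKEDSRV,
    'SELECT order_id, order_date FROM RemoteDb.dbo.Orders WHERE order_date >= ''20250101''');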

Click through for the article.
