Press "Enter" to skip to content

Category: Syntax

Ordered Set Functions in SQL Server

I continue a series on window functions in SQL Server:

As of SQL Server 2019, there is only one ordered set function: STRING_AGG(). I like STRING_AGG() a lot, especially because it means my days of needing to explain the STUFF() + FOR XML PATH trick to concatenate values together in SQL Server are numbered.

STRING_AGG() is interesting in that we categorize it as a window function and yet it violates my first rule of window functions: there isn’t an OVER() clause. Instead, it accepts but does not require a WITHIN GROUP() clause. Let’s see it in action.

Click through for a look at that, as well as a little hint that maybe we’ve seen ordered set functions before in a different guise.

Comments closed

FAST_FORWARD and Cursors

Joe Obbish skips past the commercials:

If you’re like me, you started your database journey by defining cursors with the default options. This went on until a senior developer or DBA kindly pointed out that you can get better performance by using the FAST_FORWARD option. Or maybe you were a real go-getter and found Aaron Bertrand’s performance benchmarking blog post on different cursor options. I admit that for many years I didn’t care to know why FAST_FORWARD sometimes made my queries faster. It had “FAST” in the name and that was good enough for me.

Recently I saw a production issue where using the right cursor options led to a 1000X performance improvement. I decided that ten years of ignorance was enough and finally did some research on different cursor options. This post contains a reproduction and discussion of the production issue.

I thought everybody knew how this works: the database streams the data tape from the supply pully to the play shaft by using a sprocket to rotate the gear in the cassette at a fixed speed. The FAST_FORWARD cursor option engages the fast forward idler in the VCR database and causes rotation to occur more rapidly than normal.

Comments closed

Statistical Window Functions in SQL Server

I continue a series on window functions in SQL Server:

CUME_DIST() doesn’t show 0 for the smallest record. The reason for this is in the definition: CUME_DIST() tells us how far along we are in describing the entire set—that is, what percentage of values have we covered so far. This percentage is always greater than 0. By contrast, PERCENT_RANK() forces the lowest value to be 0 and the highest value to be 1.

Another thing to note is ties. There are 117 values for customer 1 in my dataset. Rows 5 and 6 both have a percent rank of 0.0344, which is approximately rank 4 (remembering that we start from 0, not 1). Both rows 5 and 6 have the same rank of 4, and then we move up to a rank of 6. Meanwhile, for cumulative distribution, we see that rows 5 and 6 have a cumulative distribution of 6/117 = 0.5128. In other words, PERCENT_RANK() ties get the lowest possible value, whereas CUME_DIST() ties get the highest possible value.

Click through for much more detail, including examples galore.

Comments closed

Offset Window Functions

I continue a series on window functions:

Offset functions are another class of window function in SQL Server thanks to being part of the ANSI SQL standard. Think of a window of data, stretching over some number of rows. What we want to do is take the current row and then look back (or “lag” the data) or forward (“lead”).

There are four interesting offset window functions: LAG()LEAD()FIRST_VALUE(), and LAST_VALUE(). All four of these offset functions require an ORDER BY clause because of the nature of these functions. LAG() and LEAD() take two parameters, one of which is required. We need to know the value to lag/lead, so that’s a mandatory parameter. In addition, we have an optional parameter which indicates how many steps forward or backward we want to look. Let’s see this in action with a query:

Click through for that query, as well as a few more and plenty of explanation.

Comments closed

Ranking Window Functions

I continue a series on window functions in SQL Server:

The whole concept of ranking window functions is to assign some numeric ordering to a dataset. There are four ranking functions in SQL Server. Three of them are very similar to one another: ROW_NUMBER()RANK()DENSE_RANK(). The fourth one, NTILE(), is the odd cousin of the family.

Unlike aggregate window functions, all ranking window functions must have at least an ORDER BY clause in the OVER() operator. The reason is that you are attempting to bring order to the chaos of your data by assigning a number based on the order in which you join.

Watch me ramble on about monotonicity and quietly admit that I learned what it was from economics, where the naming feels utterly backward (“strongly monotonic” is the “greater than or equal to” of monotonicity, whereas “weakly monotonic” is the “greater than” of monotonicity). Also, I structured this entire post so that I could get that video from The Prisoner (the good one, not the garbage one) in it.

Comments closed

Aggregate Window Functions

I have a series on window functions:

Here, we get the sum of LineProfit by CustomerID. Because SUM() is an aggregate function, we need a GROUP BY clause for all non-aggregated columns. This is an aggregate function. The full set of them in T-SQL is available here, but you’ll probably be most familiar with MIN()MAX()SUM()AVG(), and COUNT().

To turn this into a window function, we slap on an OVER() and boom! Note: “boom!” only works on SQL Server 2012 and later, so if you’re still on 2008 R2, it’s more of a fizzle than a boom.

Read on for several examples of this nature.

Comments closed

Thoughts on the New STRING_SPLIT

Ronen Ariely has mixed feelings on updates to the STRING_SPLIT function:

The main issue with this function, is that it returns a SET of rows with no specific order.

As you must know by now, a TABLE is a SET of rows (Rowstore table which is the more common in SQL Server) or columns (Columnstore table). The rows in the table are not stored in specific order (even if using clustered index, the rows can physically be stored in different locations on the disk, not necessarily maintained continuously one after the other. In addition, the server might read the rows in parallel and not necessarily in the order of the index. As a result, The order in which rows are returned in a result set are not guaranteed unless an ORDER BY clause is specified.

And this is the main issue with the STRING_SPLIT… until today

Read on to see how this update makes STRING_SPLIT() much better, and also how it could be even better still.

Comments closed

GROUP BY and Functional Dependencies

Lukas Eder illuminates us:

The SQL standard knows an interesting feature where you can project any functional dependencies of a primary (or unique) key that is listed in the GROUP BY clause without having to add that functional dependency to the GROUP BY clause explicitly.

I was unaware that this functionality existed (in some database platforms), and I’m not positive that I like it.

Comments closed