Category: Syntax

Use TOP instead of SET ROWCOUNT

Published 2021-12-21 by Kevin Feasel

Jared Poche explains why the TOP clause is superior to using SET ROWCOUNT:

I was presenting on how to use the TOP clause to break down large operations into short, fast, bite-sized operations. The mechanics are things I learned from writing processes that do garbage collection, backfill new columns, and anonymizing PII data on existing tables. I’ve just posted the slides and example scripts here if you are interested.
ARE THEY THE SAME?
The question was whether the SET ROWCOUNT command would work just the same, and the answer is sometimes yes but largely no.

Read on to see what Jared means.

Comments closed

Using GREATEST and LEAST in Azure SQL DB

Published 2021-12-10 by Kevin Feasel

Aaron Bertrand preps us for SQL Server 2022:

In an earlier tip, “Find MAX value from multiple columns in a SQL Server table,” Sergey Gigoyan showed us how to simulate GREATEST() and LEAST() functions, which are available in multiple database platforms but were – at least at the time – missing from Transact-SQL. These functions are now available in Azure SQL Database and Azure SQL Managed Instance, and will be coming in SQL Server 2022, so I thought it was a good time to revisit Sergey’s methods and compare.

Read on to see how the workaround compares.

Comments closed

Limitations on Window Functions

Published 2021-12-02 by Kevin Feasel

I continue a series on window functions:

So far, we’ve seen some really cool things we can do with window functions. That said, there are some limitations with these windows. Some of them are built into how window functions work and others are incomplete implementations of the ANSI SQL standard. Let’s go check it out.

Click through to check it out.

Comments closed

CONVERT and Binary Styles

Published 2021-11-30 by Kevin Feasel

Abayomi Obawomiye has style:

I recently had a requirement to load some data from a source table to another destination table. The destination columns were exactly the same as the source columns with the same data types and length. The only difference was that some columns on the destination table must be encrypted. The task was to use the SHA2_512 encryption algorithm to encrypt the “sensitive” data. I will talk more about the encryption algorithm in another post.
To achieve this, I needed to use the HASHBYTES function in SQL Server. The challenge was that this function used with the SHA2_512 encryption algorithm will return a fixed character length of 64 characters which will be longer than the character length on my destination table. As a result, SQL Server will throw a truncation error. I will demonstrate this below.

One really important point: SHA is not encryption; it’s a hash (which is why the function is HASHBYTES() instead of something like EncryptByKey() as column-level security uses). Hashes are intended to be a one-way trip, whereas encryption implies an ability to decrypt if you have the relevant key details. Here, the use looks to be obfuscating the text of sensitive data fields, perhaps for loading in a dev/test environment, and so the actions themselves are quite reasonable.

By the way, the styles Abayomi talks about are all listed in this Docs page. Turns out that if you’re using a money datatype, you can use CONVERT() to display the end result with commas.

Comments closed

Powershell Functions and Return Type Oddities

Published 2021-11-30 by Kevin Feasel

Dave Mason goes down the rabbit hole:

As a result of some struggles trying to automate a process, I’ve learned some things about PowerShell. After getting to the bottom of a time-consuming problem, I thought it worth a blog post that might save someone else some time and heartache.
Let’s begin with this simple function named Get-RandomDate. It generates and returns a random date that is between today and X days ago. It has an input parameter $DaysAgo, which is of type [System.Int32]–it is a mandatory parameter.

That’s all straightforward, but then things get weird.

Comments closed

Ordered Set Functions in SQL Server

Published 2021-11-29 by Kevin Feasel

I continue a series on window functions in SQL Server:

As of SQL Server 2019, there is only one ordered set function: STRING_AGG(). I like STRING_AGG() a lot, especially because it means my days of needing to explain the STUFF() + FOR XML PATH trick to concatenate values together in SQL Server are numbered.
STRING_AGG() is interesting in that we categorize it as a window function and yet it violates my first rule of window functions: there isn’t an OVER() clause. Instead, it accepts but does not require a WITHIN GROUP() clause. Let’s see it in action.

Click through for a look at that, as well as a little hint that maybe we’ve seen ordered set functions before in a different guise.

Comments closed

FAST_FORWARD and Cursors

Published 2021-11-29 by Kevin Feasel

Joe Obbish skips past the commercials:

If you’re like me, you started your database journey by defining cursors with the default options. This went on until a senior developer or DBA kindly pointed out that you can get better performance by using the FAST_FORWARD option. Or maybe you were a real go-getter and found Aaron Bertrand’s performance benchmarking blog post on different cursor options. I admit that for many years I didn’t care to know why FAST_FORWARD sometimes made my queries faster. It had “FAST” in the name and that was good enough for me.
Recently I saw a production issue where using the right cursor options led to a 1000X performance improvement. I decided that ten years of ignorance was enough and finally did some research on different cursor options. This post contains a reproduction and discussion of the production issue.

I thought everybody knew how this works: the database streams the data tape from the supply pully to the play shaft by using a sprocket to rotate the gear in the cassette at a fixed speed. The FAST_FORWARD cursor option engages the fast forward idler in the ~~VCR~~ database and causes rotation to occur more rapidly than normal.

Comments closed

Statistical Window Functions in SQL Server

Published 2021-11-26 by Kevin Feasel

I continue a series on window functions in SQL Server:

CUME_DIST() doesn’t show 0 for the smallest record. The reason for this is in the definition: CUME_DIST() tells us how far along we are in describing the entire set—that is, what percentage of values have we covered so far. This percentage is always greater than 0. By contrast, PERCENT_RANK() forces the lowest value to be 0 and the highest value to be 1.
Another thing to note is ties. There are 117 values for customer 1 in my dataset. Rows 5 and 6 both have a percent rank of 0.0344, which is approximately rank 4 (remembering that we start from 0, not 1). Both rows 5 and 6 have the same rank of 4, and then we move up to a rank of 6. Meanwhile, for cumulative distribution, we see that rows 5 and 6 have a cumulative distribution of 6/117 = 0.5128. In other words, PERCENT_RANK() ties get the lowest possible value, whereas CUME_DIST() ties get the highest possible value.

Click through for much more detail, including examples galore.

Comments closed

Offset Window Functions

Published 2021-11-23 by Kevin Feasel

I continue a series on window functions:

Offset functions are another class of window function in SQL Server thanks to being part of the ANSI SQL standard. Think of a window of data, stretching over some number of rows. What we want to do is take the current row and then look back (or “lag” the data) or forward (“lead”).
There are four interesting offset window functions: LAG(), LEAD(), FIRST_VALUE(), and LAST_VALUE(). All four of these offset functions require an ORDER BY clause because of the nature of these functions. LAG() and LEAD() take two parameters, one of which is required. We need to know the value to lag/lead, so that’s a mandatory parameter. In addition, we have an optional parameter which indicates how many steps forward or backward we want to look. Let’s see this in action with a query:

Click through for that query, as well as a few more and plenty of explanation.

Comments closed

Ranking Window Functions

Published 2021-11-22 by Kevin Feasel

I continue a series on window functions in SQL Server:

The whole concept of ranking window functions is to assign some numeric ordering to a dataset. There are four ranking functions in SQL Server. Three of them are very similar to one another: ROW_NUMBER(), RANK(), DENSE_RANK(). The fourth one, NTILE(), is the odd cousin of the family.
Unlike aggregate window functions, all ranking window functions must have at least an ORDER BY clause in the OVER() operator. The reason is that you are attempting to bring order to the chaos of your data by assigning a number based on the order in which you join.

Watch me ramble on about monotonicity and quietly admit that I learned what it was from economics, where the naming feels utterly backward (“strongly monotonic” is the “greater than or equal to” of monotonicity, whereas “weakly monotonic” is the “greater than” of monotonicity). Also, I structured this entire post so that I could get that video from The Prisoner (the good one, not the garbage one) in it.

Comments closed