Press "Enter" to skip to content

Category: Syntax

Idempotence in SQL Scripts

Jared Westover lays out some solid advice:

Imagine you’ve spent weeks preparing a software release for production, only to deploy it late one night and receive an error that the table or column already exists. This occurs in production environments, even when you use migration-based deployment methods such as DbUp. How can you ensure or at least reduce the likelihood of an error like this in the future?

At a prior job, we needed to write idempotent scripts because the deploy-to-QA process would run every script for the sprint each time someone checked in a new one. That practice prevented a few classes of release error, and I’ve carried it with me to subsequent engagements.

SQL Server 2016 then gave us several helpers, like DROP IF EXISTS for tables and views, and CREATE OR ALTER for stored procedures and views. It’s not a complete set of everything I’d like, but it’s a lot more convenient than what we had to do in prior versions.
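A minimal sketch of the difference, using illustrative object names (dbo.Orders, dbo.GetOrders, and the rest are made up for the example):

```sql
-- Pre-2016 pattern: guard every DDL statement with an existence check
IF NOT EXISTS (SELECT 1 FROM sys.tables WHERE object_id = OBJECT_ID(N'dbo.Orders'))
BEGIN
    CREATE TABLE dbo.Orders (OrderID int NOT NULL PRIMARY KEY);
END;

-- SQL Server 2016+: DROP IF EXISTS is safe to re-run
DROP TABLE IF EXISTS dbo.OrdersStaging;

-- SQL Server 2016+: CREATE OR ALTER works whether or not the object exists
CREATE OR ALTER PROCEDURE dbo.GetOrders
AS
BEGIN
    SELECT OrderID FROM dbo.Orders;
END;
```

Note that adding a column idempotently still requires the old-style check against sys.columns, which is part of why the helper set feels incomplete.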


Tracking the Last Sequence Value in SQL Server

Greg Low shares some queries and some history:

Sequences allow us to create a schema-bound object that is not associated with any specific table.

For example, if I have a Sales.HotelBookings table, a Sales.FlightBookings table, and a Sales.VehicleBookings table, I might want to have a common BookingID used as the key for each table. If more than the BookingID was involved, you could argue that there is a normalization problem with the tables, but we’ll leave that discussion for another day.

Another reason I like sequences is that they make it much easier to override the auto-generated value, without the need for code like SET IDENTITY_INSERT that we need with IDENTITY columns. This is particularly powerful if you ever need to insert explicit key values across linked servers, as you’ll quickly find that SET IDENTITY_INSERT doesn’t work in that scenario.

Sequences let me avoid these types of issues: they perform identically to IDENTITY columns, and they also give me more control over the cache for available values.

Click through for some queries to find the latest value of a sequence, as well as how this functionality has changed over the years. One thing that I would point out is that, on busy systems, you might find that the value has changed between the time you run this query and the time you use the results.
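As a quick sketch of the pattern Greg describes (the Sales.BookingID sequence and table names here are illustrative, not his code):

```sql
-- One sequence shared across several booking tables
CREATE SEQUENCE Sales.BookingID AS bigint
    START WITH 1 INCREMENT BY 1 CACHE 50;

-- Used as a default, so overriding the value is just a plain INSERT,
-- with no SET IDENTITY_INSERT dance
CREATE TABLE Sales.HotelBookings
(
    BookingID bigint NOT NULL
        CONSTRAINT DF_HotelBookings_BookingID DEFAULT (NEXT VALUE FOR Sales.BookingID)
        PRIMARY KEY,
    HotelName nvarchar(100) NOT NULL
);

-- The last value the sequence handed out (NULL until first use)
SELECT s.name, s.current_value
FROM sys.sequences AS s
WHERE s.object_id = OBJECT_ID(N'Sales.BookingID');
```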


Rolling Average Calculation via DATE_BUCKET()

Koen Verbeeck writes some code for SQL Server 2022 or later:

In the Microsoft Fabric Warehouse, a new T-SQL function was recently added: the DATE_BUCKET function. With this function, you can group dates into pre-defined buckets. This allows you to easily calculate aggregates that use the GROUP BY clause over these buckets, greatly simplifying the T-SQL statements for analytical use cases.

Click through for a demo. DATE_BUCKET() shipped with SQL Server 2022 and, as Koen mentions, is also now available in the Microsoft Fabric Warehouse. Once you know how it works, it’s pretty powerful, though I do think the function is a bit confusing to use.
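The basic shape of the function, with an illustrative dbo.Sales table (this is my mock-up, not Koen’s demo):

```sql
-- Group sales into 7-day buckets, anchored at a chosen origin date;
-- DATE_BUCKET(datepart, number, date, origin) returns the start of each bucket
SELECT DATE_BUCKET(WEEK, 1, s.SaleDate, DATEFROMPARTS(2024, 1, 1)) AS WeekBucket,
       SUM(s.Amount) AS TotalAmount
FROM dbo.Sales AS s
GROUP BY DATE_BUCKET(WEEK, 1, s.SaleDate, DATEFROMPARTS(2024, 1, 1))
ORDER BY WeekBucket;
```

The confusing bit is the origin parameter: buckets are counted from that anchor, so choosing a different origin shifts every bucket boundary.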


NTILE and Uneven Row Distribution

Jared Westover clarifies:

One of the simplest yet least-popular ranking functions in T-SQL is NTILE. It’s useful for dividing data into buckets or tiles. However, when your data isn’t evenly distributed across buckets, the results are confusing. Also, NTILE sometimes returns rows in a seemingly random order. What’s happening here?

There’s absolutely a pattern to how NTILE() works, as Jared describes.
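The pattern is easy to see with a small mock-up: when the row count doesn’t divide evenly, the earlier buckets each get one extra row.

```sql
-- 10 rows into 4 tiles: buckets 1 and 2 get 3 rows each,
-- buckets 3 and 4 get 2 rows each
SELECT v.n,
       NTILE(4) OVER (ORDER BY v.n) AS Tile
FROM (VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10)) AS v(n);
```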


The Real Ultimate Power of Omni-JOIN

Erik Darling coins a term:

You’ve got semi joins, and you’ve got anti-semi joins.

Everyone (loosely, very loosely, everyone) knows what those do: find a match, or confirm there isn’t one. Lemon-squeezy.

The big question is, what do you call the style of join that requires a full enumeration on both sides? Erik tries out a series of ideas before landing on Omni-joins. I like the term.
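For reference, the two well-known styles look like this in T-SQL (illustrative Customers/Orders tables; the engine can stop probing the inner side at the first match):

```sql
-- Semi join: customers with at least one order
SELECT c.CustomerID
FROM dbo.Customers AS c
WHERE EXISTS (SELECT 1 FROM dbo.Orders AS o WHERE o.CustomerID = c.CustomerID);

-- Anti-semi join: customers with no orders
SELECT c.CustomerID
FROM dbo.Customers AS c
WHERE NOT EXISTS (SELECT 1 FROM dbo.Orders AS o WHERE o.CustomerID = c.CustomerID);
```

The omni-join case is the one where neither side can stop early, because every row on both sides has to be examined.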


Combining UNION and UNION ALL

Greg Low crosses the streams:

Until the other day, though, I’d never stopped to think about what happens when you mix the two operations. I certainly wouldn’t write code like that myself, but, for example, without running the code (or reading further ahead), what would you expect the output of the following command to be? (Note: the real code read rows from a table, but I’ve mocked it up with a VALUES clause to make it easier to see the outcome.)

Read on to see what happens.
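Without giving away Greg’s specific example, the key is that UNION and UNION ALL have equal precedence and evaluate top to bottom, so a later UNION deduplicates everything the earlier operators produced. A mock-up of my own:

```sql
-- UNION ALL keeps the duplicate 1s, but the final UNION then
-- deduplicates the entire intermediate result
SELECT n FROM (VALUES (1), (1)) AS a(n)
UNION ALL
SELECT n FROM (VALUES (1), (2)) AS b(n)
UNION
SELECT n FROM (VALUES (2), (3)) AS c(n);
-- returns 1, 2, 3
```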


Creating Data from Literals in SQL Server

Louis Davidson has values. Many, many values:

Row constructors were introduced in SQL Server 2008 and allow you to create multiple rows in a single INSERT statement by using the VALUES clause. In this blog, I will demonstrate a few ways that we have created data in tables, and then show how you can do this with row constructors.

It’s not the only neat trick with VALUES(), either: you can also use CROSS APPLY and VALUES() to perform an efficient unpivot, turning a wide virtual table into a long one.
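Both tricks in one sketch, with illustrative table and column names:

```sql
-- Multi-row insert via a table value constructor
INSERT INTO dbo.Colors (ColorID, ColorName)
VALUES (1, N'Red'), (2, N'Green'), (3, N'Blue');

-- CROSS APPLY + VALUES as an unpivot: each wide row of quarterly
-- columns becomes three narrow (Metric, Amount) rows
SELECT s.SaleID, x.Metric, x.Amount
FROM dbo.SalesSummary AS s          -- assumed columns: Q1Amount, Q2Amount, Q3Amount
CROSS APPLY (VALUES (N'Q1', s.Q1Amount),
                    (N'Q2', s.Q2Amount),
                    (N'Q3', s.Q3Amount)) AS x(Metric, Amount);
```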


Running Totals over Arbitrary Date Ranges

Louis Davidson solves an interval problem:

Say you want to find the most recent 30-day period during which a person purchased some amount of products from your company. How you market to a customer might change if they have been active over a recent time period, or even in the past. But this also means that, for each day going back in history, you need to sum historic data over and over: that day plus the previous 29 days of activity. This is generally known as a rolling total, and computing it has been an interesting problem for many years.

When window functions came around, they became quite useful for such tasks, but they run into one complication here: gaps in the source data.

Funnily enough, there is a solution using window functions: range intervals. The ANSI SQL definition for RANGE (versus ROWS) for window functions does allow for the specification of a date range, like RANGE BETWEEN INTERVAL '30' DAY PRECEDING AND CURRENT ROW. Very impressive.

Unfortunately, SQL Server doesn’t support range intervals. PostgreSQL does, and it’s functionality I’ve agitated for over the past few years; I do hope that someday the SQL Server product team will support it. In the meantime, Louis has a solution that works well for the task.
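To make the gap concrete, here is the unsupported ANSI syntax next to one common T-SQL workaround (not necessarily the approach Louis takes; dbo.DailySales is an illustrative table with one row per day):

```sql
-- ANSI range-interval syntax, which SQL Server does not support:
--   SUM(Amount) OVER (ORDER BY SaleDate
--                     RANGE BETWEEN INTERVAL '30' DAY PRECEDING AND CURRENT ROW)

-- A correlated workaround that handles gaps in the dates:
SELECT d.SaleDate,
       r.Total30Day
FROM dbo.DailySales AS d
OUTER APPLY (SELECT SUM(d2.Amount) AS Total30Day
             FROM dbo.DailySales AS d2
             WHERE d2.SaleDate >  DATEADD(DAY, -30, d.SaleDate)
               AND d2.SaleDate <= d.SaleDate) AS r;
```

The ROWS BETWEEN 29 PRECEDING alternative only works when every day is guaranteed to have exactly one row, which is precisely the gap problem above.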


SELECT * in EXISTS Redux

Louis Davidson follows up from a prior post:

For example, it is often said that SELECT * makes your queries slower. In a nuanced way, this is often true, but only if changes occur in the database, such as columns being added. So many readers (myself included) see something that is demonstrably not 100% true being treated as if it were, and they tune out.

There are plenty of other reasons you shouldn’t use that construct, no matter what.

In this post, I want to admit to having my mind changed, and I will go back and change the previous post.

One thing I really appreciate about Louis is his willingness to listen to new information, update his priors, and outright say “Hey, here’s what I thought before and now I believe this instead.” That’s a commendable trait.
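The construct in question, for anyone who hasn’t followed the debate (Customers/Orders are illustrative): inside EXISTS, the select list is never materialized, so these two queries are equivalent and the SELECT * objection doesn’t apply here.

```sql
-- The select list inside EXISTS is ignored; only row existence matters
SELECT c.CustomerID
FROM dbo.Customers AS c
WHERE EXISTS (SELECT * FROM dbo.Orders AS o WHERE o.CustomerID = c.CustomerID);

-- Functionally identical
SELECT c.CustomerID
FROM dbo.Customers AS c
WHERE EXISTS (SELECT 1 FROM dbo.Orders AS o WHERE o.CustomerID = c.CustomerID);
```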


Diving into DISTINCT

Louis Davidson is one of a kind:

If there is one SQL keyword that causes more fear than any other, it’s DISTINCT. When I see it in a query, I immediately start to worry about just how much work I am in for to ensure the correctness of that query. I start scanning for comments to describe why it is there, and if none are found, I know the query is probably going to be wrong.

I have seen DISTINCT used to hide bad joins, missing grouping, and even missing WHERE clauses. I have seen developers use it as a “fix-all” for data problems.

In this blog, I will look at the proper use and distinctly dangerous uses of DISTINCT and also show how you might test your query that uses DISTINCT to see what it is actually covering up.

Louis also includes one of my “favorite” coding errors: the accidental self-join. I’ve done that one too many times to be proud of it.
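A sketch of the “fix-all” anti-pattern Louis warns about, with illustrative tables (not his example):

```sql
-- A join that multiplies rows: each customer repeats once per order...
SELECT c.CustomerID, c.CustomerName
FROM dbo.Customers AS c
JOIN dbo.Orders AS o ON o.CustomerID = c.CustomerID;

-- ...silenced with DISTINCT instead of being fixed:
SELECT DISTINCT c.CustomerID, c.CustomerName
FROM dbo.Customers AS c
JOIN dbo.Orders AS o ON o.CustomerID = c.CustomerID;

-- The honest version asks the question it actually means (a semi join):
SELECT c.CustomerID, c.CustomerName
FROM dbo.Customers AS c
WHERE EXISTS (SELECT 1 FROM dbo.Orders AS o WHERE o.CustomerID = c.CustomerID);
```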
