Press "Enter" to skip to content

Category: T-SQL

DATE_BUCKET() Now GA in Fabric Data Warehouse

Jovan Popovic makes an announcement:

We have introduced a new DATE_BUCKET() function in Fabric Data Warehouse SQL language that makes reporting and analytics even easier.

In this blog post, you’ll discover how it simplifies time-based reporting and makes grouping dates effortless.

My experience is that DATE_BUCKET() takes a bit of effort getting used to, as it is not an intuitive function. That said, it can be really powerful for dealing with time series data. It is also available in SQL Server, as of SQL Server 2022.

Comments closed

Gaps in Identity Columns

Brent Ozar explains why there can be gaps in identity columns:

And you use that identity number for invoice numbers, or customer numbers, or whatever, and you expect that every single number will be taken up. For example, your accounting team might say, “We see order ID 1, 3, and 4 – what happened to order ID #2? Did someone take cash money from a customer, print up an invoice for them, and then delete the invoice and pocket the cash?”

Well, that might have happened. But there are all kinds of reasons why we’d have a gap in identities. One of the most common is failed or rolled-back transactions. To illustrate it, let’s start a transaction in one window:

I have a talk on applying forensic accounting techniques using SQL and Python (as well as an older version using R) and this is one of the things I bring up. In cases where you absolutely need contiguous numbers, the best I can do for you is no identity column and a stored procedure that runs in a SERIALIZED transaction isolation level, using an app lock to prevent anybody else from calling the stored procedure concurrently, taking a table lock out on the relevant table prior to doing any real work, and hard blocking everybody else until your transaction either succeeds or fails. And I’m not even 100% sure on that if you have enough concurrency to matter.

Comments closed

Refactoring SQL Code

Steve Jones shares some thoughts:

I was thinking about this when I saw this article on strategies to refactor sql code. The article seems written more for PostgreSQL, but there are items that relate to T-SQL as well. The main thrust of the article is about trying to rewrite code to DRY (don’t repeat yourself). The more changes you can make to shrink code, either to make it easier to read or avoid repeating those copy/paste items, the better off your team will be. It’s easy to think those copies aren’t a big deal, but it’s easy to update code in one place because that solves the problem you were given, and forget to fix all the copies.

Strict refactoring—leaving the inputs and outputs alone and only modifying the structure of code beyond the scope of reformatting but without changing its behavior—is somewhat uncommon in T-SQL outside of performance tuning scenarios, at least in my experience. The problem I have with DRY, when it comes to T-SQL, is that you generally need to pay the performance piper. Yes, repeating the contents of a common function in a series of T-SQL queries is repetition and “wasteful” in that regard, but if it makes the queries run literally 3-9x faster just from making these changes, I don’t care. I’ll do it.

If T-SQL were an idealized implementation of a fourth-generation language, where all viable equivalent queries would have the same execution plan and thus the same performance, then we’d see a lot more code refactoring because the way we write the code would not have a direct impact on how it runs. But in practice, that’s not the case.

Comments closed

RegEx Performance in SQL Server 2025

Brent Ozar has an update:

Back in March 2025 when Microsoft first announced that REGEX support was coming to SQL Server 2025 and Azure SQL DB, I gave it a quick test, and the performance was horrific. It was bad in 3 different ways:

  1. The CPU usage was terrible, burning 60 seconds of CPU time to check a few million rows
  2. It refused to use an index
  3. The cardinality estimation was terrible, hard-coded to 30% of the table

Read on to see what has changed. It’s obviously not perfect, but just as obviously is much better than what Brent saw in Azure SQL DB at the time.

Comments closed

Regular Expression Functions in SQL Server 2025

Tomaz Kastrun continues an advent of SQL Server 2025. Day 8 looks at a pair of regular expression-related functions:

Continuing with SQL Server 2025 T-SQL functions for Regular Expressions for in string and count functionalities.

And Day 9 hits two more:

Last two functions in the family of new T-SQL functions that were shipped with RegEx, are REGEXP_MATCHES() and REGEXP_SPLIT_TO_TABLE().

Read on to see how all four of these work.

Comments closed

Generating Shape-Bound Random Points in SQL Server

Sebastiao Pereira generates some numbers:

Random number generation is vital in computer science, supporting fields like optimization, simulation, robotics, and gaming. The quality, speed, and sometimes security of the generator can directly affect an algorithm’s correctness, performance, and competitiveness. In Python, random number generation is well-supported and widely used. In this article, we will look how to we can use SQL to do this.

Click through for several examples.

Comments closed

New T-SQL Functions

Tomaz Kastrun has been busy with this year’s advent of SQL Server 2025 blog posting. Catching up, Tomaz first looks at base-64 encoding and decoding:

SQL Server 2025 introduces a new T-SQL functions for BASE64 varbinary expressions. The first function returns base64-encoded text (BASE64_ENCODE() ), respectively for BASE64_DECODE().

BASE64_ENCODE converts the value of a varbinary expression into a base64-encoded varchar expression.

BASE64_DECODE converts a base64-encoded varchar expression into the corresponding varbinary expression.

I really like this, by the way. Base-64 encoding is quite useful for web transmissions, so having a place to generate that output easily is nice.

Second up is REGEXP_LIKE():

SQL Server 2025 introduces a new T-SQL functions for Regular Expressions (RegEx).

With multiple RegEx functions, the LIKE function indicates if the regular expression pattern matches the string or is in a string. The function is REGEXP_LIKE() that will do the job.

And third, we have REGEXP_SUBSTR() and REGEXP_REPLACE():

Continuing with SQL Server 2025 T-SQL functions for Regular Expressions for replace and substring functionalities.

Click through for Tomaz’s thoughts on all five of these functions.

Comments closed

Comparing TRANSLATE() and REPLACE()

Louis Davidson is lost in translation:

The data I am working with sometimes has people with multiple parts to their name (Mary Jo, Cindy Lou) etc, or sometimes Fred/Joe, Mary and Jack, Mary & Jack, or what have you. My goal was to turn these names into little delimited lists that I could parse on a space character with STRING_SPLIT and there were a “few” of these cases. This was the code I had arrived at when I reached the “good enough” stage of my coding.

Louis had 19 nested REPLACE() calls, but Certified Good Guy Erik Darling shows him the way.

Comments closed

Default Constraints and User-Defined Functions

Erik Darling has a new video. Erik shows how SQL Server handles default constraints that use user-defined functions and how this behaves under a variety of circumstances. There’s also a dive into parallelism and constraints. We also learned Erik’s ability to perform fractional math and how he actually differentiates “scalar” from “scaler,” proving once again that he is not midwestern from his use of extraneous vowel sounds.

1 Comment