Press "Enter" to skip to content

Category: T-SQL

Slimming down Batch Deletion in SQL Server

Matt Gantz deletes a batch at a time:

In previous articles I showed patterns for working with large amounts of data on big tables while keeping locking at a minimum. These processes can allow migrations and maintenance without requiring downtime but, in environments with unpredictable database workloads, there is a risk of heavy traffic starting at any time and disrupting a once smooth operation. In this article, I’ll demonstrate how to augment these processes to allow dynamic adjustment of the configuration.

For most systems, the main limitation these techniques run into is the speed and throughput of I/O (input/output). During periods of low traffic, a large batch size may perform great with no impact to production, but as traffic increases, the storage subsystem may not be able to keep up.

Read on for two mechanisms to make batch operations a little less stressful on the server.

A consulting customer of mine has a fairly clever mechanism for this as well: track the number of non-trivial active processes before the batch begins. If that number is above a certain threshold (say, 10 or 15 or whatever), pause for a pre-defined period of time before running. That way, if the server isn’t very active, batches can keep processing willy-nilly. But once things get busy, it reduces its activity load.

Comments closed

Multiple Filters with Regular Expressions

Louis Davidson shows off some more of the power of regular expressions:

One of the practical uses of RegEx is more powerful filtering. One of the projects I am working on, (very slowly) is sharing some SQL utilities on GitHub, Utilities like looking at the metadata of a table, searching for columns, database sizes, etc. I usually use LIKE to filter data, which lets me simply use an equality search, or I can also do a partial value search when I don’t know exactly what I am looking for.

LIKE is quite useful but, as Louis points out, it does have its limits. And in those limits is where regular expressions do so well.

Comments closed

Refactoring Code Segments in SQL

Lee Asher performs refactoring:

Over time the term “refactoring” has expanded and is sometimes used to mean code quality improvement in general, but here we are using it with its original meaning: condensing and eliminating redundant segments of code. Like factoring a number in math, we break the code into smaller blocks, identify any repeated elements, then replace them with a single reference.

I appreciate that Lee is sticking to the original meaning of the term here. Interestingly, Lee doesn’t cover T-SQL functions at all. On net, that’s probably a good thing, especially scalar functions. It’s easy to find cases where converting a function to an inline call can speed up query performance by 3x or more.

The mechanisms Lee does use could have an impact on query performance, especially lateral join/APPLY. But for some of these, as long as you do not overuse the technique, performance will be pretty similar.

Comments closed

The Joys of FORMATMESSAGE

Louis Davidson listened to some advice:

A few weeks ago, I wrote a post on using temporary stored procedures in SQL Server. Kevin Feasel of Curated SQL had this reply Using Temporary Stored Procedures to Output Common Messages. I had heard of FORMATMESSAGE before, but I had completely coupled this in my mind with formatting SQL Server error messages. (Which turns out to be a great use of this tech)

Click through to see how it works and some additional testing with RAISERROR().

Comments closed

Fuzzy Text Match in SQL Server

Rob Farley is excited:

However, SQL Server 2025 does bring some great options for doing fuzzy string matches, making custom Data Quality options even richer. I’ve spoken about this at some user groups recently (including tomorrow, remotely for TriPASS, and in a few weeks in Melbourne and Sydney for Difinity), and in that session I go much deeper into how I see data matching going. I’ll also write more about these methods in future posts, but it’ll take a few posts, covering quite a few sub-topics.

If you want to see that session, our user group (the Triangle Area SQL Server Users Group) is hosting it Wednesday morning Australia time, or this evening US Eastern Standard Time.

Comments closed

Calculating Exponential Moving Average in T-SQL

Rick Dobson watches the flow:

Exponential moving averages (emas) are a powerful means of detecting changes in time series data. However, if you are new to this task, you may be wondering how to choose from conflicting advice about how to calculate emas. This tip reviews several of the most popular methods for calculating moving averages. Additionally, this tip presents T-SQL code samples with common table expressions and stored procedures for generating emas from an underlying time series dataset.

“Emas don’t just track trends—they reveal momentum in motion.” That’s why they’re favored when recent values matter most—and why this tip focuses on helping you calculate them with precision.

Read on for the formula and a couple of lengthy scripts to generate it.

Comments closed

Comparing Sets of Data in T-SQL

Louis Davidson figured out which of these was not like the others, which of these just didn’t belong:

There are many occasions when we want to see the differences between two sets of data. Sometimes a whole table, a subset of a table, or even the results from a couple of queries, like in a unit test.

Maybe you want to see that two sets are exactly the same, for example domain table in DEV, PROD, or maybe even from source control. You might you have a orders table and an orders_history table and you want to see the overlap/changes over a given period of time, like for example, to clean out any useless history.

No matter what the reason, there is a query pattern that will work for you. In this blog I will demonstrate several of these techniques and why you might want to use them in different places.

Click through for those techniques. I am particularly fond of INTERSECT/EXCEPT because of how it handles missing data and typically performs quite well.

Comments closed

Random Number Generation in T-SQL via Marsaglia Polar Method

Sebastiao Pereira implements a method for generating random numbers in T-SQL:

Generating random numbers from a normal distribution is essential for accuracy and realistic modeling, simulation, inference, and algorithm design for scientific, engineering, statistical, and AI domains. How can we build a random number generator using Marsaglia Polar method in SQL Server without the use of external tools?

It’s an interesting technique that works well for drawing points from a two-dimensional space around a point.

Comments closed

Simple Data Quality Validation with T-SQL

Kenneth Omorodion builds a validation process:

As the need and use of data grows within any organization, there is a corresponding rising issue for the need of data quality validation. Most organizations have large stores of data but most of it are not managed efficiently in terms of data quality assurances, thus leading to inaccurate insights for the business which in turn leads to distrust in the data.

Organizations have now, more than ever, realized the importance of an efficient data quality process as part of their Business Intelligence and Analytics processes. The issue is, how can they implement data quality for their data? For larger and more data-centric organizations, they might be using pre-built data management and validation tools like Microsoft Purview or other Master Data Management tools like Informatica, Talend, SAP, Talend, and Stibo Systems. But for those organizations that can not commit to subscribing to pre-built options, or they are operating primarily on On-Premises environments, they might want to build one themselves, that’s where this article comes in.

I’m of two minds about this. One the one hand, I appreciate the effort that Kenneth put into this and expect that it would work reasonably well. On the other hand, I look at what it can do and say “Yes, but if you just use constraints like you should, then you don’t need this process.” It’s basically a very asynchronous way of applying check constraints, foreign key constraints, and specifying that an attribute is NOT NULL.

If there’s some reason why applying these constraints is impossible—such as receiving this data as an extract from some poorly-designed system—then this can do a good job of describing the failures of the upstream system. But this is where data architects need to get it right up-front.

1 Comment