
Category: T-SQL

Batching Large Data Operations via Key Ranges

Andy Brownsword updates or deletes a batch of rows:

Effective batching in general helps us by:

  • Reducing transaction length and minimising blocking
  • Avoiding unnecessary checking of the same rows repeatedly
  • Introducing graceful pacing to reduce impact on busy environments or data replication

I’m not the biggest fan of the OFFSET/FETCH combination there, at least if your key column is fairly well packed—like, say, 99+% of the rows are contiguous and you occasionally have a jump of a few thousand rows. Also, that batch size of 100K might be a little high, although that will certainly depend on what the operation is. Batch updating a column based on some fairly straightforward calculation? You can probably get away with 100K, though I’d still prefer 10K. But as you add more complexities (deleting rows, very high server throughput, triggers, limited hardware, etc.), that number should edge downward.
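
For a well-packed integer key, walking the key range directly is one alternative to OFFSET/FETCH. Here is a minimal sketch of that idea, not Andy's code: the table dbo.BigTable, its columns Id and Amount, the 10K batch size, and the pacing delay are all assumptions for illustration.

```sql
-- Hypothetical table and column names; adjust to your schema.
DECLARE @BatchSize int = 10000;
DECLARE @MinId bigint, @MaxId bigint, @FromId bigint;

SELECT @MinId = MIN(Id), @MaxId = MAX(Id)
FROM dbo.BigTable;

SET @FromId = @MinId;

-- If the table is empty, @FromId is NULL and the loop never runs.
WHILE @FromId <= @MaxId
BEGIN
	-- Each UPDATE commits on its own (autocommit) and seeks straight
	-- to its key range, so no row is visited by more than one batch.
	UPDATE dbo.BigTable
	SET Amount = Amount * 1.1
	WHERE Id >= @FromId
		AND Id < @FromId + @BatchSize;

	SET @FromId += @BatchSize;

	-- Optional pacing between batches.
	WAITFOR DELAY '00:00:00.100';
END;
```

Gaps in the key just mean some batches touch fewer than 10K rows (or none at all), which is why this works best when the key is mostly contiguous.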


Splitting to a Table via Regular Expression

Louis Davidson creates a table:

Continuing on with the REGEXP_ functions series, the next one I want to cover is the table valued function REGEXP_SPLIT_TO_TABLE. This function is definitely one of the ones you probably ought to know, especially if you are ever tasked to pull some data out of a data structure.

This function is a lot like the STRING_SPLIT function, and unlike things like the REGEXP_LIKE function, you can basically use the same main parameters as you used in STRING_SPLIT for simple cases, but from there the possibilities are a lot more endless because you can define almost any delimiters you want. It isn’t perfect, because of a few things, but we will discuss that more later on.

Read on to see how it works, including one major caveat.
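
As a quick taste, here is a minimal sketch of my own rather than one of Louis's examples. The sample string and the pattern are made up, and I am assuming the documented shape of the function: an input string, a pattern, and an output table with value and ordinal columns.

```sql
-- Requires SQL Server 2025 (per the documentation, database compatibility level 170).
DECLARE @list varchar(100) = 'red, green;blue ,  yellow';

-- Split on a comma or a semicolon, swallowing any whitespace around the delimiter.
SELECT s.value, s.ordinal
FROM REGEXP_SPLIT_TO_TABLE(@list, '\s*[,;]\s*') AS s
ORDER BY s.ordinal;
```

Handling two different delimiters plus ragged whitespace in one pass is the sort of thing STRING_SPLIT cannot do on its own.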


Date Intervals in PostgreSQL Window Functions

Hubert Lubaczewski solves a problem:

Since I can’t copy paste the text, I’ll try to write what I remember:

Given table sessions, with columns: user_id, login_time, and country_id, list all cases where a single account logged in to the system from more than one country within a 2-hour time frame.

The idea behind it is that it would be a tool to find hacked accounts, based on the idea that you generally can’t change country within 2 hours. Which is somewhat true.

The solution in the blog post suggested joining the sessions table with itself, using some inequality condition. I think we can do better…

Click through for a solution that works for PostgreSQL but not SQL Server because the latter doesn’t offer date and time intervals on window function frames.

To do this in SQL Server, I’d probably use LAG() and get the prior value of country ID and the prior login time. Something like the following query, though I didn’t run detailed performance checks.

WITH records AS
(
	SELECT
		s.user_id,
		s.login_time,
		s.country_id,
		LAG(s.login_time) OVER (PARTITION BY s.user_id ORDER BY s.login_time) AS prior_login_time,
		LAG(s.country_id) OVER (PARTITION BY s.user_id ORDER BY s.login_time) AS prior_country_id
	FROM sessions s
)
SELECT *
FROM records r
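WHERE
	r.prior_country_id <> r.country_id
	-- Compare the timestamps directly. DATEDIFF(HOUR, ...) counts hour boundaries crossed,
	-- so 10:01 to 12:59 would come back as 2 even though nearly three hours have passed.
	AND r.login_time <= DATEADD(HOUR, 2, r.prior_login_time);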

Replacing Text in SQL Server 2025 via Regular Expression

Louis Davidson continues a series on regular expressions in SQL Server 2025:

Okay, we have gone through as much of the RegEx filtering as I think is a part of the SQL Server 2025 implementation. Now it is time to focus on the functions that are not REGEXP_LIKE. We have already talked about REGEXP_MATCHES, which will come in handy for the rest of the series.

I will start with REGEXP_REPLACE, which is like the typical SQL REPLACE function. But instead of replacing based on a static delimiter, it can be used to replace multiple (or a specific) value that matches the RegEx expression. All of my examples for this entry will simply use a variable with a value we are working on, so no need to create or load any objects.

Read on to see how it works, including plenty of examples.
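
To illustrate the "multiple (or a specific)" part, here is a small sketch of my own in the same variable-only style. The sample string is made up, and I pass the start and occurrence arguments explicitly rather than relying on defaults. My reading of the documentation is that an occurrence of 0 means every match, so treat that as an assumption to verify.

```sql
-- Requires SQL Server 2025; the sample string is made up for illustration.
DECLARE @msg varchar(100) = 'Call 555-0100 or 555-0199 today';

SELECT
	-- start = 1, occurrence = 0: replace every match.
	REGEXP_REPLACE(@msg, '[0-9]{3}-[0-9]{4}', 'XXX-XXXX', 1, 0) AS all_matches,
	-- start = 1, occurrence = 2: replace only the second match.
	REGEXP_REPLACE(@msg, '[0-9]{3}-[0-9]{4}', 'XXX-XXXX', 1, 2) AS second_match_only;
```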


Migrating Azure Data Studio SQL Notebooks to VS Code Polyglot Notebooks

Haroon Ashraf gives us a somewhat unwieldy process:

As a SQL/BI developer, I want to run and store my SQL scripts and documentation efficiently in a Notebook as an alternative to using Azure Data Studio SQL Notebooks since Azure Data Studio is retiring soon. Read on to learn more about Visual Studio Code Polyglot Notebooks.

I liked the simplicity of having a SQL kernel in Azure Data Studio. Haroon shows how to work around its absence in VS Code and get to roughly the same spot, but I do hope the SQL Server tools team is able to migrate that SQL kernel over to VS Code prior to Azure Data Studio’s ultimate demise.


Materializing Lake Views in Microsoft Fabric

Sairam Yeturi reduces ETL and ELT requirements:

Organizations often face challenges when trying to scale analytics across large volumes of data stored in centralized SQL databases. As business teams demand faster, more tailored insights, traditional reporting pipelines can become bottlenecks. By adopting Lakehouse architecture with Microsoft Fabric, business groups can mirror their SQL data into OneLake and organize it using the Medallion architecture—Bronze, Silver, and Gold layers. Materialized lake views play a crucial role in this setup, enabling automated, declarative transformations that clean and enrich data in the Silver layer. This empowers teams to build reliable dashboards and AI-driven insights on top of curated data, all while maintaining performance, governance, and security on a scale.

In this post, we will cover how enterprises can use materialized lake views to streamline data orchestration and enhance data quality, monitoring across silver and gold layers, while mirroring their SQL DB tables to Fabric in the Bronze layer.

The best use case for this is a scenario in which your underlying data is already essentially in a star schema or at least easily transformable into one, and you have no interest in modifying the data in the view directly. Do read the limitations before digging in, though, as there are some big ones.


Tracking Time Series Rates of Change in SQL Server

Rick Dobson wants a measure of variation:

This tip presents a brief introduction to Common Table Expressions (CTE), along with a few references for those seeking additional details on CTEs beyond those described and demonstrated here. We will examine CTEs that are defined by either one or two SELECT statements. Additionally, we will provide a demonstration of a recursive CTE. All the examples illustrate how to process time series datasets with CTEs.

Click through for the tip.


Row and Range Frames in Window Functions and Batch Mode

Erik Darling covers how your window frame (that is, ROWS or RANGE in the window function definition) can affect batch mode.

Erik looks at a classic performance difference between ROWS and RANGE, as well as what batch mode does to even the score. This is particularly nice because ROWS and RANGE both have their utility and focusing on one versus the other for performance differences can lead to awkward development practices to get around a window spool.

Erik also focuses primarily on batch mode on rowstore, so keep in mind the minimum requirements for it: at least one table in the query with 131,072 (or 2^17) rows, at least one operator that benefits from batch mode (which the window function would cover), at least one input to such an operator with 2^17 rows, and a batch mode cost that is lower than the row mode cost.
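
If the ROWS versus RANGE distinction is hazy, the following sketch shows the two frames side by side. It is my own illustration against a hypothetical dbo.Sales table, not one of Erik's demos.

```sql
-- Hypothetical table dbo.Sales(CustomerId, SaleDate, Amount).
SELECT
	s.CustomerId,
	s.SaleDate,
	s.Amount,
	-- No frame specified: with an ORDER BY present, the default frame is
	-- RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.
	SUM(s.Amount) OVER (
		PARTITION BY s.CustomerId
		ORDER BY s.SaleDate
	) AS running_total_range,
	-- Explicit ROWS frame: stops at the current physical row.
	SUM(s.Amount) OVER (
		PARTITION BY s.CustomerId
		ORDER BY s.SaleDate
		ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
	) AS running_total_rows
FROM dbo.Sales AS s;
```

The two columns only differ when a customer has ties on SaleDate, because RANGE treats all rows with the same ordering value as peers and includes them in the frame together.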


Pattern Matching with REGEXP_LIKE() in SQL Server 2025

Koen Verbeeck writes a regular expression:

I need to do some data validation in our SQL Server database. However, the validation rules are too complex for the T-SQL LIKE function, and I can’t seem to get it done either with PATINDEX or something similar. I’d like to use regular expressions as they’re more powerful. SQL Server 2025 now has a regex function, REGEXP_LIKE, to use regular expressions.

Read on for some examples, advice on validating e-mail addresses, and more.
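
For a sense of the syntax, here is a minimal sketch of my own, not one of Koen's examples. The rule (two uppercase letters followed by exactly four digits) and the sample values are invented for illustration.

```sql
-- Requires SQL Server 2025 (per the documentation, database compatibility level 170).
-- The validation rule and sample values are hypothetical.
SELECT v.code
FROM (VALUES ('AB1234'), ('ab1234'), ('ABC123'), ('AB12345')) AS v(code)
WHERE REGEXP_LIKE(v.code, '^[A-Z]{2}[0-9]{4}$');
```

The anchors and the exact repetition counts are what LIKE struggles with: you can approximate this pattern with a run of bracketed character classes, but anything with optional or variable-length pieces gets ugly fast.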
