Category: T-SQL

Truncating All Tables while Preserving Foreign Keys in T-SQL

Published 2025-01-28 by Kevin Feasel

When testing a data warehouse, a common challenge is managing large datasets effectively. Often, you need to reset tables to a clean state, ensuring consistent testing environments. The most efficient way to clear a table is using the SQL command TRUNCATE TABLE. However, this command is not straightforward when foreign key constraints are present. In this article, we’ll explore a solution that temporarily disables constraints, allows truncation, and then restores the constraints—keeping your data model intact.

Click through for the script, which saves a record of all of the foreign key constraints, truncates each table, and then re-creates the foreign keys.

Comments closed

Calculating the Distance between Points via T-SQL

Published 2025-01-28 by Kevin Feasel

Sebastiao Pereira gets out the measuring tape:

How do you calculate the distance between two different points on a sphere using TSQL in SQL Server? In this article, we look at how this can be done to calculate different points on the globe.

Sebastiao first shows the raw calculation, then uses the GEOGRAPHY data type to simplify the job. Note that these are “as the crow flies” distances and not travel distances, as there’s no information on roads.

Also, these are calculations specific to Earth. Which sounds like the setup for a joke, but it’s really not. If you have a smaller sphere (or oblong spheroid, if you will) and you want to calculate the distance, use the GEOMETRY data type instead.

Comments closed

Using the OUTPUT Clause

Published 2025-01-22 by Kevin Feasel

Erik Darling has a new video. Erik mentions the best use case of this being for archival tables, but I’ll add one more: if you’re using a queue table (ignoring how good or bad of an idea this is), you have multiple processes operating on this queue table, and you want to reduce the likelihood of two processes picking up the same value, you can perform the equivalent of popping off of a queue: delete the first element(s) from the queue table and output it into a temp table. From there, you can operate on that data at your leisure, and the next process will grab some other batch of data. And if everything goes mildly wrong, re-insert that data back into the queue and let some other sucker try it. I’ve used this a few times for data warehousing processes and it works out pretty well.

The only thing I’m unsure about is how he figured out that I’m CommonTableExpressionLover11357.

1 Comment

Set Operations in T-SQL

Published 2025-01-13 by Kevin Feasel

Erik Darling has a pair of videos covering 3 1/2 set operations. First up is UNION and UNION ALL. These are the set operations that most people know about and use fairly regularly, and Erik explains the difference between the two, including the performance difference between the two.

Then, Erik hits upon the two lesser-known set operations: INTERSECT and EXCEPT. These are extremely useful in certain circumstances, and tend to perform much better than other alternatives. For example, to figure out if two datasets are exactly the same, it’s really hard to go wrong with the following pair of queries:

SELECT a.* FROM a EXCEPT SELECT b.* FROM b;
SELECT b.* FROM b EXCEPT SELECT a.* FROM a;

You need both queries because the former tells you if there are any records in A that do not exist in B, either because the record simply is not there or because there is a difference in one or more values in B’s matching record. But then, you also have to check the opposite side, because there might be a record in B that does not exist in A and the first query will not expose it.

This is typically the way I’d write test cases, ensuring that both queries return 0 results. Granted, you could always just check that the count of the intersection equals the count of records:

DECLARE @c1 INT, @c2 INT;
SELECT @c1 = SELECT COUNT(*) FROM (SELECT a.* FROM a INTERSECT SELECT b.* FROM b);
SELECT @c2 = SELECT COUNT(*) FROM a;

In this case, @c1 and @c2 should be the same number.

1 Comment

Window Function Ranges: UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING

Published 2025-01-09 by Kevin Feasel

Chad Callihan engages the limit breaker:

I’m familiar with using the OVER clause and don’t think it’s too uncommon to see it used for including row numbers by using ROW_NUMBER() and aggregating data. But even though they’ve been around since SQL Server 2012, I’m not too familiar with using the OVER clause with the UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING to affect the window being queried.

Let’s take a look at a couple of examples using UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING.

Click through for those examples. The default ranges for window functions usually make a lot of sense, but it’s good to understand your options for frames: ROWS vs RANGE, as well as the frame values (UNBOUNDED PRECEDING, {N} PRECEDING, CURRENT ROW, {N} FOLLOWING, and UNBOUNDED FOLLOWING).

Comments closed

Calculating Inter-Quartile Range and Z Score in T-SQL

Published 2025-01-07 by Kevin Feasel

Sebastiao Pereira hunts for outliers:

Outliers can significantly distort statistical analysis and lead to incorrect conclusions when interpreting data. In this article, we will look at how to find outliers in SQL Server using various T-SQL queries. Understanding how to find outliers in SQL is crucial for accurate data analysis.

Sebastiao uses PERCENTILE_CONT() in this demonstration. That works fine for relatively small tables, though it does not scale well at all. Once you’re in the millions of records, it gets slow. From there, my joke is that, if you have 100 million or more records, you can start a query with PERCENTILE_CONT() on one instance. Meanwhile, on a separate instance, as soon as you kick off that query, go install SQL Server ML Services, configure it, check out a tutorial on R or Python, figure out how you can calculate the inter-quartile range in that language, learn how ML Services works, and you’ll still get the answer before your first query finishes.

If you’re using SQL Server 2022, there is a new APPROX_PERCENTILE_CONT() that is orders of magnitude faster as you get increasingly large datasets. It’s also accurate to within 1.33% (on each side of the correct answer) within a 99% confidence. The way the query works is a bit different, though, because the approximation is a nested set function using a WITHIN GROUP() clause, whereas PERCENTILE_CONT() is a window function that uses an OVER() clause. That means it’s not quite as easy as slapping “APPROX_” to the start of the query, but because Sebastiao uses WITHIN GROUP in the T-SQL, it’s pretty close: PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY [ObsValue]) OVER() AS Q1 becomes APPROX_PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY [ObsValue]) AS Q1 or something like that–I’m compiling in the browser here.

Comments closed

Using the APPLY Operator

Published 2025-01-06 by Kevin Feasel

Erik Darling knows how to get my interest.

No text for me to copy, but Erik goes into detail on good use cases of the APPLY operator. Erik hits on several of the use cases that I like to use APPLY for and does a good job demonstrating its value.

Comments closed

A Pair of NOLOCK Demos

Published 2024-12-27 by Kevin Feasel

Erik Darling has a new video:

Two Of My Favorite NOLOCK Demos from Paul White and Aaron Bertrand

Erik has covered transaction isolation levels lately in videos. The first demo from Paul White shows an issue where queries stop scanning due to data movement. The second demo from Aaron Bertrand comes from this blog post.

Comments closed

Azure SQL DB String Concatenation and JSON Functions

Published 2024-12-16 by Kevin Feasel

Magda Bronowska takes a look at some functionality currently available only in Azure SQL Database and Managed Instance:

Microsoft releases the classic SQL Server every couple of years, with some functionality added through regular updates. On the other hand, the SQL Server offering in Azure (Azure SQL Database and Managed Instance) receives the latest features earlier.

This post highlights some of the T-SQL functions currently available in Azure SQL but not yet in classic SQL Server. However, with the recent announcement of SQL Server 2025, this might change next year. Keep in mind that some of these functions are in preview, so their behavior might evolve as they reach general availability.

Click through for those examples.

Comments closed

Alternatives to Long IN() Lists in SQL Server

Published 2024-12-05 by Kevin Feasel

David Fowler still has a list:

This post comes off the back of my last, where I looked at issues caused by explicitly declaring a large number of values in an IN clause. The query processor ran out of internal resources and could not produce a query plan – When You’ve Managed to Confuse SQL With a Crazy IN Clause.

My suggestion was to put the values into a temp table and carry out a semi-join.

That got me thinking, which of the two methods would perform better.

I’m going to look at the two methods for different amounts of values and have a look at how they get on in terms of reads and total time.

Read on for the answer. If I’m surprised by anything in it, it’s that the threshold where temp tables out-perform the IN() clause is so high.

Comments closed