Press "Enter" to skip to content

Category: T-SQL

T-SQL Variables and Transactions

Simon Frazer rolls back:

If you’ve been working with T-SQL scripts for a while, you’ve likely encountered variables. These are essential for writing scripts that go beyond basic SELECT, INSERT, UPDATE, or DELETE statements.

Variables can serve multiple purposes: they can act as parameters passed into stored procedures, hold the results of one query to use in another, or even help debug scripts during testing (you do test your scripts before running them on real data, right?).

One particularly interesting aspect of variables is how they behave in transactions, often in ways we might not expect.

Read on for the demo. This also applies to table variables, which is why they can be extremely important for diagnosing issues before performing a rollback. You can write error results to the table variable first, then access them post-rollback.

Comments closed

The Pain of Cursors in SQL Server

Joe Fleming is not a fan of cursors:

So what are cursors, and why are they so bad?  A cursor is a construct in SQL Server that lets you define a query which produces a multi-row dataset and allows you to step through it one row at a time. You declare a cursor in this way:

DECLARE Performance_killer1 CURSOR FOR SELECT OrderNumber, OrderLineNumber, ItemID, ItemDescription, Quantity, Price FROM Orders

You must then OPEN the cursor and FETCH the next value from it into a set of variables, then perform your calculations and updates, repeating until you’ve reached the end of your dataset. From the standpoint of someone unfamiliar with database processing, this method is fairly intuitive. You do one thing at a time. From the database professional’s perspective, it’s very painful to see. There are a few reasons why.

I generally agree with this, though there are specific queries that you cannot write in a set-based context, particularly administrative queries that have to run over each table or database in an instance. I also found that I was unable to write a proper leaky bucket algorithm implementation in T-SQL without using a cursor. But the other 99.x% of the time? No cursors needed.

Also, if you do need a cursor, use a cursor instead of a WHILE loop. It’s more to remember but you can performance tune cursors, whereas WHILE loops are about as dog-slow as they get and never get better.

Comments closed

Using the Azure SQL DB Query Editor

Josephine Bush writes a query:

I keep losing track of this wondering where it went. You have to access it at the database level. Adding this post to remind me for later. This came in very handy when my home internet went down and I couldn’t auth on my phone hotspot without timeouts in Azure Data Studio.

You can login in with SQL Server auth or Entra.

Read on for some notes about limitations. It is definitely a helpful tool for occasional queries or having a simpler way to access data without having to set up a VPN and a whole bunch of tools.

Comments closed

Truncating All Tables while Preserving Foreign Keys in T-SQL

Ronald Kraijesteijn builds a script:

When testing a data warehouse, a common challenge is managing large datasets effectively. Often, you need to reset tables to a clean state, ensuring consistent testing environments. The most efficient way to clear a table is using the SQL command TRUNCATE TABLE. However, this command is not straightforward when foreign key constraints are present. In this article, we’ll explore a solution that temporarily disables constraints, allows truncation, and then restores the constraints—keeping your data model intact.

Click through for the script, which saves a record of all of the foreign key constraints, truncates each table, and then re-creates the foreign keys.

Comments closed

Calculating the Distance between Points via T-SQL

Sebastiao Pereira gets out the measuring tape:

How do you calculate the distance between two different points on a sphere using TSQL in SQL Server? In this article, we look at how this can be done to calculate different points on the globe.

Sebastiao first shows the raw calculation, then uses the GEOGRAPHY data type to simplify the job. Note that these are “as the crow flies” distances and not travel distances, as there’s no information on roads.

Also, these are calculations specific to Earth. Which sounds like the setup for a joke, but it’s really not. If you have a smaller sphere (or oblong spheroid, if you will) and you want to calculate the distance, use the GEOMETRY data type instead.

Comments closed

Using the OUTPUT Clause

Erik Darling has a new video. Erik mentions the best use case of this being for archival tables, but I’ll add one more: if you’re using a queue table (ignoring how good or bad of an idea this is), you have multiple processes operating on this queue table, and you want to reduce the likelihood of two processes picking up the same value, you can perform the equivalent of popping off of a queue: delete the first element(s) from the queue table and output it into a temp table. From there, you can operate on that data at your leisure, and the next process will grab some other batch of data. And if everything goes mildly wrong, re-insert that data back into the queue and let some other sucker try it. I’ve used this a few times for data warehousing processes and it works out pretty well.

The only thing I’m unsure about is how he figured out that I’m CommonTableExpressionLover11357.

1 Comment

Set Operations in T-SQL

Erik Darling has a pair of videos covering 3 1/2 set operations. First up is UNION and UNION ALL. These are the set operations that most people know about and use fairly regularly, and Erik explains the difference between the two, including the performance difference between the two.

Then, Erik hits upon the two lesser-known set operations: INTERSECT and EXCEPT. These are extremely useful in certain circumstances, and tend to perform much better than other alternatives. For example, to figure out if two datasets are exactly the same, it’s really hard to go wrong with the following pair of queries:

SELECT a.* FROM a EXCEPT SELECT b.* FROM b;
SELECT b.* FROM b EXCEPT SELECT a.* FROM a;

You need both queries because the former tells you if there are any records in A that do not exist in B, either because the record simply is not there or because there is a difference in one or more values in B’s matching record. But then, you also have to check the opposite side, because there might be a record in B that does not exist in A and the first query will not expose it.

This is typically the way I’d write test cases, ensuring that both queries return 0 results. Granted, you could always just check that the count of the intersection equals the count of records:

DECLARE @c1 INT, @c2 INT;
SELECT @c1 = SELECT COUNT(*) FROM (SELECT a.* FROM a INTERSECT SELECT b.* FROM b);
SELECT @c2 = SELECT COUNT(*) FROM a;

In this case, @c1 and @c2 should be the same number.

1 Comment

Window Function Ranges: UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING

Chad Callihan engages the limit breaker:

I’m familiar with using the OVER clause and don’t think it’s too uncommon to see it used for including row numbers by using ROW_NUMBER() and aggregating data. But even though they’ve been around since SQL Server 2012, I’m not too familiar with using the OVER clause with the UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING to affect the window being queried.

Let’s take a look at a couple of examples using UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING.

Click through for those examples. The default ranges for window functions usually make a lot of sense, but it’s good to understand your options for frames: ROWS vs RANGE, as well as the frame values (UNBOUNDED PRECEDING, {N} PRECEDING, CURRENT ROW, {N} FOLLOWING, and UNBOUNDED FOLLOWING).

Comments closed

Calculating Inter-Quartile Range and Z Score in T-SQL

Sebastiao Pereira hunts for outliers:

Outliers can significantly distort statistical analysis and lead to incorrect conclusions when interpreting data. In this article, we will look at how to find outliers in SQL Server using various T-SQL queries. Understanding how to find outliers in SQL is crucial for accurate data analysis.

Sebastiao uses PERCENTILE_CONT() in this demonstration. That works fine for relatively small tables, though it does not scale well at all. Once you’re in the millions of records, it gets slow. From there, my joke is that, if you have 100 million or more records, you can start a query with PERCENTILE_CONT() on one instance. Meanwhile, on a separate instance, as soon as you kick off that query, go install SQL Server ML Services, configure it, check out a tutorial on R or Python, figure out how you can calculate the inter-quartile range in that language, learn how ML Services works, and you’ll still get the answer before your first query finishes.

If you’re using SQL Server 2022, there is a new APPROX_PERCENTILE_CONT() that is orders of magnitude faster as you get increasingly large datasets. It’s also accurate to within 1.33% (on each side of the correct answer) within a 99% confidence. The way the query works is a bit different, though, because the approximation is a nested set function using a WITHIN GROUP() clause, whereas PERCENTILE_CONT() is a window function that uses an OVER() clause. That means it’s not quite as easy as slapping “APPROX_” to the start of the query, but because Sebastiao uses WITHIN GROUP in the T-SQL, it’s pretty close: PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY [ObsValue]) OVER() AS Q1 becomes APPROX_PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY [ObsValue]) AS Q1 or something like that–I’m compiling in the browser here.

Comments closed