Press "Enter" to skip to content

Category: Syntax

Discerning the Value of an Empty Table

Andy Levy performs some database archaeology:

It seems like no matter how long you work with a system beyond a trivial size, you’ll find something new every so often. A little while ago, I found a table without about a dozen columns, no data, and was referenced in only one place across an entire database hosting nearly 1000 tables and over 8000 stored procedures. Why does this thing even exist?

To protect the innocent(ish?), obviously I’m changing the names of everything here. I was looking into some performance issues and found a very short stored procedure being called from another stored procedure which opened with this query:

Click through for the query and why it existed in the first place.

Leave a Comment

Experimenting with BIT_COUNT

Louis Davidson has an idea:

I was editing an article the other day that uses the BIT_COUNT function that was added to SQL Server 2022. The solution presented is excellent, and I will try to come back and link to it here when I remember. (It will be linked the other way.

Reading that did two things for me. First it cave me an idea of how the BIT_COUNT function might be actually be used in a useful way. The solution that was presented would only work in SQL Server 2022 (It will work in earlier compatibility levels, based on the tests I have done.)

Read on for what Louis tried out.

Leave a Comment

COALESCE() in T-SQL

Rajendra Gupta has a backup plan in case of NULL:

NULL is a special marker that indicates a missing or undefined value in a column. It is different from zero or an empty string. Handling NULL values is essential for accurate data analysis, data integrity, and error avoidance. This tip explores how to handle NULL values in SQL Server using the COALESCE() function using various queries and reviewing the results.

Click through for a primer on the COALESCE() function, a few use cases for COALESCE(), and how it differs from ISNULL().

Leave a Comment

The Logic behind RIGHT OUTER JOIN

Constantine Kokkinos provides an explanation:

I was talking to a friend of mine and they are learning some SQL and they said something that I have seen come up multiple times in learning SQL.

They said “Yeah, I need to study the join types more. They make sense to me but I want to be able to not reference my notes” and also “I don’t really get the point of a right join if your can do the same thing with a left join by just switching the table name.”

These are great points, and common questions that occur when first learning SQL.

I won’t steal CK’s thunder (too much) about how we express joins in set theory, though I think when he mentions “OUTER” as a type of join, perhaps that’s supposed to be FULL OUTER JOIN?

Regardless, my take: there is a good reason to use INNER JOIN. There is a good reason to use LEFT OUTER JOIN. There is a good reason to use CROSS JOIN. There is a good reason to use FULL OUTER JOIN. The frequency in which you should use each is in descending order, meaning that there are relatively few circumstances in which you should use a FULL OUTER JOIN, but they do exist.

There are no good circumstances for a RIGHT OUTER JOIN. The concept logically exists, but has no practical value to us.

Leave a Comment

T-SQL Variables and Transactions

Simon Frazer rolls back:

If you’ve been working with T-SQL scripts for a while, you’ve likely encountered variables. These are essential for writing scripts that go beyond basic SELECT, INSERT, UPDATE, or DELETE statements.

Variables can serve multiple purposes: they can act as parameters passed into stored procedures, hold the results of one query to use in another, or even help debug scripts during testing (you do test your scripts before running them on real data, right?).

One particularly interesting aspect of variables is how they behave in transactions, often in ways we might not expect.

Read on for the demo. This also applies to table variables, which is why they can be extremely important for diagnosing issues before performing a rollback. You can write error results to the table variable first, then access them post-rollback.

Comments closed

The Pain of Cursors in SQL Server

Joe Fleming is not a fan of cursors:

So what are cursors, and why are they so bad?  A cursor is a construct in SQL Server that lets you define a query which produces a multi-row dataset and allows you to step through it one row at a time. You declare a cursor in this way:

DECLARE Performance_killer1 CURSOR FOR SELECT OrderNumber, OrderLineNumber, ItemID, ItemDescription, Quantity, Price FROM Orders

You must then OPEN the cursor and FETCH the next value from it into a set of variables, then perform your calculations and updates, repeating until you’ve reached the end of your dataset. From the standpoint of someone unfamiliar with database processing, this method is fairly intuitive. You do one thing at a time. From the database professional’s perspective, it’s very painful to see. There are a few reasons why.

I generally agree with this, though there are specific queries that you cannot write in a set-based context, particularly administrative queries that have to run over each table or database in an instance. I also found that I was unable to write a proper leaky bucket algorithm implementation in T-SQL without using a cursor. But the other 99.x% of the time? No cursors needed.

Also, if you do need a cursor, use a cursor instead of a WHILE loop. It’s more to remember but you can performance tune cursors, whereas WHILE loops are about as dog-slow as they get and never get better.

Comments closed

ALL vs ALLCROSSFILTERED in DAX

Marco Russo and Alberto Ferrari disambiguate a pair of operators:

Have you ever wondered what the subtle difference between ALL and ALLCROSSFILTERED might be? The family of ALL functions and modifiers includes some common functions, like ALL and ALLSELECTED, and some fancier and less frequently-used functions, like ALLNOBLANKROW and ALLCROSSFILTERED. This article discusses what ALLCROSSFILTERED is, why it is there in DAX, and when and how developers should use it.

Read on for that answer, along with several helpful demos.

Comments closed

Using the OUTPUT Clause

Erik Darling has a new video. Erik mentions the best use case of this being for archival tables, but I’ll add one more: if you’re using a queue table (ignoring how good or bad of an idea this is), you have multiple processes operating on this queue table, and you want to reduce the likelihood of two processes picking up the same value, you can perform the equivalent of popping off of a queue: delete the first element(s) from the queue table and output it into a temp table. From there, you can operate on that data at your leisure, and the next process will grab some other batch of data. And if everything goes mildly wrong, re-insert that data back into the queue and let some other sucker try it. I’ve used this a few times for data warehousing processes and it works out pretty well.

The only thing I’m unsure about is how he figured out that I’m CommonTableExpressionLover11357.

1 Comment

Set Operations in T-SQL

Erik Darling has a pair of videos covering 3 1/2 set operations. First up is UNION and UNION ALL. These are the set operations that most people know about and use fairly regularly, and Erik explains the difference between the two, including the performance difference between the two.

Then, Erik hits upon the two lesser-known set operations: INTERSECT and EXCEPT. These are extremely useful in certain circumstances, and tend to perform much better than other alternatives. For example, to figure out if two datasets are exactly the same, it’s really hard to go wrong with the following pair of queries:

SELECT a.* FROM a EXCEPT SELECT b.* FROM b;
SELECT b.* FROM b EXCEPT SELECT a.* FROM a;

You need both queries because the former tells you if there are any records in A that do not exist in B, either because the record simply is not there or because there is a difference in one or more values in B’s matching record. But then, you also have to check the opposite side, because there might be a record in B that does not exist in A and the first query will not expose it.

This is typically the way I’d write test cases, ensuring that both queries return 0 results. Granted, you could always just check that the count of the intersection equals the count of records:

DECLARE @c1 INT, @c2 INT;
SELECT @c1 = SELECT COUNT(*) FROM (SELECT a.* FROM a INTERSECT SELECT b.* FROM b);
SELECT @c2 = SELECT COUNT(*) FROM a;

In this case, @c1 and @c2 should be the same number.

1 Comment