COUNT(*) needs to return the exact number of rows. EXISTS only needs to answer a question like:
“Are there any rows at all?”
In other words, EXISTS can short-circuit after having found the first matching row. If your client code (e.g. written in Java or in PL/SQL, or any other client language) needs to know something like:
“Did actors called “Wahlberg” play in any films at all?”
Lukas shows how it works in Oracle and Postgres; the result is still basically the same for SQL Server.
The idea of using key-value pairs to store data isn’t new, but with the rapid development of cloud solutions like Azure and the hype around NoSQL databases, using key-value pairs to store data got a big boost. Especially developers (in my experience) love using key-value pair to store their data, because it’s easy for them to consume the data in an application. But it gives the database professional an extra challenge because we’re used to retrieve columns with values instead of a record per value. So how can we turn those key-value pairs into rows?
This is a good example of using PIVOT. I’m not a big fan of storing data in key-value pairs and using pivoting operators because you’re burning CPU on that very expensive SQL Server instance (and you’re not taking advantage of what relational databases do well); if you really need to store data as key-value, I’d recommend doing the pivot in cheaper application servers.
Basic stuff, right? Both will return 951 records (books) that I do not own. And, very quickly…because the tables are tiny. Sub-1 second is fast.
The issue here is HOW the rows are compared.
English version now, techy stuff later:
In the first query, this is equivalent to you standing at the bookstore and calling home to have someone check to see if the book in your hand is already in your collection. EVERY time. One by one.
In the second, you got really smart and brought a list with you, which you are comparing to the books on the shelf at the store. You’ve got both “lists” in one place, so it is far more efficient.
Even in the case with a few hundred records, you can see why there’d be a performance difference.
In SQL, everything is a table (see SQL trick #1 in this article), just like in relational algebra, everything is a set.
Now, PL/SQL is a useful procedural language that “builds around” the SQL language in the Oracle database. Some of the main reasons to do things in PL/SQL (rather than e.g. in Java) are:
- Performance (the most important reason), e.g. when doing ETL or reporting
- Logic needs to be “hidden” in the database (e.g. for security reasons)
- Logic needs to be reused among different systems that all access the database
Much like Java’s foreach loop, PL/SQL has the ability to define implicit cursors (as opposed to explicit ones)
The WHILE loop is a little more helpful in the SQL Server world for doing things like deleting lots of rows in small batches, but I agree with Lukas’s sentiment: if you start writing a WHILE loop, it’s best to sit back and think about whether this is the best decision.
The Common Table Expression (CTE) is a great tool in T-SQL. The CTE provides a mechanism to define a query that can be easily reused over and over within another query. The CTE also provides a mechanism for recursion which, though a little dangerous and overused, is extremely handy for certain types of queries. However, the CTE has a very unfortunate name. Over and over I’ve had to walk people back from the “Table” in Common Table Expression. The CTE is just a query. It’s not a table. It’s not providing a temporary storage space like a table variable or a temporary table. It’s just a query. Think of it more like a temporary view, which is also just a query.
Read the whole thing.
Wait! Hold on two seconds there! Surely the semi-colon is an absolute requirement because we see it everywhere that it is a mandatory requirement.
The reality is that the semi-colon requirement is not really entirely accurate. If the CTE happens to be in the same batch, then the previous statement in the batch must be terminated by the semi-colon.
This post went down an unexpected path, and ended up being rather interesting. Read the whole thing.
Because we’ve added OR conditions into the mix, we’re forced to use the Nested Loop join, which loops over table B for every single row in A. That’s a lot of index scans and it comes with a hefty price tag.
Here’s an absolutely eye-watering beautiful pattern that I found on the Interwebs (though I forgot where) the other day.
This is an interesting use of INTERSECT. Check it out.
Ultimately, you should always choose performance first, and then – most certainly – intuitiveness second (because some poor soul might need to maintain your query). But personally, I find these quantifiers quite elegant for three reasons:
They express the quantification right where it belongs. With the comparison operator. Compare this with the solution using LIMIT, which may be far away, visually, from the greater-than operator. Quantifiers are much more concise, even than when using MAX() (in my opinion)
They’re very set oriented. I like thinking in terms of sets when I work with SQL. Whenever I can omit the
ORDER BYclause, I will. If only to avoid potentially slow operations (in case the database doesn’t optimise this, and a full
O(N log N)sort operation is invoked)
Quantified comparison predicates work on rows too, not just on single values.
I’ve known about these, but could probably count on one hand the number of times I’ve ever used one.
This code works, but if you have dozens of years, it gets messy writing those case statements and you’re a bit more likely to make a mistake when refactoring code. Here’s a simpler version using CROSS APPLY:
(2013, [Qty2013], [Val2013]),
(2014, [Qty2014], [Val2014]),
(2015, [Qty2015], [Val2015])
], Quantity, [Value]);
It’s a little easier to read than the other version, and adding additional years is pretty straightforward. That makes for a great tip when you’re trying to refactor poorly-thought-out tables or bring into your system potentially well-thought-out flat files.
APPLY is an elegant solution to so many different classes of problem.
This doesn’t work.
The reason this doesn’t work is that XML is case sensitive. Meaning ORDERID != OrderID. The former is in the query, the latter in the XML document. If I change the query, this works (note I have OrderID below).
Like Steve, I’m not a big fan of doing XML processing within SQL Server, but if it’s a necessary part of your workload, it’s worth knowing.