Category: Syntax

An Overview of the T-SQL Script DOM

Published 2020-06-08 by Kevin Feasel

Scripts are parsed by invoking the Parse method of T-SQL script DOM library TSqlParser class. The parser understands the complex T-SQL abstract syntax tree and splits T-SQL source into atomic TSqlParserTokens of TSqlTokenTypes that represent keywords, identifiers, punctuation, literals, whitespace, etc. These low-level tokens are grouped into more meaningful TSqlFragment objects that represent language elements of the script DOM, such as batches, statements, clauses, etc. Fragments, rather than the low-level parser tokens, are most often used in practice, although the underlying tokens are available for specialized requirements.
The Parse method returns a TSqlFragment object of type TSqlScript containing all fragments within the script. This top-level fragment of the DOM hierarchy provides programmatic access to all language element fragments in the script. Nearly 1,000 different fragment types exist today due to the many granular T-SQL language elements.

Dan provides several examples of how to use the script DOM, making this a must-read if you’re interested in writing code around SQL Server.
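
To make the token-versus-fragment distinction concrete, here is a toy sketch in Python. It is not the Script DOM (which is a .NET library) and it does not use its API: the token category names, the three-word keyword list, and the `tokenize` and `split_statements` helpers are all hypothetical, chosen only to show low-level tokens being grouped into statement-sized units.

```python
import re

# Hypothetical token categories, loosely echoing the kinds named in the quote.
TOKEN_SPEC = [
    ("WHITESPACE", r"\s+"),
    ("KEYWORD", r"\b(?:SELECT|FROM|WHERE)\b"),
    ("IDENTIFIER", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("PUNCTUATION", r"[,;.*=()<>]"),
]
TOKEN_RE = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC),
    re.IGNORECASE,
)


def tokenize(sql):
    """Split the text into (kind, text) pairs, keeping whitespace tokens."""
    tokens, pos = [], 0
    while pos < len(sql):
        match = TOKEN_RE.match(sql, pos)
        if match is None:
            raise ValueError(f"unexpected character at offset {pos}: {sql[pos]!r}")
        tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


def split_statements(tokens):
    """Group tokens into statements at each semicolon, dropping whitespace."""
    statements, current = [], []
    for kind, text in tokens:
        if kind == "PUNCTUATION" and text == ";":
            if current:
                statements.append(current)
                current = []
        elif kind != "WHITESPACE":
            current.append((kind, text))
    if current:
        statements.append(current)
    return statements


for statement in split_statements(tokenize("SELECT a FROM t; SELECT 1;")):
    print(statement)
```

A real parser builds a tree rather than flat lists, which is why the quoted post recommends working with fragments instead of raw tokens.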

The Rollup and Cube Operators

Published 2020-06-03 by Kevin Feasel

Greg Dodd digs into the ROLLUP and CUBE operators in a two-parter. First, ROLLUP:

As you can see, we now have these NULLs popping up, but with totals. Row 5, for example, tells us that in 2017 there were 1,427,461 people living in Hawaii. Row 11 tells us that there are 2,438,188 people living in Rhode Island and Hawaii in 2017. Row 22 tells us that there were 2,429,070 people living in Rhode Island and Hawaii in 2018, and finally row 23 tells us that in total there have been 4,867,268 people in 2017 and 2018. This last row is a bit useless for this data as the overlap of those people would be huge, but for something like sales data, this number could be useful.

Next, CUBE:

For those with a keen eye you’ll see that I’ve started at row 28 in that screenshot. When we run the GROUP BY without ROLLUP or CUBE we get just 16 rows. With ROLLUP that grows to 23, but with CUBE it explodes out to 57. Why?

I’ve used ROLLUP several times with proper hierarchical data (e.g., product category, product sub-category, product) and it does an excellent job of summarizing that sort of data. CUBE has always returned too many rows for my liking. But the operator I go to most frequently is GROUPING SETS, as then I get to control the levels.
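
The difference in row counts comes from how many grouping sets each operator expands to. A minimal Python sketch, using hypothetical column names rather than the ones in Greg's queries, lists them: ROLLUP over n columns produces n + 1 grouping sets (each prefix of the column list plus the grand total), while CUBE produces all 2^n subsets.

```python
from itertools import combinations


def rollup_grouping_sets(columns):
    """ROLLUP(a, b, c) groups by (a, b, c), (a, b), (a), and () for the total."""
    return [tuple(columns[:n]) for n in range(len(columns), -1, -1)]


def cube_grouping_sets(columns):
    """CUBE(a, b, c) groups by every subset of the listed columns."""
    sets = []
    for size in range(len(columns), -1, -1):
        sets.extend(combinations(columns, size))
    return sets


# Hypothetical grouping columns, used only for illustration.
columns = ["year", "state", "city"]
print(len(rollup_grouping_sets(columns)), rollup_grouping_sets(columns))
print(len(cube_grouping_sets(columns)), cube_grouping_sets(columns))
```

GROUPING SETS is the manual version of the same idea: you write out exactly the tuples you want instead of accepting either generated list.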

Order of Operations and the Unary Negative Operator

Published 2020-05-19 by Kevin Feasel

Kenneth Fisher takes us through a weird problem:

Last but certainly not least our problem child.
SELECT -100.0/-100.0*10.0
In the absence of parentheses, and with nothing but multiplication and division, it should go left to right. Right? But that would give us -100.0/-100.0 returning 1.0 and finally 1.0*10.0 giving us 10.0. But that’s not what we are getting. Somehow we are ending up with 0.1.

Read the comments, where Joe Celko explains the reasoning behind this strange behavior.

I tried this out on dbfiddle and here were some quick results:

Postgres 12 — 10.0
Oracle 18c (had to query from dual) — 10
MySQL 8.0 — 10
SQLite 3.27 — 10
SQL Server — 0.1

So it’s definitely not the case everywhere. The thing I noticed for SQL Server is that they don’t call out unary operators in the operator precedence guide, as opposed to, say, Oracle.
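
Here is a rough Python sketch of the two readings. The second grouping is an assumption on my part: it treats the unary minus as binding more loosely than * and /, which is one way to arrive at 0.1. The SQLite call uses Python's built-in sqlite3 module, and the `evaluate_in_sqlite` helper is a hypothetical name.

```python
import sqlite3

# Reading 1: unary minus binds tightest, then * and / run left to right.
left_to_right = (-100.0 / -100.0) * 10.0

# Reading 2 (assumed): unary minus binds more loosely than * and /,
# so each minus sign applies to everything to its right.
loose_unary = -(100.0 / -(100.0 * 10.0))


def evaluate_in_sqlite(expression):
    """Evaluate a constant SQL expression in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    try:
        return conn.execute(f"SELECT {expression}").fetchone()[0]
    finally:
        conn.close()


print(left_to_right)
print(loose_unary)
print(evaluate_in_sqlite("-100.0/-100.0*10.0"))
```

Wrapping the negative literals in parentheses, as in (-100.0)/(-100.0)*10.0, removes the ambiguity regardless of platform.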

Notes on Derived Tables

Published 2020-05-14 by Kevin Feasel

Itzik Ben-Gan continues a series on table expressions:

The term derived table is used in SQL and T-SQL with more than one meaning. So first I want to make it clear which one I’m referring to in this article. I’m referring to a specific language construct that you define typically, but not only, in the FROM clause of an outer query. I’ll provide the syntax for this construct shortly.
The more general use of the term derived table in SQL is the counterpart to a derived relation from relational theory. A derived relation is a result relation that is derived from one or more input base relations, by applying relational operators from relational algebra like projection, intersection and others to those base relations. Similarly, in the general sense, a derived table in SQL is a result table that is derived from one or more base tables, by evaluating expressions against those input base tables.

There’s a lot to digest in this post, so check it out.
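
For a quick feel for the construct in its narrow sense, here is a small sketch using Python's sqlite3 module rather than SQL Server. The `orders` table, its data, and the `totals_above` helper are made up for illustration; the point is the named subquery in the FROM clause, which the outer query then filters like any other table.

```python
import sqlite3


def totals_above(threshold):
    """Return (customer, total) rows whose total exceeds the threshold."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(
            """
            CREATE TABLE orders (customer TEXT, amount INTEGER);
            INSERT INTO orders VALUES ('alice', 10), ('alice', 30), ('bob', 5);
            """
        )
        # The parenthesized query aliased as d is the derived table.
        query = """
            SELECT d.customer, d.total
            FROM (SELECT customer, SUM(amount) AS total
                  FROM orders
                  GROUP BY customer) AS d
            WHERE d.total > ?
            ORDER BY d.customer
        """
        return conn.execute(query, (threshold,)).fetchall()
    finally:
        conn.close()


print(totals_above(20))
```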

XML and JSON Creation in SQL Server vs Elsewhere

Published 2020-05-07 by Kevin Feasel

Lukas Eder walks us through converting result sets to XML and JSON using different platforms:

SQL Server supports transforming flat tabular SQL result sets into hierarchical structures by convention using the convenient FOR XML or FOR JSON syntaxes. This is really convenient and less verbose than the standard SQL/XML or SQL/JSON APIs – although the standard ones are more powerful.
In this blog post, I’d like to show a few core features of the SQL Server syntax, and what they correspond to in standard SQL. jOOQ 3.14 will support both SQL Server’s syntax and the standard syntax, and will be able to translate from one to the other, such that you can use SQL Server syntax also on Db2, MariaDB, MySQL, Oracle, PostgreSQL. You can play around with the current state of development on our website here.

Click through for a series of examples.
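
As a rough illustration of the "by convention" part, the Python sketch below loosely mimics one FOR JSON PATH convention: a dotted column alias becomes a nested object. The `rows_to_json_path` helper and the sample columns are hypothetical, and the sketch ignores details such as how NULL values are handled, so treat it as a picture of the idea and not a reimplementation.

```python
import json


def rows_to_json_path(columns, rows):
    """Turn flat rows into a JSON array, nesting on dots in column aliases."""
    result = []
    for row in rows:
        obj = {}
        for column, value in zip(columns, row):
            *parents, leaf = column.split(".")
            target = obj
            for key in parents:
                # Create the intermediate object the first time it is needed.
                target = target.setdefault(key, {})
            target[leaf] = value
        result.append(obj)
    return json.dumps(result)


# Hypothetical result set: an id column and two columns aliased under "author".
columns = ["id", "author.first_name", "author.last_name"]
rows = [(1, "Ann", "Lee"), (2, "Bo", "Kim")]
print(rows_to_json_path(columns, rows))
```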

WAITFOR TIME Instead of DELAY

Published 2020-05-07 by Kevin Feasel

Denis Gobo points out that there is a second option when using WAITFOR:

I was looking at some code I wrote the other day and noticed the WAITFOR command. This got me thinking. How many times have I used WAITFOR in code? Probably as much as I have used NTILE 🙂
I looked at the documentation for WAITFOR and noticed there is TIME in addition to DELAY. Oh that is handy, I always rolled my own ghetto-style version by calculating how long it would be until a specific time and then I would use that in the WAITFOR DELAY command.

Read on to see why you might use WAITFOR and how to use each option. I use it a bit more frequently than Denis, though I still need to run a test each time to get the syntax right…
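
The "roll your own" calculation Denis describes looks roughly like the Python sketch below. It is not his code: the `delay_until` and `as_waitfor_delay` helpers are hypothetical names, and the sketch assumes the target is a time of day that rolls over to tomorrow once it has passed.

```python
from datetime import datetime, time, timedelta


def delay_until(target, now):
    """Time remaining until the next occurrence of a time of day."""
    candidate = datetime.combine(now.date(), target)
    if candidate < now:
        # The target has already passed today, so wait for tomorrow's.
        candidate += timedelta(days=1)
    return candidate - now


def as_waitfor_delay(delta):
    """Format a wait of under 24 hours as a WAITFOR DELAY statement."""
    total = int(delta.total_seconds())
    hours, remainder = divmod(total, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"WAITFOR DELAY '{hours:02d}:{minutes:02d}:{seconds:02d}';"


now = datetime(2020, 5, 7, 22, 0, 0)
print(as_waitfor_delay(delay_until(time(23, 30), now)))
```

With the TIME option, all of that arithmetic collapses to a single statement along the lines of WAITFOR TIME '23:30';.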

Issues with Window Functions and Views / Derived Tables

Published 2020-05-06 by Kevin Feasel

Paul White takes us through some of the difficulties you are liable to see when using window functions in views and derived tables:

Our expectation is that the execution plan for this new query will be exactly the same as before we created the view. The query optimizer should be able to push the product filter specified in the WHERE clause down into the view, resulting in an index seek.
We need to stop and think a bit at this point, however. The query optimizer can only produce execution plans that are guaranteed to produce the same results as the logical query specification. Is it safe to push our WHERE clause into the view?

Read the whole thing.
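
To see why the answer to that last question can be "no", here is a small pure-Python sketch. It is not Paul's example: the sample rows and the `row_numbers` and `filter_product` helpers are made up. It compares filtering before and after computing a row number, once with the numbering partitioned by the filtered column and once without.

```python
def row_numbers(rows, partition_by=None):
    """Add a 1-based row number ordered by amount, restarting per partition."""
    counters = {}
    numbered = []
    for row in sorted(rows, key=lambda r: r["amount"]):
        key = row[partition_by] if partition_by else None
        counters[key] = counters.get(key, 0) + 1
        numbered.append({**row, "rn": counters[key]})
    return numbered


def filter_product(rows, product):
    """Keep only the rows for one product."""
    return [r for r in rows if r["product"] == product]


# Hypothetical data for illustration.
SALES = [
    {"product": "A", "amount": 10},
    {"product": "B", "amount": 20},
    {"product": "A", "amount": 30},
]

# Partitioned by the filtered column: filter order does not matter.
print(filter_product(row_numbers(SALES, "product"), "A"))
print(row_numbers(filter_product(SALES, "A"), "product"))

# Not partitioned: filtering first changes the row numbers.
print(filter_product(row_numbers(SALES), "A"))
print(row_numbers(filter_product(SALES, "A")))
```

In the partitioned case the rows for other products never influence product A's numbering, so moving the filter earlier leaves the result unchanged. Without the partition they do, and the two orders disagree.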

IS DISTINCT FROM with Snowflake

Published 2020-05-04 by Kevin Feasel

Koen Verbeeck shows us a good operator in Snowflake:

This single expression checks the equality of its members and also checks the nullability of both columns. Awesome. A good habit would be to use IS [NOT] DISTINCT FROM instead of every = or <> in every expression (join clauses, WHERE clauses etc.) and you’ll never get burned by those pesky NULLs again!

Koen has a link to a Microsoft feedback item to add this syntax to SQL Server. But that item’s been there for more than a decade, so I would not hold my breath waiting for it to show up.
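
The semantics are easy to model in a few lines of Python, with None standing in for NULL. The `sql_equals` and `is_distinct_from` functions are illustrative names, not anything from Snowflake: the first returns unknown when either side is NULL, while the second always returns a definite true or false.

```python
def sql_equals(a, b):
    """SQL '=': the result is unknown (None) when either side is NULL."""
    if a is None or b is None:
        return None
    return a == b


def is_distinct_from(a, b):
    """IS DISTINCT FROM: two NULLs count as the same, NULL vs a value differs."""
    if a is None or b is None:
        return not (a is None and b is None)
    return a != b


for left, right in [(1, 1), (1, 2), (1, None), (None, None)]:
    print(left, right, sql_equals(left, right), is_distinct_from(left, right))
```

Without the operator, the usual workaround is a longer predicate that pairs each comparison with explicit IS NULL checks on both sides.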

The Halloween Problem, Continued

Published 2020-05-04 by Kevin Feasel

Paul White continues a series on the Halloween Problem. Part 2 looks at insert and delete statements:

In the first part of this series, we saw how the Halloween Problem applies to UPDATE queries. To recap briefly, the problem was that an index used to locate records to update had its keys modified by the update operation itself (another good reason to use included columns in an index rather than extending the keys). The query optimizer introduced an Eager Table Spool operator to separate the reading and writing sides of the execution plan to avoid the problem. In this post, we will see how the same underlying issue can affect INSERT and DELETE statements.
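
The recap above can be simulated with a toy Python model. This is not how SQL Server implements anything: the sorted list standing in for an index, the 10% raise, the salary limit, and both function names are assumptions made for illustration. The first function reads from the index while the update moves entries within it; the second copies the qualifying rows up front, which is the role the Eager Table Spool plays.

```python
import bisect


def raise_via_live_index(salaries, limit):
    """Scan a salary-ordered index while the update moves entries within it."""
    index = sorted((salary, emp_id) for emp_id, salary in salaries.items())
    pos = 0
    while pos < len(index):
        salary, emp_id = index[pos]
        if salary < limit:
            # The update changes the index key, so the entry is re-inserted at
            # or after the scan position and can be read (and raised) again.
            # Assumes salaries of at least 10 so every raise is an increase.
            del index[pos]
            bisect.insort(index, (salary * 11 // 10, emp_id))
        else:
            pos += 1
    return {emp_id: salary for salary, emp_id in index}


def raise_with_eager_spool(salaries, limit):
    """Read every qualifying row first, then apply each update exactly once."""
    spooled = [(emp_id, salary) for emp_id, salary in salaries.items() if salary < limit]
    result = dict(salaries)
    for emp_id, salary in spooled:
        result[emp_id] = salary * 11 // 10
    return result


staff = {1: 10000, 2: 20000, 3: 30000}
print(raise_via_live_index(staff, 25000))
print(raise_with_eager_spool(staff, 25000))
```

In the first version every employee under the limit keeps receiving raises until they cross it. In the second, each gets exactly one.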

Part 3 looks at the train wreck MERGE operator:

In the right circumstances, the SQL Server optimizer can recognize that the MERGE statement is hole-filling, which is just another way of saying that the statement only adds rows where there is an existing gap in the target table’s key.
For this optimization to be applied, the values used in the WHEN NOT MATCHED BY TARGET clause must exactly match the ON part of the USING clause. Also, the target table must have a unique key (a requirement satisfied by the PRIMARY KEY in the present case).
Where these requirements are met, the MERGE statement does not require protection from the Halloween Problem.

If only it weren’t busted in so many other ways!

Part 4 wraps up the series:

The SQL Server optimizer has specific features that allow it to reason about the level of Halloween Protection (HP) required at each point in the query plan, and the detailed effect each operator has. These extra features are incorporated into the same property framework the optimizer uses to keep track of hundreds of other important bits of information during its search activities.
Each operator has a required HP property and a delivered HP property. The required property indicates the level of HP needed at that point in the tree for correct results. The delivered property reflects the HP provided by the current operator and the cumulative HP effects provided by its subtree.

This last one goes into some nice detail.

Performance and T-SQL’s CHOOSE

Published 2020-04-28 by Kevin Feasel

Grant Fritchey answers a question:

Questions absolutely drive my blog content and I really liked this one: how does the T-SQL CHOOSE command affect performance.
On the face of it, I honestly don’t think it will affect performance at all, depending on where and how you use it. However, the answer is always best supplied by testing.

Grant’s post ended up being much more interesting than I had anticipated. My expected answer was “It’s a CASE statement so it behaves like a CASE statement.” But there is some nuance that I’ve left out.
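
For reference, the "it's just CASE" mental model can be sketched in Python as follows. Both function names are hypothetical and this only models the result: a 1-based lookup that yields NULL (None here) when the index is out of range. It says nothing about the performance nuance Grant covers.

```python
def choose(index, *values):
    """Return the 1-based item from values, or None if the index is out of range."""
    if index is None or not 1 <= index <= len(values):
        return None
    return values[index - 1]


def choose_as_case(index, *values):
    """The same lookup written as a chain of comparisons, like a simple CASE."""
    for position, value in enumerate(values, start=1):
        if index == position:
            return value
    return None


print(choose(2, "Manager", "Director", "Developer"))
print(choose_as_case(2, "Manager", "Director", "Developer"))
print(choose(5, "Manager", "Director", "Developer"))
```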