Press "Enter" to skip to content

Category: T-SQL

An Overview of the T-SQL Script DOM

Dan Guzman provides a public service:

Scripts are parsed by invoking the Parse method of T-SQL script DOM library TSqlParser class. The parser understands the complex T-SQL abstract syntax tree and splits T-SQL source into atomic TSqlParserTokens of TSqlTokenTypes that represent keywords, identifiers, punctuation, literals, whitespace, etc. These low-level tokens are grouped into more meaningful TSqlFragment objects that represent language elements of the script DOM, such as batches, statements, clauses, etc. Fragments, rather than the low-level parser tokens, are most often used in practice, although the underlying tokens are available for specialized requirements

The Parse method returns a TSqlFragment object of type TSqlScript containing all fragments within the script. This top-level fragment of the DOM hierarchy provides programmatic access to all language element fragments in the script. Nearly 1,000 different fragment types exist today due to the many granular T-SQL language elements.

Dan provides several examples of how to use the script DOM, making this a must-read if you’re interested in writing code around SQL Server.

Comments closed

The Rollup and Cube Operators

Greg Dodd digs into the ROLLUP and CUBE operators in a two-parter. First, ROLLUP:

As you can see, we now have these null’s popping up, but with totals. Row 5 for example, tells us that in 2017 there were 1,427,461 people living in Hawaii. Row 11 tells us that there are 2,438,188 people living in Rhode Island and Hawaii in 2017. Row 22 tells us that there were 2,429,070 people living in Rhode Island and Hawaii in 2018, and finally row 23 tells us that in total there have been 4,867,268 people in 2017 and 2018. This last row is a bit useless for this data as the overlap of those people would be huge, but for something like sales data, this number could be useful.

Next, CUBE:

For those with a keen eye you’ll see that I’ve started at row 28 in that screenshot. When we run the GROUP BY without ROLLUP or CUBE we get just 16 rows. With ROLLUP that grows to 23, but with CUBE it explodes out to 57. Why?

I’ve used ROLLUP several times with proper hierarchical data (e.g., product category, product sub-category, product) and it does an excellent job of summarizing that sort of data. CUBE has always returned too many rows for my liking. But the operator I go to most frequently is GROUPING SETS, as then I get to control the levels.

Comments closed

Updated SQL Server Diagnostic Queries

Glenn Berry has an updated set of DMV queries for us:

These are my SQL Server Diagnostic Information Queries for June 2020, aka my DMV Diagnostic Queries. They allow you to get a very comprehensive view of the configuration and performance of your SQL Server instance in a short amount of time. There are separate versions of these T-SQL queries for SQL Server 2005 through SQL Server 2019. I also have separate versions for SQL Managed Instance and Azure SQL Database. My diagnostic queries have been used by many people around the world since 2009. I make regular improvements to these queries each month.

This is one of my favorite methods for learning about a new SQL Server instance.

Comments closed

Notes on Derived Tables

Itzik Ben-Gan continues a series on table expressions:

The term derived table is used in SQL and T-SQL with more than one meaning. So first I want to make it clear which one I’m referring to in this article. I’m referring to a specific language construct that you define typically, but not only, in the FROM clause of an outer query. I’ll provide the syntax for this construct shortly.

The more general use of the term derived table in SQL is the counterpart to a derived relation from relational theory. A derived relation is a result relation that is derived from one or more input base relations, by applying relational operators from relational algebra like projection, intersection and others to those base relations. Similarly, in the general sense, a derived table in SQL is a result table that is derived from one or more base tables, by evaluating expressions against those input base tables.

There’s a lot to digest in this post, so check it out.

Comments closed

Building Queues in the Database

Erik Darling has created a series on queue tables in the database. Part one builds out a table:

A little explanation of the table: this is good for a reusable, ordered queue, that’ll give us some feedback on how long things took. It could also be used is there were a pending element, but we’d probably wanna change the indexing so that we could find either the last start time, or the last end time efficiently.

The thing that probably needs the most explanation here is the indexing and constraints. Because I know you, and you’re staring at the way I have my primary key, and you’re getting an itch and a twitch. That’s okay, it’s you’ve been conditioned for years to put the most selective column first. Unfortunately, that won’t work here.

Part two takes us through querying the queue:

For the table, our main priorities were indexing to make sure we can find work easily, and not allowing duplicate items.

For our worker proc, the main goals are going to be

– Looping until we run out of work
– Finding and reserving work with minimal locking
– Making that process atomic

Thankfully, this is a lot easier than it sounds. The table design does a lot of the work for us.

Erik’s design is not one I typically reach for, though my constraints are a bit different from his—typically, I’m using queue tables to run on periodic schedules and grab batches of records which finish processing before the next timed batch begins, and processes are idempotent, so if a queued item re-runs on occasion, it’s okay. But this is a really good technique if you need a more robust solution.

Comments closed

Never Forget a WHERE Clause Again

If you frequently forget to include the WHERE clause when you run queries, Rob Farley has an answer for you:

You meant to only update a few rows, but you weren’t paying attention to what you’d selected before hitting the Execute button. The WHERE clause got missed from what you executed, and you’ve updated every row. Oops.

Now, I totally hear you that you could’ve put this in a transaction, checked your results and then only done a COMMIT when you were sure. You could even have put that COMMIT inside an IF block that checked @@ROWCOUNT… yes, yes, yes. You could have used an SSMS add-in to let you execute the whole query that you’re currently on. You could’ve used a different tool entirely, such as Azure Data Studio. There are plenty of ways to avoid this. But I’m going to show you something that I find works for me, especially when I’m giving the script to someone else to run.

It’s more effort and makes the code harder to read, but if you have a real concern about somebody (and that somebody can include you) goofing this up, it does the job.

Comments closed

The Value of Negative Identity Values

Randolph West explains why you might start at MIN(INT) for an identity integer column:

A quick(er) post this week, in response to Greg Low’s blog post from a few weeks ago titled “Don’t start identity columns or sequences with large negative values.”

Greg writes that you shouldn’t use large negative values in a table, because… it’s hard to read them, I guess? And also they don’t compress well.

I disagree … to a degree. Dang, words are hard. Anyway, when I design a table I create what’s called a surrogate key as my primary key, which is a value that is intended for the table to uniquely identify a row so that it participates in relational activities like joins and foreign keys in an efficient way. In other words the identity column is not for me, it’s for the database engine. I don’t need to worry about what row a value has. I choose the data type for that identity column based on the estimated number of rows, not whether I can memorize that a [StatusID] of 5 means something. Magic numbers are bad, mmmkay?

I don’t mind using negative values, especially for things like queue tables where the rows are ephemeral. The identity values may be harder to read, but as Randolph points out, in those types of cases, you aren’t really reading the values anyhow.

Comments closed

XML and JSON Creation in SQL Server vs Elsewhere

Lukas Eder walks us through converting result sets to XML and JSON using different platforms:

SQL Server supports transforming flat tabular SQL result sets into hierarchical structures by convention using the convenient FOR XML or FOR JSON syntaxes. This is really convenient and less verbose than the standard SQL/XML or SQL/JSON APIs – although the standard ones are more powerful.

In this blog post, I’d like to show a few core features of the SQL Server syntax, and what they correspond to in standard SQL. jOOQ 3.14 will support both SQL Server’s syntax and the standard syntax, and will be able to translate from one to the other, such that you can use SQL Server syntax also on Db2, MariaDB, MySQL, Oracle, PostgreSQL. You can play around with the current state of development on our website here.

Click through for a series of examples.

Comments closed

WAITFOR TIME Instead of DELAY

Denis Gobo points out that there is a second option when using WAITFOR:

I was looking at some code I wrote the other day and noticed the WAITFOR command.. This got me thinking. How many times have I used WAITFOR in code, probably as much as I have used NTILE  🙂

I looked at the documentation for WAITFOR and notice there is TIME in addition to DELAY.  Oh that is handy, I always rolled my own ghetto-style version by calculating how long it would be until a specific time and then I would use that in the WAITFOR DELAY command

Read on to see why you might use WAITFOR and how to use each option. I use it a bit more frequently than Denis, though I still need to run a test each time to get the syntax right…

Comments closed