Press "Enter" to skip to content

Category: T-SQL

I Remember Halloween

Jared Poche experiences Halloween problem protection:

Simple enough so far. The scan is against a memory optimized table variable, and the filter to the left our our seeks and scans check for a change to our value. Nothing left but to update the index and…

CURVE BALL

Wait, what’s all this? We have a Split operator after our Clustered Index Update. SQL Server does sometime turn an UPDATE statement into effectively a DELETE and INSERT if the row needs to move, but this seems a bit much. We have a total of 4 index update/delete operators now, and they aren’t cheap.

My very simple addition to the WHERE clause actually caused a small increase in duration, and a big jump in CPU. So what’s going on?

Read on to see the cause and what Jared was able to do about it.

Comments closed

NVARCHAR Everywhere

I get to put on my contrarian hat:

In the last episode of Shop Talk, I laid out an opinion which was…not well received. So I wanted to take some time and walk through my thinking a little more cogently than I was able to do during Shop Talk.

Here’s the short version. When you create a table and need a string column, you have a couple options available: VARCHAR and NVARCHAR. Let’s say that you’re a developer creating a table to store this string data. Do you choose VARCHAR or NVARCHAR? The classic answer is, “It depends.” And so I talk about why that is in video format right below these words.

I have a video which goes into detail, plus a bunch of words. Plus mice and banjos. 🐭🪕

Comments closed

Uses of ROLLUP and CUBE

Greg Dodd wraps up a series on ROLLUP and CUBE:

I don’t think ROLLUP has a great use in the real world (feel free to comment below if you have a real use for it). The best I can see is that you have SQL Server emailing the results of a query to someone and they want subtotals. Adding totals can be done by any reporting app, probably more efficiently than SQL will do it, and it’s probably easier to debug there as well.

I do, however, think CUBE has an amazing purpose.

The funny thing is, I disagree and find ROLLUP a lot more useful than CUBE because ROLLUP gives me aggregations along a hierarchy. If I set up my GROUP BY clause correctly, I can see the results of my Product Category –> Product Subcategory –> Product hierarchy quite easily. Yeah, I can also do this in reporting, but if I just need a quick printout or need to do something special with the subtotals, ROLLUP is great. I definitely use GROUPING SETS more than ROLLUP and CUBE combined, but I’ve had production code useROLLUP and, to the best of my recollection, never have I created a procedure or report which is based off of CUBE.

Comments closed

An Overview of the T-SQL Script DOM

Dan Guzman provides a public service:

Scripts are parsed by invoking the Parse method of T-SQL script DOM library TSqlParser class. The parser understands the complex T-SQL abstract syntax tree and splits T-SQL source into atomic TSqlParserTokens of TSqlTokenTypes that represent keywords, identifiers, punctuation, literals, whitespace, etc. These low-level tokens are grouped into more meaningful TSqlFragment objects that represent language elements of the script DOM, such as batches, statements, clauses, etc. Fragments, rather than the low-level parser tokens, are most often used in practice, although the underlying tokens are available for specialized requirements

The Parse method returns a TSqlFragment object of type TSqlScript containing all fragments within the script. This top-level fragment of the DOM hierarchy provides programmatic access to all language element fragments in the script. Nearly 1,000 different fragment types exist today due to the many granular T-SQL language elements.

Dan provides several examples of how to use the script DOM, making this a must-read if you’re interested in writing code around SQL Server.

Comments closed

The Rollup and Cube Operators

Greg Dodd digs into the ROLLUP and CUBE operators in a two-parter. First, ROLLUP:

As you can see, we now have these null’s popping up, but with totals. Row 5 for example, tells us that in 2017 there were 1,427,461 people living in Hawaii. Row 11 tells us that there are 2,438,188 people living in Rhode Island and Hawaii in 2017. Row 22 tells us that there were 2,429,070 people living in Rhode Island and Hawaii in 2018, and finally row 23 tells us that in total there have been 4,867,268 people in 2017 and 2018. This last row is a bit useless for this data as the overlap of those people would be huge, but for something like sales data, this number could be useful.

Next, CUBE:

For those with a keen eye you’ll see that I’ve started at row 28 in that screenshot. When we run the GROUP BY without ROLLUP or CUBE we get just 16 rows. With ROLLUP that grows to 23, but with CUBE it explodes out to 57. Why?

I’ve used ROLLUP several times with proper hierarchical data (e.g., product category, product sub-category, product) and it does an excellent job of summarizing that sort of data. CUBE has always returned too many rows for my liking. But the operator I go to most frequently is GROUPING SETS, as then I get to control the levels.

Comments closed

Updated SQL Server Diagnostic Queries

Glenn Berry has an updated set of DMV queries for us:

These are my SQL Server Diagnostic Information Queries for June 2020, aka my DMV Diagnostic Queries. They allow you to get a very comprehensive view of the configuration and performance of your SQL Server instance in a short amount of time. There are separate versions of these T-SQL queries for SQL Server 2005 through SQL Server 2019. I also have separate versions for SQL Managed Instance and Azure SQL Database. My diagnostic queries have been used by many people around the world since 2009. I make regular improvements to these queries each month.

This is one of my favorite methods for learning about a new SQL Server instance.

Comments closed

Notes on Derived Tables

Itzik Ben-Gan continues a series on table expressions:

The term derived table is used in SQL and T-SQL with more than one meaning. So first I want to make it clear which one I’m referring to in this article. I’m referring to a specific language construct that you define typically, but not only, in the FROM clause of an outer query. I’ll provide the syntax for this construct shortly.

The more general use of the term derived table in SQL is the counterpart to a derived relation from relational theory. A derived relation is a result relation that is derived from one or more input base relations, by applying relational operators from relational algebra like projection, intersection and others to those base relations. Similarly, in the general sense, a derived table in SQL is a result table that is derived from one or more base tables, by evaluating expressions against those input base tables.

There’s a lot to digest in this post, so check it out.

Comments closed

Building Queues in the Database

Erik Darling has created a series on queue tables in the database. Part one builds out a table:

A little explanation of the table: this is good for a reusable, ordered queue, that’ll give us some feedback on how long things took. It could also be used is there were a pending element, but we’d probably wanna change the indexing so that we could find either the last start time, or the last end time efficiently.

The thing that probably needs the most explanation here is the indexing and constraints. Because I know you, and you’re staring at the way I have my primary key, and you’re getting an itch and a twitch. That’s okay, it’s you’ve been conditioned for years to put the most selective column first. Unfortunately, that won’t work here.

Part two takes us through querying the queue:

For the table, our main priorities were indexing to make sure we can find work easily, and not allowing duplicate items.

For our worker proc, the main goals are going to be

– Looping until we run out of work
– Finding and reserving work with minimal locking
– Making that process atomic

Thankfully, this is a lot easier than it sounds. The table design does a lot of the work for us.

Erik’s design is not one I typically reach for, though my constraints are a bit different from his—typically, I’m using queue tables to run on periodic schedules and grab batches of records which finish processing before the next timed batch begins, and processes are idempotent, so if a queued item re-runs on occasion, it’s okay. But this is a really good technique if you need a more robust solution.

Comments closed

Never Forget a WHERE Clause Again

If you frequently forget to include the WHERE clause when you run queries, Rob Farley has an answer for you:

You meant to only update a few rows, but you weren’t paying attention to what you’d selected before hitting the Execute button. The WHERE clause got missed from what you executed, and you’ve updated every row. Oops.

Now, I totally hear you that you could’ve put this in a transaction, checked your results and then only done a COMMIT when you were sure. You could even have put that COMMIT inside an IF block that checked @@ROWCOUNT… yes, yes, yes. You could have used an SSMS add-in to let you execute the whole query that you’re currently on. You could’ve used a different tool entirely, such as Azure Data Studio. There are plenty of ways to avoid this. But I’m going to show you something that I find works for me, especially when I’m giving the script to someone else to run.

It’s more effort and makes the code harder to read, but if you have a real concern about somebody (and that somebody can include you) goofing this up, it does the job.

Comments closed