Press "Enter" to skip to content

Category: T-SQL

GOTO And Labels In T-SQL

Ryan Desmond demonstrates the purpose of GOTO in T-SQL:

So I was playing around at work today and decided for whatever reason to see how I could get the code I was writing to fire off only in certain situations.

If it’s Sunday maybe, or if this is in a particular environment, or if a record in an admin table was something specific.  I’m not sure how I’ll use this but I stumbled on Labels and decided to play with them.

Ok, so how to get to know labels.  Well, in order to get them to work sometimes I have to create labels that are based on some criteria.

I do try to avoid these as much as possible, but they are valid syntax and I’ve seen a couple of cases where it makes sense to use GOTO.

1 Comment

Window Functions Have Defaults, Too

Steve Jones reminds us that when running a window function, there is a default window in place:

What I want to do is compare the passing yards each year with the most current value for that player, showing the plus or minus. This means that for Aaron Rodgers, who threw for 1675 yards in 2017, I’d want to show this for the first few years of his career:

This shows me an easy view of the years where he was better in his career than he is now. Last year was likely a down year because of injury, but we’ll see this year.

In any case, if I run this query using LAST_VALUE() for the final year of his career, I don’t get the right results.

It’s good to keep in mind the full syntax for a window function for just this reason.

Comments closed

Shuffling Data And Zipping Results In T-SQL

Phil Factor continues his series on pseudonymization:

The problems come with uncommon values. If you are pseudonymizing a medical database that is required for research purposes on people with potentially embarrassing diseases, and it appears on the dark web, anyone with a rare or unusual surname or first-name comes up on the list, so the shuffle doesn’t help the privacy of Fortescue Ceresole, or whatever his name may be.

If you are spoofing data entirely, you don’t necessarily have this problem because your constructed value will have no relationship to the original value. If it comes from a list of common names or if you randomly create a name ‘Thomas’, it will have no relationship to the original names in the database as long as you did things correctly and shuffle the list. Although a Markov string can produce an identical name that is uncommon, it can be eliminated from the list by an outer join with the original data.

After you shuffle data, you ‘zip’ it. Zipping lists is something you come across in procedural programming, and Linq has a good example. A .net array has an order, and all you are doing is to join by the order of the element in the list. If you randomize that order, you get a shuffle.

Read on for an example using the AdventureWorks Person.Person table.

Comments closed

Last Observation Carried Forward In T-SQL

Pawan Khowal shows one example of implementing Last Observation Carried Forward in T-SQL:

A very close friend given this to me. In this puzzle you have to fill the price of SKU & Color Id for missing months. Note that SKU & Color Id should be considered as a business unit. So you have to set the previous value available to the missing month. Please check out the sample input and the expected output. In this solution I have not considered any performance considerations.

Included is one solution, though there are others.

Comments closed

Ways To Check For Non-Existence

Brent Ozar shows two methods for finding records missing associated child records:

You’re writing a query, and you wanna check to see if rows exist in a table.

I’m using the free Stack Overflow database, and I wanna find all of the users who have not left a comment. The tables involved are:

  • In dbo.Users, the Id field uniquely identifies a user.
  • In dbo.Comments, there’s a UserId field that links to who left the comment.

A quick way to write it is:

And this works fine. When you read the query, you might think SQL Server would run that SELECT * FROM dbo.Comments query for every single row of the Users table – but it’s way smarter than that, bucko. It scans the Comments index first because it’s much larger, and then joins that to the Users table.

But this isn’t the only way to query these tables, and Brent shows how to tell which method works better.

Comments closed

Grouping And Aggregating: Optimizing The Optimizer

Itzik Ben-Gan shows an example of how you can nudge the SQL Server optimizer to the right answer by rewriting a query:

As you can see, the groups are obtained by scanning the index on the groups table, and the aggregate is obtained by applying a seek in the index on the main table. The higher the density of the grouping set, the more optimal this plan is compared to the default strategy for the grouped query.

Just like we did earlier for the default scan strategy, let’s estimate the number of logical reads and plan cost for the seeks strategy. The estimated number of logical reads is the number of reads for the single execution of the Index Scan operator that retrieves the groups, plus the reads for all of the executions of the Index Seek operator.

The estimated number of logical reads for the Index Scan operator is negligible compared to the seeks; still, it’s CEILING(1e0 * @numgroups / @rowsperpage). Take Query 4 as an example; say the index idx_sid fits about 600 rows per leaf page (actual number depends on actual shipperid values since the datatype is VARCHAR(5)). With 5 groups, all rows fit in a single leaf page. If you had 5,000 groups, they would fit in 9 pages.

Plus some love for the APPLY operator.  Read the whole thing.

Comments closed

Serial Plans And Serial Zones

Daniel Hutmacher takes a look at what causes your T-SQL code to use a serial plan or at least go into a zone where parallelism is not an option:

Modifications to table variables

UPDATE, INSERT and DELETE on table variables cause a completely serial plan. SELECT statements, on the other hand, don’t neccessarily.

Scalar functions

Completely serial. Even when they’re used in computed columns in one of the tables. Even when you’re not referencing that actual column.

Not all computed columns generate serial plans – only those with scalar functions.

Read on for a number of other places.  It turns out that this set is pretty stable from 2012 through to 2017.

Comments closed

Propagating The Last Non-Null Value With T-SQL

Tomaz Kastrun shows a great use of window functions in T-SQL:

So you have NULL values in your SQL Server table and you want to populate those NULL values with the last non-NULL value, based on a particular order. Once you have only one NULL values encapsulated between two populated values, there are quick and fast solutions. But what if you find a larger gap of NULL values and you want to populate these values as well?

Click through for a partial solution, followed by the real solution.

Comments closed

Writing Long Strings To Output In SSMS

Bert Wagner shows us a few techniques for printing long strings in SSMS:

Erik Darling posts one solution to this problem in his T-SQL Tuesday #104 entry (as well as some other problems/solutions for lengthy SQL variables). Specifically he links to a SQL string printing script that will loop through the lengthy variable and print everything while maintaining formatting:

And while I like using that stored procedure on my primary server, I’m too lazy to install it every where I need it.

Instead, I have a couple of go-to solutions that work on all SQL Server instances 2008 forward.

The approach Bert outlines isn’t perfect, but it is definitely interesting and easier to write than the ones which work a bit better.

Comments closed

T-SQL Formatters

Kenneth Fisher links to a set of T-SQL formatters:

I have a hard time working with code that’s so badly formatted I can’t read it. Normally I’d just format it myself, but this was a good dozen pages long. So what did I do? I ran to twitter screaming for help.

And I got a lot of responses! Here were the recommendations followed by who recommended them:

My favorite is DevArt’s SQL Complete, which is a paid product, but covers almost perfectly our 30-some page T-SQL formatting guideline.

Comments closed