Press "Enter" to skip to content

Category: T-SQL

Grouping And Aggregating: Optimizing The Optimizer

Itzik Ben-Gan shows an example of how you can nudge the SQL Server optimizer to the right answer by rewriting a query:

As you can see, the groups are obtained by scanning the index on the groups table, and the aggregate is obtained by applying a seek in the index on the main table. The higher the density of the grouping set, the more optimal this plan is compared to the default strategy for the grouped query.

Just like we did earlier for the default scan strategy, let’s estimate the number of logical reads and plan cost for the seeks strategy. The estimated number of logical reads is the number of reads for the single execution of the Index Scan operator that retrieves the groups, plus the reads for all of the executions of the Index Seek operator.

The estimated number of logical reads for the Index Scan operator is negligible compared to the seeks; still, it’s CEILING(1e0 * @numgroups / @rowsperpage). Take Query 4 as an example; say the index idx_sid fits about 600 rows per leaf page (actual number depends on actual shipperid values since the datatype is VARCHAR(5)). With 5 groups, all rows fit in a single leaf page. If you had 5,000 groups, they would fit in 9 pages.

Plus some love for the APPLY operator.  Read the whole thing.

Comments closed

Serial Plans And Serial Zones

Daniel Hutmacher takes a look at what causes your T-SQL code to use a serial plan or at least go into a zone where parallelism is not an option:

Modifications to table variables

UPDATE, INSERT and DELETE on table variables cause a completely serial plan. SELECT statements, on the other hand, don’t neccessarily.

Scalar functions

Completely serial. Even when they’re used in computed columns in one of the tables. Even when you’re not referencing that actual column.

Not all computed columns generate serial plans – only those with scalar functions.

Read on for a number of other places.  It turns out that this set is pretty stable from 2012 through to 2017.

Comments closed

Propagating The Last Non-Null Value With T-SQL

Tomaz Kastrun shows a great use of window functions in T-SQL:

So you have NULL values in your SQL Server table and you want to populate those NULL values with the last non-NULL value, based on a particular order. Once you have only one NULL values encapsulated between two populated values, there are quick and fast solutions. But what if you find a larger gap of NULL values and you want to populate these values as well?

Click through for a partial solution, followed by the real solution.

Comments closed

Writing Long Strings To Output In SSMS

Bert Wagner shows us a few techniques for printing long strings in SSMS:

Erik Darling posts one solution to this problem in his T-SQL Tuesday #104 entry (as well as some other problems/solutions for lengthy SQL variables). Specifically he links to a SQL string printing script that will loop through the lengthy variable and print everything while maintaining formatting:

And while I like using that stored procedure on my primary server, I’m too lazy to install it every where I need it.

Instead, I have a couple of go-to solutions that work on all SQL Server instances 2008 forward.

The approach Bert outlines isn’t perfect, but it is definitely interesting and easier to write than the ones which work a bit better.

Comments closed

T-SQL Formatters

Kenneth Fisher links to a set of T-SQL formatters:

I have a hard time working with code that’s so badly formatted I can’t read it. Normally I’d just format it myself, but this was a good dozen pages long. So what did I do? I ran to twitter screaming for help.

And I got a lot of responses! Here were the recommendations followed by who recommended them:

My favorite is DevArt’s SQL Complete, which is a paid product, but covers almost perfectly our 30-some page T-SQL formatting guideline.

Comments closed

Pivoting And Unpivoting Data In T-SQL

Jeanne Combrinck shows how to use the PIVOT and UNPIVOT operators in SQL Server:

One thing that I still get confused about writing is pivot queries. I find myself needing to lookup the syntax every time. Basically you use Pivot and Unpivot to change the output of a table. If you would like rows turned into columns you can use pivot and for the opposite you can use unpivot.

One thing to note is the column identifiers in the unpivot clause follow the catalog collation. For SQL Database, the collation is always SQL_Latin_General_CP1_CI_AS. For SQL Server partially contained databases, the collation is always Latin1_General_100_CI_AS_KS_WS_SC. If the column is combined with other columns, then a collate clause (COLLATE DATABASE_DEFAULT) is required to avoid conflicts.

Click through for an example of each.

Comments closed

The Value Of CROSS APPLY

Kendra Little talks about one of my favorite T-SQL operators:

Here’s my top 3 favorite uses for CROSS APPLY and OUTER APPLY:

  1. APPLY is fantastic for calling table valued functions. I didn’t include questions about those in the quiz, simply for the purposes of keeping the code simple, and because I wanted the quiz to be about thinking through how apply works — but it’s still the #1 use.

  2. Another thing I might use it for is when a query needs a correlated subquery — somewhat like an inline function.

  3. And I also like it for queries that have a calculation that needs to be done and which is referenced in multiple columns in the query, or perhaps also in a predicate and the select. You can perform the computation once in the apply and then reference it multiple times. That way if you have to change the formula later on, you only have to change it in once place, plus I find it’s easier to read in some cases.

Item number three is something I learned from an older blog post Kendra wrote and regularly use APPLY for this purpose.

Comments closed

Getting A Random Row From A Small Data Set

David Fowler has a quick script to get a random row from a relatively small table:

A couple of times recently I’ve seen the question asked, ‘How can I select a single row at random from a table?’.

There are often a few ways of doing this suggested, most seem to rely using CTEs or temp tables.  I thought I’d share, in a quick post a very simple and easy way of doing it that I’ve used a couple of times.

The script David provides requires a table scan, so it doesn’t scale out very well.  But depending on your hardware, that can still be pretty efficient into the tens of thousands of rows.

Comments closed

Obfuscating Continuous Variables

Phil Factor continues his series on data obfuscation:

Imagine that you have a table giving invoice values. You will want your spoof data to conform with the same ups and downs of the real data over time. You may be able to get the overall distribution the same as the real data, but the resulting data would be useless for seeing the effect of last years sales promotion. The invoice values will depend on your sales promotions if your marketing people have done their job properly.

By making your data the same distribution as your production data, you don’t necessarily get the same strategy chosen by the query analyser, but you dramatically increase the chances of getting it. SQL Server uses a complex paradigm to select amongst its alternative plans for a query. It maintains distribution statistics for every column and index that is used for selecting rows. These aren’t actually histograms in the classic sense, but they perform a similar function and are used by the SQL Server engine to predict the number of rows that will be returned.

The focus is on independent variables, though there is a little bit at the end about working with dependencies.

Comments closed

Creating An Inline Table-Valued Function In SQL Server

Jeanne Combrinck looks at inline table-valued functions in SQL Server:

Lets start off with what is a table-valued function (TVF)? A TVF is a dynamic table produced at the time of execution, depending on parameters. Like a view, a TVF creates a result set only when it’s executed, but, unlike a view, it can be parameterized.

You get two types of TVFs, an Inline Table-Valued Function (ITVFs) and Multi-statement Table-Valued Function (MTVFs). I find them easy to remember, think of the “I” in ITVF as 1 (single statement) and the “M” in MTVF as “many” (multiple statements).

As you can imagine, a TVF produces a result set that can be used as a virtual table or view. Yes, you can actually select data from a TVF, or join it with some other tables, views, or even other TVFs. In this post I will go into more detail on ITVFs.

In my experience, the best part about using inline table-valued function is the 3-9x performance improvement you get when removing them and inlining their code.  It’s a great idea but the performance cost is just too high for me.

Comments closed