
Category: T-SQL

Recursion in T-SQL

Bert Wagner takes us through writing recursive statements in T-SQL:

Recursive queries are fun to plan and write. They can be frustrating too depending on the complexity of the problem you are trying to solve.

This post shows one solution for finding all records that are related, either directly or via intermediate records, using recursive queries in SQL Server.

When you know the data size will be fairly small and performance isn’t critical, recursion can be an elegant solution to a data access problem.
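As a minimal sketch of the idea (not the code from the linked post), a recursive common table expression can walk from one record to everything reachable from it. The table variable @Related, its columns, and the sample rows are all hypothetical, and the sketch assumes the relationships contain no cycles.

```sql
-- Hypothetical data: each row links a record to one directly related record.
DECLARE @Related TABLE (Id int NOT NULL, RelatedId int NOT NULL);

INSERT INTO @Related (Id, RelatedId)
VALUES (1, 2), (2, 3), (3, 4), (10, 11);

DECLARE @StartId int = 1;

WITH Reachable AS
(
    -- Anchor member: records directly related to the starting record.
    SELECT r.RelatedId, 1 AS Depth
    FROM @Related AS r
    WHERE r.Id = @StartId

    UNION ALL

    -- Recursive member: follow each newly found record one more hop.
    SELECT r.RelatedId, p.Depth + 1
    FROM Reachable AS p
    INNER JOIN @Related AS r
        ON r.Id = p.RelatedId
)
SELECT RelatedId, Depth
FROM Reachable
ORDER BY Depth
OPTION (MAXRECURSION 100); -- 100 is the default limit; the query fails if it is exceeded
```

With this sample data the query returns records 2, 3, and 4 at depths 1, 2, and 3, and never touches the unrelated pair (10, 11). Cyclic relationships would keep recursing until the MAXRECURSION limit raises an error, so real data with cycles needs extra handling.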


The Row Count Spool Operator

Hugo Kornelis dives into another operator:

The Row Count Spool operator is one of the four spool operators that SQL Server supports. It counts the number of rows in its input data, and can then later return that same amount of rows, without having to call its child operators to produce the input again.

The Row Count Spool can be viewed as similar to Table Spool, but optimized for cases where the amount of rows is relevant but their content is not. Because the content of the rows is not relevant, the operator does not need to use tempdb to store its input in a worktable; it only has to keep a running count as it reads the input. The other two spool operators have different use cases: Index Spool is used to enable the spool operator to return specific subsets of the input multiple times, and the Window Spool operator is used to support the ROWS and RANGE specifications of windowing functions.

Read on to see where this might be useful and when it may appear.


The Logic of Common Table Expressions

Itzik Ben-Gan dives into common table expressions:

The source of the term aside, common table expression, or CTE, is the commonly used term by T-SQL practitioners for the structure that is the focus of this article. So first, let’s address whether it is an appropriate term. We already concluded that the term table expression is appropriate for an expression that conceptually returns a table. Derived tables, CTEs, views and inline table valued functions are all types of named table expressions that T-SQL supports. So, the table expression part of common table expression certainly seems appropriate. As for the common part of the term, it probably has to do with one of the design advantages of CTEs over derived tables. Remember that you cannot reuse the derived table name (or more accurately the range variable name) more than once in the outer query. Conversely, the CTE name can be used multiple times in the outer query. In other words, the CTE name is common to the outer query. Of course, I’ll demonstrate this design aspect in this article.

CTEs give you similar benefits to derived tables, including enabling the development of modular solutions, reusing column aliases, indirectly interacting with window functions in clauses that don’t normally allow them, supporting modifications that indirectly rely on TOP or OFFSET FETCH with order specification, and others. But there are certain design advantages compared to derived tables, which I’ll cover in detail after I provide the syntax for the structure.

Click through for a lot of great detail. On the question of derived tables versus common table expressions, my mental taxonomy is basically APPLY > CTE > derived table, but that’s in a context-free discussion. In practice, all three are useful and the best question to answer is “Which one best helps future developers understand my intent?”
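To illustrate the point about reuse, here is a small sketch in which one CTE is referenced twice in the outer query. The table variable @Orders, its columns, and the sample values are hypothetical and are not taken from the linked article.

```sql
-- Hypothetical data for the illustration.
DECLARE @Orders TABLE (OrderYear int NOT NULL, Amount money NOT NULL);

INSERT INTO @Orders (OrderYear, Amount)
VALUES (2018, 100), (2018, 50), (2019, 200), (2020, 175);

WITH YearlyTotals AS
(
    SELECT OrderYear, SUM(Amount) AS Total
    FROM @Orders
    GROUP BY OrderYear
)
-- The CTE name appears twice: once for the current year, once for the prior year.
SELECT cur.OrderYear,
       cur.Total,
       prv.Total AS PreviousTotal
FROM YearlyTotals AS cur
LEFT JOIN YearlyTotals AS prv
    ON prv.OrderYear = cur.OrderYear - 1
ORDER BY cur.OrderYear;
```

Writing the same query with derived tables would mean repeating the grouped SELECT in both sides of the join.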


TOP and Ordering

Erik Darling is in the middle of a back-to-basics series on performance tuning:

And you see, once you set up a query to return the TOP N rows, there’s an expectation that users get to choose the order they start seeing rows in. As long as we stick to columns whose ordering is supported by an index, things will be pretty stable.

Once we go outside that, a TOP can be rough on a query.

Read on for an example of what happens when that type of thing goes wrong.
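A rough sketch of the distinction, using a hypothetical temp table #Posts rather than anything from the linked post: the first query orders by an indexed column, while the second orders by a column with no supporting index.

```sql
-- Hypothetical table; names and columns are illustrative only.
CREATE TABLE #Posts
(
    Id           int IDENTITY(1, 1) PRIMARY KEY,
    CreationDate datetime2(0) NOT NULL,
    Score        int NOT NULL
);

CREATE INDEX IX_Posts_CreationDate ON #Posts (CreationDate);

-- The ordering matches IX_Posts_CreationDate, so the rows can be read
-- from the index in order and the scan can stop after ten rows.
SELECT TOP (10) Id, CreationDate
FROM #Posts
ORDER BY CreationDate DESC;

-- No index supports ordering by Score, so every row has to be read
-- and sorted before the top ten can be returned.
SELECT TOP (10) Id, Score
FROM #Posts
ORDER BY Score DESC;

DROP TABLE #Posts;
```

The actual plan shapes depend on data volume, statistics, and the optimizer's choices, so comparing the execution plans on a realistically sized table is the way to confirm the difference.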


SQL Serverless in Azure Synapse Analytics

James Serra talks to us about SQL serverless (presently known as SQL on-demand but I’m getting ahead of the marketing curve this time):

Querying data in ADLS Gen2 storage using T-SQL is made easy because of the OPENROWSET function with additional capabilities (check out the T-SQL that is supported). The currently supported file types in ADLS Gen2 that SQL-on-demand can use are Parquet, CSV, and JSON. ParquetDirect and CSV 2.0 add performance improvements (see Benchmarking Azure Synapse Analytics – SQL Serverless, using .NET Interactive). You can also query folders and multiple files and use file metadata in queries.

Read on to learn a lot more about its use cases.
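For a sense of what this looks like, here is a hedged sketch of querying a folder of Parquet files with OPENROWSET and using file metadata in the query. The storage account, container, and folder in the URL are placeholders to replace with your own, and the alias and column names are illustrative.

```sql
-- Placeholders in angle brackets are assumptions; substitute real values.
SELECT
    r.filename() AS SourceFile,   -- file metadata: name of the file each row came from
    COUNT_BIG(*) AS RowsInFile
FROM OPENROWSET(
        BULK 'https://<storage-account>.dfs.core.windows.net/<container>/<folder>/*.parquet',
        FORMAT = 'PARQUET'
     ) AS r
GROUP BY r.filename();
```

The wildcard in the path is what lets a single query span multiple files in the folder.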


The Performance Hit of Disabling the Identity Cache

Tibor Karaszi explains why you probably want to keep identity caching on:

Should you care about the gap? In most cases: no. The identity value should be meaningless. In many cases I think that it is just an aesthetic issue to not have these gaps. (I’ve seen cases where you do run into problems because of the gap, I should add – but not frequently.)

For the SEQUENCE object, we have the CACHE option to specify how many values to cache. I.e., max values we can jump if we have a hard shutdown.

For identity, we have the IDENTITY_CACHE database scoped configuration, introduced in SQL Server 2017. Caching is either on or off, and on is the default. We also have trace flag 272 at the instance level.

However, disabling the caching isn’t free. 

In an ideal world, there are zero cases where you care about the gap. Identity integers and sequences are surrogate keys, and “surrogate” here means that it has no inherent business value—otherwise it’d be a natural key. Subsequently ascribing value to it is folly, and if you are in a scenario in which you need guaranteed sequences which always increase by exactly 1 and never have gaps (think something like check numbers or invoice numbers, things which accountants really want to see in a fixed order), identity integers and sequences aren’t the right tools for you.

But read on to see how much faster caching of identity values can make insert performance.
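For reference, the settings mentioned above look roughly like this. The sequence name dbo.DemoSequence and the cache size of 50 are hypothetical choices for the sketch, and the IDENTITY_CACHE option requires SQL Server 2017 or later.

```sql
-- Check the current identity cache setting for this database.
SELECT name, value
FROM sys.database_scoped_configurations
WHERE name = N'IDENTITY_CACHE';

-- Turn identity caching off for the current database (the default is ON).
ALTER DATABASE SCOPED CONFIGURATION SET IDENTITY_CACHE = OFF;

-- Turn it back on.
ALTER DATABASE SCOPED CONFIGURATION SET IDENTITY_CACHE = ON;

-- For a sequence, the cache size is set per object.
CREATE SEQUENCE dbo.DemoSequence
    AS int
    START WITH 1
    INCREMENT BY 1
    CACHE 50;

-- Disable caching for that sequence.
ALTER SEQUENCE dbo.DemoSequence NO CACHE;

DROP SEQUENCE dbo.DemoSequence;
```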


Finding Distance Between Cities using SQL Server

Hasan Savran wants to find geographic distances with SQL Server:

I wrote about finding the distance between two locations in an older post. I have been getting questions about how to do the search using data in SQL Server. In this post, I will try to answer all these questions. You do not need the latest version of SQL Server to run any of these examples. SQL Server has supported geospatial data since 2008.

First, we need some data. Not just some data, some free spatial data. I want to show you how to find the distance between cities in this post, so I need at least the names of the cities and their latitude and longitude. I downloaded this data from the SimpleMaps website in CSV format.

Read on for the solution.
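As a minimal sketch of the underlying technique (not the author's solution), the geography type can compute the distance between two points directly. The city choices and coordinates below are approximate and picked only for illustration.

```sql
-- geography::Point takes latitude, longitude, and an SRID (4326 = WGS 84).
-- Coordinates are approximate values chosen for the example.
DECLARE @Cleveland geography = geography::Point(41.4993, -81.6944, 4326);
DECLARE @Columbus  geography = geography::Point(39.9612, -82.9988, 4326);

-- STDistance returns meters for SRID 4326; divide to get kilometers.
SELECT @Cleveland.STDistance(@Columbus) / 1000.0 AS DistanceKm;
```

With a table of cities, the same STDistance call can be applied to a geography column to rank every city by distance from a chosen point.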


Simulating DATESERIAL in SQL Server

Madhivanan has nearly a dozen methods to replicate the functionality of the old DateSerial function in VB:

The DateSerial function accepts three parameter values (year, month, and day) and returns a valid date value with the time set to midnight. SQL Server did not support this function until version 2012 was released.

There can be many ways to simulate this functionality. Here are 10 different ways.

I’d be pretty upset if some of those were in my code base, but this shows that there are plenty of ways to get to the same result. Just, uh, don’t use all of them.
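For comparison, here is a short sketch of the built-in function available since SQL Server 2012 next to one commonly used date-arithmetic approach for older versions. The arithmetic version is an illustration and is not necessarily one of the ten methods in the linked post.

```sql
DECLARE @Year int = 2020, @Month int = 7, @Day int = 15;

-- SQL Server 2012 and later: built-in function that returns a date.
SELECT DATEFROMPARTS(@Year, @Month, @Day) AS FromParts;

-- Older versions: add months and days to the base date 1900-01-01,
-- which is what the integer 0 converts to as a datetime.
SELECT DATEADD(DAY, @Day - 1,
       DATEADD(MONTH, (@Year - 1900) * 12 + @Month - 1, 0)) AS FromArithmetic;
```

One difference to keep in mind: DATEFROMPARTS raises an error for an invalid combination such as day 31 in a 30-day month, while the arithmetic version silently rolls over into the next month.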


Bug with Halloween Protection and the OUTPUT Clause

Paul White writes up a bug report:

Looking at the execution plan, it is hard to see how deleting a row (at the Clustered Index Delete) then inserting it again (at the Clustered Index Insert) could possibly result in a duplicate key in the index. Remember there is only one row, one column, and one index.

Logically, the only way this error can occur is if the Delete operator does not delete the row.

Read the whole thing. With luck, it’s not something you’ll ever come across yourself.
