Press "Enter" to skip to content

Category: T-SQL

TOP and Ordering

Erik Darling is in the middle of a back-to-basics series on performance tuning:

And you see, once you set up a query to return the TOP N rows, there’s an expectation that users get to choose the order they start seeing rows in. As long as we stick to columns whose ordering is supported by an index, things will be pretty stable.

Once we go outside that, a TOP can be rough on a query.

Read on for an example of what happens when that type of thing goes wrong.

Comments closed

SQL Serverless in Azure Synapse Analytics

James Serra talks to us about SQL serverless (presently known as SQL on-demand but I’m getting ahead of the marketing curve this time):

Querying data in ADLS Gen2 storage using T-SQL is made easy because of the OPENROWSET function with additional capabilities (check out the T-SQL that is supported). The currently supported file types in ADLS Gen2 that SQL-on-demand can use are Parquet, CSV, and JSON. ParquetDirect and CSV 2.0 add performance improvements (see Benchmarking Azure Synapse Analytics – SQL Serverless, using .NET Interactive). You can also query folders and multiple files and use file metadata in queries.

Read on to learn a lot more about its use cases.

Comments closed

The Performance Hit of Disabling the Identity Cache

Tibor Karaszi explains why you probably want to keep identity caching on:

Should you care about the gap? In most cases: no. The identity value should be meaningless. In many cases I think that it is just an aesthetic issue to not have these gaps. (I’ve seen cases where you do run into problems because of the gap, I should add – but not frequently.)

For the SEQUENCE object, we have the CACHE option to specify how many values to cache. I.e., max values we can jump if we have a hard shutdown.

For identity, we have the IDENTITY CACHE database scoped configuration, introduced in SQL Server 2017. Caching on or off. On is default. We also have trace flag 272, at the instance level.

However, disabling the caching isn’t free. 

In an ideal world, there are zero cases where you care about the gap. Identity integers and sequences are surrogate keys, and “surrogate” here means that it has no inherent business value—otherwise it’d be a natural key. Subsequently ascribing value to it is folly, and if you are in a scenario in which you need guaranteed sequences which always increase by exactly 1 and never have gaps (think something like check numbers or invoice numbers, things which accountants really want to see in a fixed order), identity integers and sequences aren’t the right tools for you.

But read on to see how much faster caching of identity values can make insert performance.

Comments closed

Finding Distance Between Cities using SQL Server

Hasan Savran wants to find geographic distances with SQL Server:

I wrote about finding distance between two location in my older post. I have been getting question about how to make the search by using data in SQL Server. In this post, I will try to answer all these questions. You do not need to have latest version of SQL Server to do any of these examples. SQL Server has been supporting Geospatial data since 2008.
     First, we need some data. Not just some data, some free spatial data. I want to show you how to find distance between cities in this post, so I need at least names of the cities and their latitude and longitude. I downloaded this data from SimpleMaps website in CSV Format.

Read on for the solution.

Comments closed

Simulating DATESERIAL in SQL Server

Madhivanan has nearly a dozen methods to replicate the functionality of the old DateSerial function in VB:

DateSerial function accepts three paramter values year,month and day and return a valid date value with time set to midnight. SQL Server does not support this function until version 2012 is released.

There can be many ways to simulate this functionality. Here are 10 different ways

I’d be pretty upset if some of those were in my code base, but this shows that there are plenty of ways to get to the same result. Just, uh, don’t use all of them.

Comments closed

Bug with Halloween Protection and the OUTPUT Clause

Paul White writes up a bug report:

Looking at the execution plan, it is hard to see how deleting a row (at the Clustered Index Delete) then inserting it again (at the Clustered Index Insert) could possibly result in a duplicate key in the index. Remember there is only one row, one column, and one index.

Logically, the only way this error can occur is if the Delete operator does not delete the row.

Read the whole thing. It’s probably not something you’ll ever come across yourself, hopefully.

Comments closed

Why IS NOT NULL is a Range Predicate

Erik Darling drops knowledge on us:

Why is IS NULL (not to be confused with ISNULL, the function) considered in equality predicate, and IS NOT NULL considered an inequality (or range) predicate?

It seems like they should be fairly equivalent, though opposite. One tests for a lack of values, and one tests for the presence of values, with no further examination of what those values are.

The trickier thing is that we can seek to either condition, but what happens next WILL SHOCK YOU.

This is my shocked face.

Comments closed

Splitting Data with T-SQL

Chris Hyde shows a few techniques for splitting out data into training, testing, and validation sets:

We see right away that this method failed horribly as all of the data was placed into the same dataset. This holds true no matter how many times we execute the code, and it happens because the RAND() function is only evaluated once for the whole query, and not individually for each row. To correct this we’ll instead use a method that Jeff Moden taught me at a SQL Saturday in Detroit several years ago – generating a NEWID() for each row, using the CHECKSUM() function to turn it into a random number, and then the % (modulus) function to turn it into a number between 0 and 99 inclusive.

I’d have to test it out, but I’d think you could modify method 3 to include a CROSS APPLY to perform one ABS(CHECKSUM(NEWID()) and get exact counts that way without a temp table.

Comments closed

The Limits of LEN (or REPLICATE)

Pamela Mooney takes us through a quandry:

I was using LEN() to troubleshoot an issue I was having with a dynamically constructed string truncating while inserting into an NVARCHAR(MAX) column.  Since I know that NVARCHAR(MAX) has a 2 GB limit (goodness only knows how many characters that is!),  I couldn’t explain the truncation.  A colleague suggested doing a test with another dynamically constructed string.  Maybe then, I could find where the cutoff was occurring.

Great idea!

So, I came up with a plan.

Click through for the plan, but be sure to read Pamela’s comment at the bottom as there’s a bit more to the story.

Comments closed