Press "Enter" to skip to content

Category: Syntax

Joining to STRING_SPLIT

Kevin Wilkie explains that the STRING_SPLIT() function isn’t something one simply joins to:

My friends! Last time together, we discussed using the STRING_SPLIT function and how it’s used in combination with the CROSS APPLY.

First off, most of us are used to working with an INNER JOIN instead of CROSS APPLY. Well, you’re not going to be able to use an INNER JOIN when you’re using the STRING_SPLIT function.

Read on for a demonstration.

Comments closed

Row Pattern Recognition in Snowflake

Koen Verbeeck knows how to make my blood boil:

I’m doing a little series on some of the nice features/capabilities in Snowflake (the cloud data warehouse). In each part, I’ll highlight something that I think it’s interesting enough to share. It might be some SQL function that I’d really like to be in SQL Server, it might be something else.

In the book T-SQL Window Functions – For data analysis and beyond, Itzik Ben-Gan explains the concept of row-pattern recognition (RPR) in a dedicated chapter (you can find a full book review here). It’s a concept that doesn’t exist in T-SQL, but is described in the SQL standard and is available in some other database systems. Snowflake has recently introduced support for RPR. 

Jokes about being angry aside, I’d really like to see row pattern recognition in SQL Server. It’s definitely not trivial to learn, but once you do, there’s a lot of power available to you. Koen also links to the Feedback item about this, so vote on that as well.

Comments closed

DATE_BUCKET() and an Alternative

Itzik Ben-Gan takes us through the DATE_BUCKET() function:

Bucketizing date and time data involves organizing data in groups representing fixed intervals of time for analytical purposes. Often the input is time series data stored in a table where the rows represent measurements taken at regular time intervals. For example, the measurements could be temperature and humidity readings taken every 5 minutes, and you want to group the data using hourly buckets and compute aggregates like average per hour. Even though time series data is a common source for bucket-based analysis, the concept is just as relevant to any data that involves date and time attributes and associated measures. For example, you might want to organize sales data in fiscal year buckets and compute aggregates like the total sales value per fiscal year. In this article, I cover two methods for bucketizing date and time data. One is using a function called DATE_BUCKET, which at the time of writing is only available in Azure SQL Edge. Another is using a custom calculation that emulates the DATE_BUCKET function, which you can use in any version, edition, and flavor of SQL Server and Azure SQL Database.

DATE_BUCKET() is something I’d like to see in the next version of SQL Server on-premises. There are some peculiarities to how it works and behavior isn’t always exactly what you’d expect, but it does accomplish what it sets out to do.

Comments closed

Finding Procedures Using SELECT *

Michael J. Swart hunts for the real performance killer:

I have trouble with procedures that use SELECT *. They are often not “Blue-Green safe“. In other words, if a procedure has a query that uses SELECT * then I can’t change the underlying tables can’t change without causing some tricky deployment issues. (The same is not true for ad hoc queries from the application).

I also have a lot of procedures to look at (about 5000) and I’d like to find the procedures that use SELECT *.
I want to maybe ignore SELECT * when selecting from a subquery with a well-defined column list.
I also want to maybe include related queries like OUTPUT inserted.*.

Read on to see Michael’s strategy for attacking the problem while not including benign instances of it (such as WHERE EXISTS (SELECT * ...), which won’t cause any issues because the database engine doesn’t expand that wildcard).

Comments closed

Ways to avoid the MERGE Operator

Michael J. Swart has important bullet points:

Aaron Bertrand has a post called Use Caution with SQL Server’s MERGE Statement. It’s a pretty thorough compilation of all the problems and defects associated with the MERGE statement that folks have reported in the past. But it’s been a few years since that post and in the spirit of giving Microsoft the benefit of the doubt, I revisited each of the issues Aaron brought up.

Some of the items can be dismissed based on circumstances. I noticed that:

– Some of the issues are fixed in recent versions (2016+).

– Some of the issues that have been marked as won’t fix have been fixed anyway (the repro script associated with the issue no longer fails).

– Some of the items are complaints about confusing documentation.

– Some are complaints are about issues that are not limited to the MERGE statement (e.g. concurrency and constraint checks).

Spoilers: some + some + some + some is still a lot less than all. Read the whole thing.

Comments closed

Analytic Window Functions in SQL Server

Aveek Das takes a look at a few analytic window functions available since SQL Server 2012:

Since the introduction of SQL Server 2012, the analytic functions were added to the SQL Server database engine. Any version of SQL Server after SQL Server 2012 can execute analytic queries on it. These functions are used to calculate an aggregated value from the dataset but are based on a specific set of rows instead of the entire dataset. As compared to aggregate functions like SUM, COUNT, AVG, etc. which return scalar records, these functions can return multiple records based on the conditions. The most common examples of using these functions are to find moving averages, running totals, etc. SQL Server supports the following analytic functions.

1. CUME_DIST – Find the cumulative distribution of a numerical column

2. FIRST_VALUE – Finds the first value of a column from the group and prints the same for each row

3. LAST_VALUE – Finds the last value of a column from the group and prints the same for each row

4. LAG – Reads values after the specified number of rows for a column

5. LEAD – Reads values before the specified number of rows for a column

Click through for examples of how each works.

Comments closed

Identifying Troublesome NOLOCK Statements

Aaron Bertrand is on a mission:

I’ve warned before about the possible downsides of both NOLOCK in general and, more specifically, when used against the target of an update or delete. While Microsoft claims that corruption errors due to the latter have been fixed in cumulative updates (e.g. see KB #2878968), we’re still seeing an occasional related issue where SQL Server will terminate, producing a stack dump that indicates a DML statement with NOLOCK as the cause. How do I find and correct all these potentially problematic statements?

The contrarian in me says, “You’re using NOLOCK; they’re all trouble.” But Aaron is a lot nicer about it than I am here.

Comments closed

Throwing Exceptions in T-SQL

Chad Callihan plays hot potato:

When it comes to error handling or troubleshooting a long stored procedure, RAISERROR is an easy statement to use that gets the job done. Way back in SQL Server 2012, Microsoft wanted to replace RAISERROR with the new (arguably less convenient) THROW statement. I thought it would be worth looking at an example using THROW and what it takes to have custom messaging with a parameter.

THROW can be a little less convenient, but conceptually speaking, I do think it’s better than the alternative. The part which makes it tricky in practice is that so many types of T-SQL errors are non-catchable, so as a developer, you have to keep on your toes about it.

Comments closed

LAG() in SQL Server

Chad Callihan shows off one of the best window functions:

The LAG function in SQL Server allows you to work with a row of data as well as the previous row of data in a data set. When would that ever be useful? If you’re a sports fan, you’re familiar with this concept whether you realize it or not. Let’s look at an example.

LAG() is outstanding for business reports, such as if you want three-month trailing data.

Comments closed

GROUP BY ROLLUP

Dinesh Asanka hits on one of the under-utilized grouping operators:

You will see that data is aggregated for the columns provided by the GROUP BY clause. Important to note that the data will not be ordered in the GROUP BY columns and you need to explicitly order them by using the ORDER BY clause as shown in the above query.

In the above query, if you wish to find the total for Australia only, you need to run another GROUP BY with EnglishCountryRegionName and perform a UNION ALL. This will be a very ugly method. By using GROUP BY ROLLUP you can achieve the above-said task as shown in the following query.

If I were to rank grouping operators by how frequently I use them, it’s GROUPING SETS by a country mile, then ROLLUP, and almost never do I use CUBE.

Comments closed