Press "Enter" to skip to content

Category: T-SQL

Using the FIRST_VALUE() Window Function

Dave Mason explains what FIRST_VALUE() does:

Last week, I found myself with a T-SQL problem. A customer has a database where multiple “widget” records might exist in a table. End users might enter the duplicate rows, or they might be inserted by other automated processes. My task was to keep the most recent widget record. But, if there were any NULL values, look in the previous “duplicate” rows and use the most recent missing value for each column.

Click through to see it in action. And following up from Dave’s last point, IGNORE NULLS did originate in Azure SQL Edge, though hopefully we see it in the next version of on-premises SQL Server.

Comments closed

Unique Constraints vs Unique Indexes

Erik Darling calls out unique key constraints:

I do love appropriately applied uniqueness. It can be helpful not just for keeping bad data out, but also help the optimizer reason about how many rows might qualify when you join or filter on that data.

The thing is, I disagree a little bit with how most people set them up, which is by creating a unique constraint.

Data modeling Kevin wants to use unique key constraints because that’s the correct thing to do. Implementation Kevin uses unique nonclustered indexes for the reasons Erik describes. Not mentioned in Erik’s post but potentially relevant is that operations on unique nonclustered indexes can be done online, whereas unique key constraint operations (creation and alteration via drop+create) are offline.

Comments closed

Using Query Labels in Azure Synapse Analytics

Gauri Mahajan shows one of the pieces of functionality in Azure Synapse Analytics dedicated SQL pools that I’d like to see on-premises:

Azure Synapse supports a concept known as “query labels” that allows tagging any DDL or DML queries that are executed on the dedicated SQL pool. These labels can be queried using the dynamic management views (DMVs). One can use these labels to describe the purpose of the query or add any metadata to the query being executed and the same can be used later for instrumenting the queries, specifically to identify the queries that meet the desired search criteria. Let’s walk through a step-by-step exercise to understand this concept practically.

Click through for the process.

Comments closed

Concatenating in SQL Server

Lee Markup takes us through a pair of very useful functions in SQL Server:

SQL Server concatenation methods have been enhanced in modern versions of SQL Server. SQL Server 2012 introduced the CONCAT() function. In SQL Server 2017 we get CONCAT_WS().

A common usage of concatenation, or joining column values together in a string, is combining a FirstName and LastName column into a FullName column.  Another common usage might be for creating an address column that pulls together building number, street, city and zip code.

Read on to learn more. CONCAT() and CONCAT_WS() are also extremely helpful for change detection in ETL processes. For example, you might have a queue table to process and only want to update records in which relevant source fields changed, ignoring the ones which don’t exist in your destination. A combination of HASHBYTES() and CONCAT_WS() will do the trick quite nicely.

Comments closed

Deleting Duplicate Records

Chad Callihan shows one of the best ways to remove duplicate records from a table:

Have you ever needed to delete duplicate records from a table? If there is only one duplicate in a table then simply running something like DELETE FROM Table WHERE ID = @DupRecord will do the trick. What happens when the problem is found after multiple records are duplicated? Will tracking them all down be more time consuming? Here are a few different options for quickly clearing out duplicate records.

There’s the best way, and then there are the other ways.

Comments closed

Exchange Spill Wait Stats

Erik Darling looks at exchange spills:

There are quite high waits on PAGEIOLATCH_EX, SLEEP_TASK, and SLEEP_BPOOL_STEAL rounding out the top five. This is quite interesting, because I’ve never explicitly thought of PAGEIOLATCH_EX waits in the context of exchange spills. Normally, I think of them when queries read pages from disk into memory for modification.

Going down the line, SLEEP_TASK is familiar from our time spent with hash spills, but SLEEP_BPOOL_STEAL is so far undocumented anywhere.

Erik also does the math on this query and recommends that you not write a query like this one.

Comments closed

Avoiding WHILE 1=1 Loops

Aaron Bertrand does not believe in the power of the infinite loop:

A short time ago a colleague had an issue with a Microsoft SQL Server stored procedure. They were using our recommended approach for batching updates, but there was a small problem with their code that led to the procedure “running forever.” I think we’ve all made a mistake like this at one point or another; here’s how I try to avoid the situation altogether.

The argument isn’t “don’t use WHILE loops” or “don’t use batching logic,” but instead to ensure that you have a break condition somewhere. It’s reasonable to ask for an end state before you begin processing something, after all.

Comments closed

LAG() vs All

Kathi Kellenberger shows the power of the LAG() function in T-SQL:

Microsoft introduced the first window (aka, windowing or windowed) functions with SQL Server 2005. These functions were ROW_NUMBERRANKDENSE_RANKNTILE, and the window aggregates. Many folks, including myself, used these functions without realizing they were part of a special group. In 2012, Microsoft added several more: LAG and LEADFIRST_VALUE and LAST_VALUEPERCENT_RANK and CUME_DISTPERCENTILE_CONT, and PERCENTILE_DISC. They also added the ability to do running totals and moving calculations.

These functions were promoted as improving performance over older techniques, but that isn’t always the case. There were still performance problems with the aggregate functions introduced in 2005 and the four of the functions introduced in 2012. In 2019, Microsoft introduced Batch Mode on Row Store, available on Enterprise and Developer Editions, that can improve the performance of window aggregates and the four statistical functions from 2012.

I started writing this article to compare some window function solutions to traditional solutions. I found that there were so many ways to write a query that includes a column from another row that this article is dedicated to the window functions LAG and LEAD.

In these sorts of circumstances, LAG() is extremely efficient at its job. Click through to see just how efficient.

Comments closed

Modifying Base Tables from Table Expressions

Itzik Ben-Gan continues a series on table expressions:

This article is the eleventh part in a series about table expressions. So far, I’ve covered derived tables and CTEs, and recently started the coverage of views. In Part 9 I compared views to derived tables and CTEs, and in Part 10 I discussed DDL changes and the implications of using SELECT * in the view’s inner query. In this article, I focus on modification considerations.

As you probably know, you’re allowed to modify data in base tables indirectly through named table expressions like views. You can control modification permissions against views. In fact, you can grant users permissions to modify data through views without granting them permissions to modify the underlying tables directly.

You do need to be aware of certain complexities and restrictions that apply to modifications through views. Interestingly, some of the supported modifications can end up with surprising outcomes, especially if the user modifying the data isn’t aware they’re interacting with a view. You can impose further restrictions to modifications through views by using an option called CHECK OPTION, which I’ll cover in this article. As part of the coverage, I’ll describe a curious inconsistency between how the CHECK OPTION in a view and a CHECK constraint in a table handle modifications—specifically ones involving NULLs.

As always, it’s well worth the read.

Comments closed