
Category: T-SQL

View Filters and Short-Circuiting

Reitse Eskens takes us through a fun oddity with short-circuiting and views:

The question from my coworker was simple: why is this happening? Because he’s selecting from the view, his instinct is that the result set should be filtered within the view first and then narrowed down further by the outer query.

It’s interesting that there’s deterministic behavior both ways. My recollection is that ANSI SQL does not honor short-circuiting, as all filters are considered to happen at the same time, and thus any ordering is valid. But in practice, there are places where different code bases end up with stable short-circuiting as an implementation detail.
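As a minimal sketch of the kind of surprise involved (the table, view, and column names here are illustrative, not from Reitse's post): a view that filters out non-numeric strings gives no guarantee that its filter runs before a predicate in the outer query.

```sql
-- Hypothetical repro: a view filtering to numeric strings,
-- and an outer query that casts. Nothing guarantees the view's
-- ISNUMERIC filter is evaluated before the outer CAST, so this
-- query MAY fail with a conversion error on 'abc'.
CREATE TABLE dbo.Mixed (val varchar(10));
INSERT INTO dbo.Mixed VALUES ('1'), ('2'), ('abc');
GO
CREATE VIEW dbo.NumericOnly AS
    SELECT val FROM dbo.Mixed WHERE ISNUMERIC(val) = 1;
GO
SELECT val
FROM dbo.NumericOnly
WHERE CAST(val AS int) > 1;
```

Whether it fails depends entirely on the plan the optimizer chooses, which is exactly the point about ordering being an implementation detail.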


Capturing Inserts and Updates in MERGE Statements

The Purple Frog folks show us how to collect the counts of insert and update operations when using MERGE statements:

This post shows how you can capture and store the number of records inserted, updated or deleted from a T-SQL Merge statement.

This is in response to a question on an earlier post about using Merge to load SCDs in a Data Warehouse.

You can achieve this by using the OUTPUT clause of a merge statement, including the $Action column that OUTPUT returns.

Read on for the answer. If only MERGE weren’t so riddled with problems.
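The general shape of the technique looks something like this sketch, where the dimension and staging table names are placeholders rather than the ones from the post:

```sql
-- Capture per-action row counts from a MERGE via OUTPUT $action.
DECLARE @actions TABLE (MergeAction nvarchar(10));

MERGE dbo.DimCustomer AS tgt
USING dbo.StagingCustomer AS src
    ON tgt.CustomerKey = src.CustomerKey
WHEN MATCHED THEN
    UPDATE SET tgt.CustomerName = src.CustomerName
WHEN NOT MATCHED BY TARGET THEN
    INSERT (CustomerKey, CustomerName)
    VALUES (src.CustomerKey, src.CustomerName)
OUTPUT $action INTO @actions;

-- $action returns 'INSERT', 'UPDATE', or 'DELETE' per affected row.
SELECT MergeAction, COUNT(*) AS RowsAffected
FROM @actions
GROUP BY MergeAction;
```

From there you can log the counts to an audit table as part of the warehouse load.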


Understanding PERCENTILE_CONT

Kathi Kellenberger takes us through the PERCENTILE_CONT window function:

I was recently playing with the analytical group of windowing functions, and I wanted to understand how they worked “under the covers.” I ran into a little logic puzzle with PERCENTILE_CONT by trying to write a query that returned the same results using pre-2012 functionality.

Given a list of ranked values, you can use the PERCENTILE_CONT function to find the value at a specific percentile. For example, if you have the grades of 100 students, you can use PERCENTILE_CONT to locate the score in the middle of the list, the median, or at some other percentile, such as the grade at the 90th percentile. This doesn’t mean that the score was 90%; it means that the position of the score was at the 90th percentile. If there is not a value at the exact location, PERCENTILE_CONT interpolates the answer.

I’m a bit disappointed with how poorly PERCENTILE_CONT performs against large data sets, especially if you need multiple percentiles. It’s bad enough that going into ML Services and getting percentiles with R is usually faster for me. But for datasets of less than 100K or so rows, it’s the easiest non-CLR method to get the median (with the easiest CLR method being SQL#).
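For reference, a typical call looks like this sketch (the `Grades` table and its columns are hypothetical); note that PERCENTILE_CONT is a window function, so you need DISTINCT or a subquery to collapse to one row per group:

```sql
-- Median and 90th-percentile score per class (SQL Server 2012+).
SELECT DISTINCT
       ClassId,
       PERCENTILE_CONT(0.5)
           WITHIN GROUP (ORDER BY Score)
           OVER (PARTITION BY ClassId) AS MedianScore,
       PERCENTILE_CONT(0.9)
           WITHIN GROUP (ORDER BY Score)
           OVER (PARTITION BY ClassId) AS Score90thPct
FROM dbo.Grades;
```

Each additional percentile adds another windowed computation, which is part of why performance degrades when you need several of them over a large table.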


Overlooked T-SQL Functions

Itzik Ben-Gan covers some underutilized functions and function overloads in T-SQL:

TRIM is more than LTRIM(RTRIM())
SQL Server 2017 introduced support for the function TRIM. Many people, myself included, initially just assume that it’s no more than a simple shortcut to LTRIM(RTRIM(input)). However, if you check the documentation, you realize that it’s actually more powerful than that.

This article is an excellent argument in favor of reading the documentation, as all of it is in there but it’s easy to miss.
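The TRIM point in particular is easy to demonstrate: since SQL Server 2017, TRIM accepts an optional set of characters to strip from both ends, not just spaces.

```sql
-- TRIM can remove an arbitrary character set, not just whitespace:
SELECT TRIM('.,; ' FROM ' ;; Hello, world.., ');
-- returns 'Hello, world'
```

LTRIM(RTRIM()) can only remove spaces, so replicating this before 2017 meant PATINDEX gymnastics.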


Date and Time Aggregation in T-SQL

Daniel Hutmacher shows how you can aggregate date and time data types in SQL Server:

Because the “Duration” column is a “time(3)” datatype, it doesn’t really aggregate into an average that easily.

Msg 8117, Level 16, State 1, Line 20
Operand data type time is invalid for avg operator.

This may seem a little odd, as time is really a continuum just like any integer or floating-point value, right?

Read on to see how to do this and a warning from Daniel about overflowing. This sort of aggregation is a lot easier to do in R, but you can still do it in T-SQL.
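One common approach (a sketch, not necessarily Daniel's exact solution; `dbo.Sessions` and its `Duration` column are hypothetical) is to convert the time value to an integer offset, average that, and convert back:

```sql
-- Average a time(3) column by going through milliseconds.
-- CAST to bigint before averaging so the intermediate SUM
-- doesn't overflow int on large row counts.
SELECT CAST(
           DATEADD(
               MILLISECOND,
               AVG(CAST(DATEDIFF(MILLISECOND,
                                 CAST('00:00:00' AS time(3)),
                                 Duration) AS bigint)),
               CAST('00:00:00' AS time(3)))
           AS time(3)) AS AvgDuration
FROM dbo.Sessions;
```

The overflow caveat is the important part: summing millisecond counts as int blows up quickly, which is presumably the warning Daniel raises.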


Searching for Column-Level Metadata

Aaron Bertrand takes us through a few DMVs and catalog views which will help us find SQL Server metadata:

Let’s say we are interested in each column of each table: the ordinal position, the name, the data type, and whether it is nullable. In all currently supported versions of SQL Server (2012 and up), there is a handy function that allows you to get all the columns for a T-SQL query, sys.dm_exec_describe_first_result_set. This function is superior to querying the information from sys.columns, mainly because you don’t have to join to sys.types, or embed complicated logic to deal with translation; for example, CASE expressions that change -1 to max, eliminating types that have synonyms, or cutting nvarchar lengths in half – but only if they aren’t max. (You can see the type of query you end up with in an earlier tip, “SQL Server Data Type Consistency.”)

This use of sys.dm_exec_describe_first_result_set() is new to me.
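A minimal call looks like this; the function takes the query text, an optional parameter declaration string, and a browse-information flag:

```sql
-- Describe the result set of an arbitrary query, with friendly
-- type names (e.g. 'nvarchar(128)') already assembled for you.
SELECT column_ordinal, name, system_type_name, is_nullable
FROM sys.dm_exec_describe_first_result_set(
         N'SELECT * FROM sys.objects',  -- any T-SQL batch
         NULL,                          -- no parameters
         0);                            -- no browse info
```

Point it at `SELECT * FROM <table>` and you get exactly the column metadata Aaron describes, without touching sys.columns or sys.types.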


Generating Unique File Names

Slava Murygin gives us unique file names:

It is a pretty common task to generate new files with a timestamp in their names. That gives you the ability to easily identify them, sort them, and keep them reasonably unique. However, if you have a very busy process, it is possible that a duplicate name will be produced and you might lose some data.

To avoid that situation, I’ve come up with the following solution.

It is difficult to envision this solution going wrong.
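One simple variation on the idea (a sketch, not Slava's exact code; the `export_` prefix and `.csv` extension are placeholders) combines a timestamp with a slice of a GUID so that even same-millisecond collisions are effectively impossible:

```sql
-- Timestamp keeps names sortable; the GUID fragment breaks ties
-- when two files are generated in the same millisecond.
SELECT CONCAT(
           'export_',
           FORMAT(SYSUTCDATETIME(), 'yyyyMMdd_HHmmssfff'),
           '_',
           LEFT(CONVERT(varchar(36), NEWID()), 8),
           '.csv') AS FileName;
```

NEWID() alone would be unique but unsortable; the timestamp prefix preserves the ordering benefit.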


SQL Server and Integer Math

Bert Wagner reminds us that integer math is a thing:

To determine how much lumber I would need for building the new walls, I decided to write a SQL query to help with my framing calculations. I was building a 6 foot wall and wanted to put a stud every 16 inches. Easy enough to do the math on this:

SELECT (6*12)/16

The output of the query above was 4, indicating the number of studs I would need for one wall section.

What’s interesting is that if we do this same equation in a calculator, we get a slightly different answer: 4.5.

Click through for Bert’s thoughts on the issue. Other languages and platforms do this as well, so it’s not unique to SQL Server, but if you’re not used to it, you might be in for a surprise.
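The fix is simply to make one operand non-integer so SQL Server performs decimal division instead of truncating:

```sql
-- Integer division truncates toward zero:
SELECT (6*12)/16;        -- 4
-- Make either operand decimal and the fraction survives:
SELECT (6*12)/16.0;      -- 4.500000
SELECT CAST(6*12 AS decimal(9,2)) / 16;  -- 4.500000
```

The same rule applies to column arithmetic, where an all-int expression silently drops the remainder.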


Computing Time to Payment on Invoices

Daniel Hutmacher has a painful but realistic problem to solve:

Here’s an example customer. You’ll notice right off the bat that we’re sending this customer an invoice every day on the 20th of the month. To add some complexity, the customer will arbitrarily pay parts of the invoiced amount over time, and to add insult to injury, the banking interface won’t tell us which invoice the customer is paying for, so we’ll just decide that each payment goes towards the oldest outstanding invoice.

Our task is to calculate how many days have elapsed, for each invoice, from invoice date to payment in full.

Daniel has an excellent solution to the problem, so check it out.
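One way to frame the oldest-invoice-first allocation (a sketch of the general windowing idea, not necessarily Daniel's solution; the `Invoices` and `Payments` tables are hypothetical) is with running totals: an invoice is paid in full at the first payment whose cumulative paid amount covers that invoice's cumulative invoiced amount.

```sql
WITH inv AS (
    SELECT InvoiceDate, Amount,
           SUM(Amount) OVER (ORDER BY InvoiceDate
                             ROWS UNBOUNDED PRECEDING) AS RunningInvoiced
    FROM dbo.Invoices
), pay AS (
    SELECT PaymentDate, Amount,
           SUM(Amount) OVER (ORDER BY PaymentDate
                             ROWS UNBOUNDED PRECEDING) AS RunningPaid
    FROM dbo.Payments
)
SELECT i.InvoiceDate,
       MIN(p.PaymentDate) AS PaidInFullDate,
       DATEDIFF(DAY, i.InvoiceDate, MIN(p.PaymentDate)) AS DaysToPayment
FROM inv AS i
JOIN pay AS p
    ON p.RunningPaid >= i.RunningInvoiced  -- first payment covering this invoice
GROUP BY i.InvoiceDate;
```

Because payments are applied FIFO, the running-total comparison sidesteps having to match individual payments to individual invoices.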
