Category: Syntax

Approximate Percentiles in SQL Server 2022

Chad Callihan tries out a big improvement:

How do you go about finding the median percentile of a data set? What if you need the top x percentile? Both the APPROX_PERCENTILE_CONT and APPROX_PERCENTILE_DISC functions can be used to solve these questions.

Let’s look at how we can use each and what makes them unique.

The approximate percentiles are guaranteed to be accurate to within a certain percentage, something like 3-5%, if I remember correctly–it’s higher than HyperLogLog’s ~2.5% but not so large as to be of low value. If you’ve ever tried to calculate a median or other percentile like the 75th or 95th percentile, you might have used PERCENTILE_CONT() in the past. At least until you get a few million rows in the table, at which point you stopped using it. My joke is, once you reach a certain table size, PERCENTILE_CONT() becomes so slow that it’s faster to install and configure SQL Server ML Services, learn R or Python, and send in the data to calculate a percentile than to wait for PERCENTILE_CONT() to complete.

The APPROX_PERCENTILE_* series is way, way faster. On reasonable-sized test cases of a couple million rows or so, my recollection is two orders of magnitude better performance, so long as you can deal with being off by a few percentage points. One of the best scenarios for something like this is calculating 95th percentile response times. Does it really matter that the actual response time was 187.5ms and SQL Server said 192.6 or 181.4? Probably not—you get a good idea of the magnitude, and that’s the important part here.
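
For a sense of how the two approaches differ in shape, here is a minimal sketch, assuming SQL Server 2022 or Azure SQL Database. The #Requests table, its columns, and the sample values are made up for illustration and do not come from Chad's post.

```sql
-- Hypothetical sample data: #Requests and its columns are illustrative only.
CREATE TABLE #Requests
(
    Endpoint       varchar(50) NOT NULL,
    ResponseTimeMs float       NOT NULL
);

INSERT INTO #Requests (Endpoint, ResponseTimeMs)
VALUES ('/home', 120.0), ('/home', 135.5), ('/home', 187.5),
       ('/search', 210.0), ('/search', 340.2), ('/search', 95.1);

-- Exact percentile: PERCENTILE_CONT is an analytic (window) function,
-- so it needs OVER() and returns one row per input row, hence DISTINCT.
SELECT DISTINCT
    Endpoint,
    PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY ResponseTimeMs)
        OVER (PARTITION BY Endpoint) AS ExactP95
FROM #Requests;

-- Approximate percentiles (SQL Server 2022+): these are aggregate functions,
-- so they work with a plain GROUP BY.
SELECT
    Endpoint,
    APPROX_PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY ResponseTimeMs) AS ApproxP95Cont,
    APPROX_PERCENTILE_DISC(0.95) WITHIN GROUP (ORDER BY ResponseTimeMs) AS ApproxP95Disc
FROM #Requests
GROUP BY Endpoint;

DROP TABLE #Requests;
```

On six rows there is nothing to gain, of course; the difference only shows up once the table is large enough for the exact version to hurt.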

Updating Records in a Kusto Database

Vincent-Philippe Lauzon has an announcement:

Kusto databases, either in Fabric (KQL Database) or in Azure (Azure Data Explorer), are optimized for append ingestion.

In recent years, we’ve introduced the .delete command, allowing you to selectively delete records.

In February, we introduced the .update command in public preview.  This command allows you to update records by deleting existing records and appending new ones in a single transaction.

Today, the .update is Generally Available (GA)!

Click through for more details, including a link to the documentation, where you can see several examples of the syntax.

The CLEAN Block in PowerShell

Mike Robbins takes us through some relatively new functionality:

PowerShell, a powerful scripting language and automation framework, provides features that enhance script development and execution. Among these features is the clean block, a lesser-known yet beneficial component in PowerShell functions. This article explores the clean block, its purpose, and how to use it effectively in PowerShell scripts.

Read on to learn more about the block and how it works.

Emulating the FILTER Clause in Oracle

Lukas Eder notes a bit of ANSI SQL:

The following aggregate function computes the number of rows per group which satisfy the FILTER clause:

    SELECT
      COUNT(*) FILTER (WHERE BOOK.TITLE LIKE 'A%'),
      COUNT(*) FILTER (WHERE BOOK.TITLE LIKE 'B%'),
      ...
    FROM BOOK

This is useful for pivot style queries, where multiple aggregate values are computed in one go. For most basic types of aggregate function, it can be emulated simply by using CASE expressions, because standard aggregate functions ignore NULL values when aggregating. 

Lukas shows how you can also implement this logic using JSON_TRANSFORM(), though I think I’d just as soon stick with COUNT(CASE WHEN BOOK.TITLE LIKE 'A%' THEN 1 END) and try hard not to think about shredding JSON.
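
As a small sketch of the CASE-based emulation, the query below counts titles per starting letter in one pass. The inline rows are made up, and the VALUES row constructor is SQL Server syntax used here as a stand-in for a real BOOK table; on Oracle you would point the same CASE expressions at an actual table.

```sql
-- Inline sample rows stand in for a real BOOK table (titles are made up).
-- CASE yields NULL when the predicate is false, and COUNT() skips NULLs,
-- which is what makes this behave like FILTER (WHERE ...).
SELECT
    COUNT(CASE WHEN BOOK.TITLE LIKE 'A%' THEN 1 END) AS TitlesStartingWithA,  -- 1
    COUNT(CASE WHEN BOOK.TITLE LIKE 'B%' THEN 1 END) AS TitlesStartingWithB,  -- 2
    COUNT(*)                                         AS AllTitles             -- 4
FROM (VALUES ('Animal Farm'), ('Brave New World'), ('Beloved'), ('Dune')) AS BOOK (TITLE);
```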

UNISTR() and || in Azure SQL Database

Abhiman Tiwari announces a new function and a new operator:

We are excited to announce that the UNISTR intrinsic function and ANSI SQL concatenation operator (||) are now available in public preview in Azure SQL Database. The UNISTR function allows you to escape Unicode characters, making it easier to work with international text. The ANSI SQL concatenation operator (||) provides a simple and intuitive way to combine character or binary strings. These new features will enhance your ability to manipulate and work with text data.

Click through to learn more about both. Honestly, I’d rather stick with CONCAT() versus using || because of how CONCAT() handles NULL without me having to check every operand first.
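
Here is a rough sketch of the NULL handling in question. Because || is still in preview, the example uses the long-standing + operator as a stand-in, on the assumption that || propagates NULL the same way; the variable names and values are made up.

```sql
DECLARE @FirstName  nvarchar(50) = N'Ada',
        @MiddleName nvarchar(50) = NULL,   -- deliberately missing
        @LastName   nvarchar(50) = N'Lovelace';

SELECT
    -- With the default CONCAT_NULL_YIELDS_NULL ON, one NULL operand
    -- turns the whole expression into NULL.
    @FirstName + N' ' + @MiddleName + N' ' + @LastName                AS PlusResult,
    -- Guarding each nullable operand by hand avoids that.
    @FirstName + N' ' + ISNULL(@MiddleName, N'') + N' ' + @LastName   AS GuardedPlusResult,
    -- CONCAT() treats NULL as an empty string, so no guard is needed.
    CONCAT(@FirstName, N' ', @MiddleName, N' ', @LastName)            AS ConcatResult;
```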

Using the CONVERT() Function in T-SQL

Joe Gavin shows how to use a function:

A common task while working with data in Microsoft SQL Server is converting from one data type to another. Most often, it’s done to change the way data is presented, but sometimes it is needed to make sure the right data types are being used for comparisons, joins, or sorting.

The SQL CONVERT function, which has long been part of the SQL language, can, as the name implies, be used to convert a value of one data type into a specified data type with optional formatting attributes. CONVERT gives you the ability to format, whereas the ISO-compliant CAST function does not.

My very strong recommendation for 99% or so of the audience: use TRY_CONVERT() instead. TRY_CONVERT() came out in SQL Server 2012 (sorry for the 1% stuck pre-2012) and has the same performance profile as CONVERT(), except that, when conversion fails, TRY_CONVERT() returns NULL rather than throwing an error.

There is also a TRY_CAST() that does exactly what you think it would.
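
A quick sketch of the difference, using made-up literal inputs rather than anything from Joe's article:

```sql
SELECT
    TRY_CONVERT(int, '123')                                AS GoodInt,   -- 123
    TRY_CONVERT(int, 'abc')                                AS BadInt,    -- NULL instead of an error
    TRY_CONVERT(date, '2024-02-30')                        AS BadDate,   -- NULL: no such date
    TRY_CAST('abc' AS int)                                 AS BadCast,   -- NULL
    -- The optional third argument is the style (format) code.
    CONVERT(varchar(10), CAST('2024-03-15' AS date), 120)  AS IsoDate,   -- 2024-03-15
    CONVERT(varchar(10), CAST('2024-03-15' AS date), 101)  AS UsDate;    -- 03/15/2024

-- By contrast, this statement fails with a conversion error:
-- SELECT CONVERT(int, 'abc');
```

The NULL result pairs nicely with a WHERE clause when you want to find the rows that would have failed.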

Concatenating Strings and (N)VARCHAR Truncation

Vlad Drumea troubleshoots a common problem:

The code in this case is the GetStatsInfoForWholeDB.sql script that’s part of PSBlitz’s resources.
This script is used for, you wouldn’t believe by the name alone, getting statistics information for a specific database.

Due to the fact that it might be run on Azure or on older versions of SQL Server, as well as on databases with incremental statistics, the best option for it was to use dynamic SQL.

In this case it uses a variable @SQL defined as NVARCHAR(MAX) to store the query that’s built at runtime and execute it via EXEC.

Read on for one of the most common issues you may run into around generating dynamic SQL.
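
For context, here is a minimal sketch of the general truncation behavior with NVARCHAR(MAX) variables. It is not taken from Vlad's script and may not be the exact scenario he covers; the variable names are made up.

```sql
DECLARE @Truncated nvarchar(max),
        @Intact    nvarchar(max);

-- Neither operand is a MAX type, so the concatenation is evaluated as a
-- non-MAX nvarchar and cut off at 4,000 characters before it ever reaches
-- the NVARCHAR(MAX) variable.
SET @Truncated = REPLICATE(N'a', 4000) + REPLICATE(N'b', 4000);

-- Casting one operand to nvarchar(max) makes the whole expression MAX.
SET @Intact = CAST(REPLICATE(N'a', 4000) AS nvarchar(max)) + REPLICATE(N'b', 4000);

SELECT LEN(@Truncated) AS TruncatedLength,  -- 4000
       LEN(@Intact)    AS IntactLength;     -- 8000
```

The target variable's type does not help here, because the truncation happens while the right-hand side is being evaluated.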

The Joy of the Common Table Expression

Kevin Wilkie talks common table expressions:

Most of our coding these days has Common Table Expressions, also known as CTEs. If not, you’re either working on an older version of SQL Server or you haven’t been introduced to this piece of goodness.

CTEs can make reading SQL queries a lot easier if the logic is convoluted. For example, let’s use the following in a CTE.

I’ll admit that I probably over-use common table expressions, but I like them more than sub-queries: I find them easier to read, and if they’re going to perform as well (or poorly) as sub-queries, I might as well use the form that makes more intuitive sense to me.
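
To make the readability comparison concrete, here is a small sketch of the same query written both ways. It is not Kevin's example; the Orders rows and column names are made up and supplied inline so the batch runs on its own.

```sql
-- CTE form: name the intermediate result first, then query it.
WITH OrderTotals AS
(
    SELECT CustomerId, SUM(Amount) AS TotalAmount
    FROM (VALUES (1, 50.00), (1, 25.00), (2, 10.00)) AS Orders (CustomerId, Amount)
    GROUP BY CustomerId
)
SELECT CustomerId, TotalAmount
FROM OrderTotals
WHERE TotalAmount > 20;

-- Equivalent derived-table (sub-query) form: the same logic, read inside-out.
SELECT t.CustomerId, t.TotalAmount
FROM
(
    SELECT CustomerId, SUM(Amount) AS TotalAmount
    FROM (VALUES (1, 50.00), (1, 25.00), (2, 10.00)) AS Orders (CustomerId, Amount)
    GROUP BY CustomerId
) AS t
WHERE t.TotalAmount > 20;
```

Both return the single row for customer 1 with a total of 75.00.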

OPTIMIZE FOR vs Forced Plans in SQL Server

Erik Darling makes a comparison:

I often see clients using forced plans or plan guides (yes, even still, to this day) to deal with various SQL Server performance problems with plans changing.

There’s usually an execution plan or two floating around that seems to be a good general idea for a given query, and a couple weird high-end and low-end outliers for very specific populations of values.

Read the whole thing, of course.

In defense of plan guides, the company I used to work for had a few—maybe three or four in total—because of really weird data skew problems on database 106 out of 700 (or so)—because there’s always one customer that makes wildly different use of the system than everyone else. And so a query that worked perfectly fine for 699 databases (or so) flops like a fish out of water for this one database with this one customer’s data in it. So the plan guide was a nicer expediency than optimizing for mediocre on all 700 (or so) databases.
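
For readers who have not used either option, here is a rough sketch of what they look like. It is not taken from Erik's post; the #Orders table, the parameter value, and the Query Store IDs are all hypothetical.

```sql
-- Hypothetical table and data, for illustration only.
CREATE TABLE #Orders
(
    OrderId    int IDENTITY(1, 1) PRIMARY KEY,
    CustomerId int   NOT NULL,
    Amount     money NOT NULL
);

INSERT INTO #Orders (CustomerId, Amount)
VALUES (1, 10.00), (1, 20.00), (106, 5.00);

DECLARE @CustomerId int = 106;

-- Compile the plan as if a chosen "typical" value had been supplied,
-- regardless of the value actually passed at run time.
SELECT OrderId, Amount
FROM #Orders
WHERE CustomerId = @CustomerId
OPTION (OPTIMIZE FOR (@CustomerId = 1));

-- Or compile for the average case from the statistics' density.
SELECT OrderId, Amount
FROM #Orders
WHERE CustomerId = @CustomerId
OPTION (OPTIMIZE FOR (@CustomerId UNKNOWN));

-- Forcing a specific plan through Query Store instead
-- (the IDs below are placeholders, so the call is left commented out):
-- EXEC sys.sp_query_store_force_plan @query_id = 42, @plan_id = 7;

DROP TABLE #Orders;
```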
