Calculating Median In SQL Server 2019

Tomaz Kastrun shows that batch aggregation mode on window functions allow PERCENTILE_CONT finally to become useful:

Next query, for median calculation was a window function query.


SELECT DISTINCT

PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY c1)

OVER (PARTITION BY (SELECT 1)) AS MedianCont

FROM t1


To my surprise, the performance was even worse, and at this time, I have to say, I was running this on SQL Server 2017 with CU7. But luckily, I had a SQL Server 2019 CTP 2.0 also installed and here, with no further optimization the query ran little over 1 second.

I’ve warned people away from this function for all but tiny data sets because of how poorly it performs. With SQL Server 2019, I might be able to recommend it.

Related Posts

SARGability and Date Functions

Erik Darling shows why you don’t want to use YEAR() or MONTH() in the WHERE clause when querying a large table: If you’ve been query tuning for a while, you probably know about SARGability, and that wrapping columns in functions is generally a bad idea. But just like there are slightly different rules for CAST and […]

Read More

Multi-Pattern Replacement with SQL Server

Hugo Kornelis has a pattern matching problem to solve: The actual use case and the list of patterns that I had to remove are considered a confidential competitive advantage by my client, so I will just make up a list here to illustrate the problem. In this fake requirement, the following patterns must all be […]

Read More

1 Comment

  • […] As Lukas implies, SQL Server is a step behind in terms of calculating percentiles, and calculating several percentiles over a large data set will be slow. Very slow. Though batch mode processing in 2019 does help here. […]

Comments are closed

Categories

January 2019
MTWTFSS
« Dec Feb »
 123456
78910111213
14151617181920
21222324252627
28293031