Calculating Median In SQL Server 2019

Tomaz Kastrun shows that batch aggregation mode on window functions allow PERCENTILE_CONT finally to become useful:

Next query, for median calculation was a window function query.


SELECT DISTINCT

PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY c1)

OVER (PARTITION BY (SELECT 1)) AS MedianCont

FROM t1


To my surprise, the performance was even worse, and at this time, I have to say, I was running this on SQL Server 2017 with CU7. But luckily, I had a SQL Server 2019 CTP 2.0 also installed and here, with no further optimization the query ran little over 1 second.

I’ve warned people away from this function for all but tiny data sets because of how poorly it performs. With SQL Server 2019, I might be able to recommend it.

Related Posts

Optimizer Imperfections With Complex Filters

Erik Darling shows a couple examples of how the optimizer will sometimes pick a superior plan when dealing with complicated filters but not always: Sometimes, the optimizer can take a query with a complex where clause, and turn it into two queries. This only happens up to a certain point in complexity, and only if […]

Read More

COUNT And NULL

Bert Wagner explains some of the trickiness of COUNT and NULL values in SQL Server: One thing I see fairly often (and am occasionally guilty of myself) is using COUNT(DISTINCT) and DISTINCT interchangeably to get an idea of the number of unique values in a column. While they will sometimes give you the same results, […]

Read More

1 Comment

  • […] As Lukas implies, SQL Server is a step behind in terms of calculating percentiles, and calculating several percentiles over a large data set will be slow. Very slow. Though batch mode processing in 2019 does help here. […]

Comments are closed

Categories

January 2019
MTWTFSS
« Dec Feb »
 123456
78910111213
14151617181920
21222324252627
28293031