Chad Callihan tries out a big improvement:

How do you go about finding the median percentile of a data set? What if you need the top x percentile? Both the APPROX_PERCENTILE_CONT and APPROX_PERCENTILE_DISC functions can be used to solve these questions.

Letâ€™s look at how we can use each and what makes them unique.

The approximate percentiles are guaranteed to be accurate to within a certain percentage, something like 3-5%, if I remember correctly–it’s higher than HyperLogLog’s ~2.5% but not so large as to be of low value. If you’ve ever tried to calculate a median or other percentile like the 75th or 95th percentile, you might have used `PERCENTILE_CONT()`

in the past. At least until you get a few million rows in the table, at which point you stopped using it. My joke is, once you reach a certain table size, `PERCENTILE_CONT()`

becomes so slow that it’s faster to install and configure SQL Server ML Services, learn R or Python, and send in the data to calculate a percentile than to wait for `PERCENTILE_CONT()`

to complete.

The `APPROX_PERCENTILE_*`

series is way, way faster. On reasonable-sized test cases of a couple million rows or so, my recollection is two orders of magnitude better performance, so long as you can deal with being off by a few percentage points. One of the best scenarios for something like this is calculating 95th percentile response times. Does it **really** matter that the actual response time was 187.5ms and SQL Server said 192.6 or 181.4? Probably not—you get a good idea of the magnitude, and that’s the important part here.