I was recently playing with the analytical group of windowing functions, and I wanted to understand how they worked “under the covers.” I ran into a little logic puzzle with
PERCENTILE_CONTby trying to write a query that returned the same results using pre-2012 functionality.
Given a list of ranked values, you can use the
PERCENTILE_CONTfunction to find the value at a specific percentile. For example, if you have the grades of 100 students, you can use
PERCENTILE_CONTto locate the score in the middle of the list, the median, or at some other percent such as the grade at 90%. This doesn’t mean that the score was 90%; it means that the position of the score was at the 90th percentile. If there is not a value at the exact location,
PERCENTILE_CONTinterpolates the answer.
I’m a bit disappointed with how poorly
PERCENTILE_CONT performs against large data sets, especially if you need multiple percentiles. It’s bad enough that going into ML Services and getting percentiles with R is usually faster for me. But for datasets of less than 100K or so rows, it’s the easiest non-CLR method to get the median (with the easiest CLR method being SQL#).