Calculating Skew In SQL

Lukas Eder shows how you can use PERCENTILE_DISC to calculate skewness in SQL:

In RDBMS, we sometimes use the term skew colloquially to mean the same thing as non-uniform distribution, i.e. a normal distribution would also be skewed. We simply mean that some values appear more often than others. Thus, I will put the term “skew” in double quotes in this article. While your RDBMS’s statistics contain this information once they are calculated, we can also detect such “skew” manually in ad-hoc queries using percentiles, which are defined in the SQL standard and supported in a variety of databases, as ordinary aggregate functions, including:
– Oracle
– PostgreSQL
– SQL Server (regrettably, only as window functions)

As Lukas implies, SQL Server is a step behind in terms of calculating percentiles, and calculating several percentiles over a large data set will be slow. Very slow. Though batch mode processing in 2019 does help here.

Related Posts

Errors Updating Stats on Columnstore Indexes

Max Vernon walks us through some problems trying to update statistics on columnstore indexes: The first error above would be seen if you have a SQL Server Agent job that updates statistics. The second error is how it looks in an SSMS Query window. The error message claims that UPDATE STATISTICS can only be used on a […]

Read More

SET STATISTICS IO And Automated Statistics Updates

Niko Neugebauer shows us something new in SQL Server 2019: There has never been such information before!We are just writing into it!Why do we have those wonderful 1351498 logical reads ?Are they actually writes ? And if they would be, would not it be correct to display them as physical accesses ? The answer is […]

Read More

Categories

January 2019
MTWTFSS
« Dec Feb »
 123456
78910111213
14151617181920
21222324252627
28293031