APPROX_COUNT_DISTINCT

Kevin Feasel

2018-10-15

T-SQL

Niko Neugebauer is happy with a new function in SQL Server 2019:

A rather interesting result takes place if we scale our database to 100GB TPCH and run the very same queries – the total elapsed time jumps to 50% difference (from 30%), the CPU execution time difference is kept at 50%, but the memory grant gives the biggest difference ever – those 24.476 MB are still intact for the APPROX_DISTINCT_COUNT, while the COUNT(DISTINCT) asks for just a bit over 11GB ! Besides going through a completely different gateway on the bigger machines, running COUNT(DISTINCT) will bring your system to a full stop way before the same will take place with the APPROX_DISTINCT_COUNT.
Regarding the precision – in my tests I did not see the difference going over 1%.

Test before using this function, but if you don’t the correct number and can make do with “close enough,” this can save a boatload of memory on larger tables.

Related Posts

The Importance of Aliasing in Subqueries

Gail Shaw explains an unexpected result when writing a statement with a subquery: The column name in the temp table is missing an I, probably just a typo, but it has some rather pronounced effects. The obvious next question is why the select with the subquery in it didn’t fail, after all, the query asks […]

Read More

Multi-Pattern Replacement with SQL Server

Hugo Kornelis has a pattern matching problem to solve: The actual use case and the list of patterns that I had to remove are considered a confidential competitive advantage by my client, so I will just make up a list here to illustrate the problem. In this fake requirement, the following patterns must all be […]

Read More

Categories

October 2018
MTWTFSS
« Sep Nov »
1234567
891011121314
15161718192021
22232425262728
293031