APPROX_COUNT_DISTINCT

Kevin Feasel

2018-10-15

T-SQL

Niko Neugebauer is happy with a new function in SQL Server 2019:

A rather interesting result takes place if we scale our database to 100GB TPCH and run the very same queries – the total elapsed time jumps to 50% difference (from 30%), the CPU execution time difference is kept at 50%, but the memory grant gives the biggest difference ever – those 24.476 MB are still intact for the APPROX_DISTINCT_COUNT, while the COUNT(DISTINCT) asks for just a bit over 11GB ! Besides going through a completely different gateway on the bigger machines, running COUNT(DISTINCT) will bring your system to a full stop way before the same will take place with the APPROX_DISTINCT_COUNT.
Regarding the precision – in my tests I did not see the difference going over 1%.

Test before using this function, but if you don’t the correct number and can make do with “close enough,” this can save a boatload of memory on larger tables.

Related Posts

Generating SQL With Biml

Cathrine Wilhelmsen shows us you can do a lot more with Biml than just generating SSIS packages: This actually happened to me in a previous job. We had a fairly complex ETL solution for the most critical part of our Data Warehouse. Many SSIS packages, views, and stored procedures queried the tables that were replicas […]

Read More

Rowcount Shenanigans When Deleting In Batches

Denis Gobo takes us through a few issues you might run into when deleting data in batches: I have always used WHILE @@rowcount > 0 but you have to be careful because @@rowcount could be 0 when your while loop starts Let’s take a look at an example. This is a simplified example without a where […]

Read More

Categories

October 2018
MTWTFSS
« Sep Nov »
1234567
891011121314
15161718192021
22232425262728
293031