APPROX_COUNT_DISTINCT

Kevin Feasel

2018-10-15

T-SQL

Niko Neugebauer is happy with a new function in SQL Server 2019:

A rather interesting result takes place if we scale our database to 100GB TPCH and run the very same queries – the total elapsed time jumps to 50% difference (from 30%), the CPU execution time difference is kept at 50%, but the memory grant gives the biggest difference ever – those 24.476 MB are still intact for the APPROX_DISTINCT_COUNT, while the COUNT(DISTINCT) asks for just a bit over 11GB ! Besides going through a completely different gateway on the bigger machines, running COUNT(DISTINCT) will bring your system to a full stop way before the same will take place with the APPROX_DISTINCT_COUNT.
Regarding the precision – in my tests I did not see the difference going over 1%.

Test before using this function, but if you don’t the correct number and can make do with “close enough,” this can save a boatload of memory on larger tables.

Related Posts

Enabling Database-Level Change Tracking

Tim Weigel continues a series on change tracking: If you don’t provide a retention period, SQL Server’s default is 2 days. Auto-cleanup defaults to ON unless you tell it otherwise. Easy! The table level commands aren’t any more complicated. Before we get started, please note that change tracking requires a primary key on the table […]

Read More

Isolation Levels and Dynamic SQL

Max Vernon points out how transaction isolation levels work when combined with sp_executesql: Imagine you have a piece of code where you don’t care about the downsides to the “read uncommitted” isolation level, and do your due diligence by adding SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED; at the start of your code. The code following that statement […]

Read More

Categories

October 2018
MTWTFSS
« Sep Nov »
1234567
891011121314
15161718192021
22232425262728
293031