Avoid WHERE COUNT(*) = 0

Published 2024-12-02 by Kevin Feasel

Every now and then I see something like this:
SELECT u.* FROM users u
WHERE 0 = (SELECT COUNT(*) FROM addresses a WHERE a.user_id = u.id);
and it kinda pains me. So figured, I’ll write about what is the problem with it, and how to avoid such constructs.

Read on to understand what the problem is and how you can fix it.

Published in Syntax

6 Comments

Erik Darling 2024-12-02

I’m surprised you’d link to someone falling for the “sub selects are bad” meme.
- Kevin Feasel 2024-12-02
  
  He didn’t fall for that trap. There was a commenter who did but I think Hubert’s response was totally fair: don’t avoid sub-selects out of some misguided notion that they’re always worse, and check the performance of multiple options (with and without sub-selects) given your specific data distribution.
  - Erik Darling 2024-12-02
    
    That also does not matter. It’s all a semi join regardless, e.g. https://www.brentozar.com/pastetheplan/?id=HJPM8IjX1l
    - Kevin Feasel 2024-12-02
      
      Very interesting. I didn’t know SQL Server had that optimization, so you win this round, Gadget. But next time, next time!
      
      Now it makes me want to test this out in Postgres to see if its optimizer doesn’t have that particular optimization rule.
    - Kevin Feasel 2024-12-02
      
      I dug into this a little further and it turns out that Postgres 17.2 does not have the same optimization that exists in SQL Server. A simple repro:
      
      CREATE TABLE Product (
      ProductID SERIAL PRIMARY KEY,
      SomeNum INT
      );
      
      INSERT INTO Product(SomeNum) SELECT generate_series(0, 100);
      
      CREATE TABLE Sales (
      SalesID SERIAL PRIMARY KEY,
      ProductID INT
      );
      
      INSERT INTO Sales(ProductID) SELECT generate_series(0, 10000) % 100;
      
      EXPLAIN
      SELECT COUNT(*) FROM Product p
      WHERE NOT EXISTS (SELECT * FROM Sales s WHERE s.ProductID = p.ProductID);
      
      EXPLAIN
      SELECT COUNT(*) FROM Product p
      WHERE 0 = (SELECT COUNT(*) FROM Sales s WHERE s.ProductID = p.ProductID);
      
      —
      
      In the first case, you get the Hash Right Anti-Join that you’d expect, and that we see in SQL Server for both cases. But in the latter, it performs a sequence scan that is considerably more expensive. Which means that the advice is good for Postgres but not SQL Server. My mistake was in not being aware of the optimization for SQL Server, and so I didn’t call out that this was a Postgres-specific note. I do appreciate you showing a bit of SQL Server superiority.
Shop Talk: 2024-12-02 – Shop Talk with TriPASS 2024-12-04

[…] I learned something from Erik Darling. I’ve learned many things from Erik Darling, but this one was around NOT EXISTS versus 0 = (SELECT COUNT(*)…). It turns out that SQL Server has an optimization that makes both return the same execution plan. After the recording, I went and checked Postgres, which does not have the same optimization. This is yet another case of how behaviors in one platform may subtly differ from others. […]

Comments are closed.

M	T	W	T	F	S	S
						1
2	3	4	5	6	7	8
9	10	11	12	13	14	15
16	17	18	19	20	21	22
23	24	25	26	27	28	29
30	31