Hans-Jürgen Schönig creates new statistics:
If you are using PostgreSQL for analytics or large-scale aggregations, you might occasionally notice the planner making false assumptions regarding the number of rows. While this isn’t a problem for small aggregates, it is indeed an issue for large-scale aggregations featuring many different dimensions.
In short: the more columns your GROUP BY clause contains, the more likely it is that the optimizer overestimates the row count.
This blog explains how this can be handled in PostgreSQL.
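The fix the post describes is PostgreSQL's extended statistics (the `ndistinct` kind, available since PostgreSQL 10), which tell the planner how many distinct combinations a set of columns actually produces instead of letting it multiply per-column estimates. A minimal sketch, using a hypothetical `t_sales` table for illustration:

```sql
-- Hypothetical table: with correlated columns, the planner multiplies
-- per-column ndistinct values and overestimates the number of groups.
CREATE TABLE t_sales (
    region  text,
    country text,
    product text
);

-- Extended statistics capture the real number of distinct
-- combinations across these columns.
CREATE STATISTICS s_sales (ndistinct)
    ON region, country, product FROM t_sales;

-- Statistics are only collected on the next ANALYZE.
ANALYZE t_sales;

-- The estimated row count for this aggregate should now be
-- much closer to reality.
EXPLAIN
SELECT region, country, product, count(*)
FROM t_sales
GROUP BY region, country, product;
```

Comparing the `EXPLAIN` output before and after creating the statistics object is the quickest way to see whether the group-count estimate improved.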
Maybe it’s just me, but I don’t recall many instances in which adding multi-column statistics, without any accompanying index change, significantly improved a query’s performance. I can understand how it could improve things like memory grants, so perhaps that’s where I’m selling it short. But I struggle to recall a specific case in which a query got measurably faster as a result.