Press "Enter" to skip to content

Data Types Matter, Even in the Serverless SQL Pool

Jovan Popovic has a public service announcement for us:

The serverless SQL pool is a distributed computing system that executes concurrent queries on a set of distributed compute nodes. Multiple compute nodes are running the parts of a distributed query plan that read the underlying files, join the data sets, group, and aggregate results. Different queries might try to use the same compute nodes to execute the parts of the queries.

The oversized column types like VARCHAR(MAX) might trick the compute node to allocate more resources than is needed. However, the allocation is based on the estimate, but these over-allocated resources will not be used in actual execution because they are not needed. If a compute node needs 100MB to sort the results it will use these 100MB although the query optimizer allocated 4GB of memory for the task on the compute node.

Read the whole thing.