By above
BROADCAST_MOVE
operation, the rows indimension_City
table are all copied in a temporary table (calledTEMP_ID_3
) on all distributed database. (See below.)
Since the size ofdimension_City
is small, then all rows in this table is duplicated in all database before joining. This time, we join only 2 tables, however, if a lot of tables are needed to join, this data movement will become large overhead for query execution.
The short version is, replicate smaller dimensions and align distribution keys for large tables which get joined together. Both of these minimize the changes of the engine needing to shuffle data between nodes. These sorts of things can make a huge difference when working with Dedicated SQL Pools, cutting query time down by an order of magnitude in some extreme cases.
Comments closed