With this change, the query executes very quickly, with an appropriate execution plan:
SQL Server Execution Times:
CPU time = 31 ms, elapsed time = 197 ms.
However, the LOOP hint does not affect the cardinality estimates or the optimizer decisions that depend on them; it only replaces the join operators chosen by the optimizer with the Nested Loops joins specified in the hint. SQL Server still expects billions of rows, so the query gets a memory grant of more than 2 GB for sorting, although only 3,222 rows need to be sorted. The hint helped the optimizer produce a good execution plan (which is great; otherwise the query would run for a very long time and would probably never finish), but the excessive memory grant is still not solved.
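To make the gap between the join hint and the memory grant concrete, here is a minimal sketch. The tables dbo.Orders and dbo.OrderDetails and their columns are hypothetical stand-ins, not the tables used in this article, and the query-level OPTION (LOOP JOIN) form is only one way to write the hint. The second statement assumes a build where sys.dm_exec_query_stats exposes the grant columns (SQL Server 2016 and later) and a login with VIEW SERVER STATE permission.

```sql
-- Hypothetical tables; substitute the tables from your own query.
SELECT o.OrderId, d.ProductId, d.Quantity
FROM dbo.Orders AS o
INNER JOIN dbo.OrderDetails AS d
    ON d.OrderId = o.OrderId
ORDER BY d.ProductId
OPTION (LOOP JOIN);  -- forces Nested Loops joins; estimates are left unchanged

-- Compare what was granted with what was actually used by recent statements.
-- A large gap between the two points to an overestimated row count.
SELECT TOP (10)
    st.text AS query_text,
    qs.last_grant_kb,        -- memory reserved for the last execution
    qs.last_used_grant_kb,   -- memory the last execution really used
    qs.last_ideal_grant_kb   -- grant the estimates would ideally call for
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY qs.last_grant_kb DESC;
```

If the hinted query shows a last_grant_kb in the gigabyte range next to a last_used_grant_kb of a few megabytes or less, that matches the situation described above: the plan shape is fixed, but the grant is still sized from the inflated estimate.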
As you might guess, now it’s time for table variables.