Erik Darling has a two-parter on how batch mode processing can lead to better performance when working with window functions. Part 1 sets the stage:
If you ask people who tune queries why batch mode is often much more efficient with windowing functions, they’ll tell you about the window aggregate operator.
That’s all well and good, but there’s another, often sneaky limitation of fully row mode execution plans with windowing functions.
Let’s go take a look!
Part 2 identifies the culprit:
When queries go parallel, you want them to be fast. Sometimes they are, and it’s great.
Other times they’re slow, and you end up staring helplessly at a repartition streams operator.
Check out both of these posts.