Erik Darling has a two-fer here. First, window functions and parallelism:
When windowing functions don’t have a Partition By, the parallel zone ends much earlier on than it does with one.
That doesn’t mean it’s always slower, though. My general experience is the opposite, unless you have a good supporting index.
But “good supporting index” is for tomorrow. You’re just going to have to deal with that.
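For readers who want to see the two query shapes side by side, here is a minimal sketch. It uses Python's built-in sqlite3 module rather than SQL Server, so it says nothing about parallelism or plan shape; it only shows what changes in the query text and results when a window function gains a Partition By. The `posts` table, its columns, and the sample rows are made up for illustration and do not come from Erik's posts.

```python
import sqlite3

# Hypothetical table and data, purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE posts (id INTEGER PRIMARY KEY, owner_id INTEGER, score INTEGER)"
)
conn.executemany(
    "INSERT INTO posts (owner_id, score) VALUES (?, ?)",
    [(1, 10), (1, 30), (2, 20), (2, 5)],
)

# No PARTITION BY: one numbering across the entire result set.
global_rank = conn.execute(
    """
    SELECT id, ROW_NUMBER() OVER (ORDER BY score DESC) AS n
    FROM posts
    ORDER BY id
    """
).fetchall()

# With PARTITION BY: numbering restarts for each owner_id.
per_owner_rank = conn.execute(
    """
    SELECT id, ROW_NUMBER() OVER (PARTITION BY owner_id ORDER BY score DESC) AS n
    FROM posts
    ORDER BY id
    """
).fetchall()

print(global_rank)
print(per_owner_rank)
```

The same two `OVER` clauses are valid T-SQL, so the queries can be pointed at a SQL Server table to compare actual execution plans, which is where the difference Erik describes would show up.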
Second, columnstore behavior with respect to window functions:
Not only is the parallel version of the row mode plan a full second slower, but… look at that batch mode plan.
Look at it real close. There’s a sort before the Window Aggregate, despite reading from the same nonclustered index that the row mode plan uses.
But the row mode plan doesn’t have a Sort in it. Why?
Check out both posts.