Sean Gallardy turns a problem on its head:
Now let’s get to the main point, which is how long the VM stays paused or stunned – remember, this is a “small” or “short” amount of time, one might even say “trivial”. When it is kept this short to where it’s “trivial” as in less than a second then all is good and you most likely won’t notice it except in very high workloads… but we should be running with VSS integration and not VM-level so it’s still incorrect, but hey. When this time is not short or trivial then GOOD things start to happen, most notably that high availability kicks in.
I appreciate the framing of this post: the failover isn’t the problem; it merely exposes the actual problem.