
Interpretability Issues in Monitoring Tools

Brent Ozar shows why wait stats are hard to interpret without context:

However, outside of that window, you may not have any wait time on CPU at all. If queries are simple enough, and there isn’t enough concurrency, then as we discuss in Mastering Server Tuning, you can end up with SOS_SCHEDULER_YIELD waits with no time attached to them. You wouldn’t notice a CPU problem at all outside of that 8AM window because you don’t have a CPU bottleneck – yet. And to be fair, you probably wouldn’t tune a server in this kind of shape, either.

But when you DO need to tune a server that isn’t running at 100% capacity, picking the right query to tune isn’t just about wait stats: it’s also about which queries you need to be more responsive, and how you need to plan for future load. If your company tries to run a flash sale, and a lot of folks try to check out at the same time, Query T is going to knock your server over. But wait-stats based tools won’t see that coming: they’ll still be focused on Query R, the only one that spends a lot of time waiting on CPU.
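To make the Query R versus Query T point concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the `QueryStats` fields, the per-query numbers, and the 100x flash-sale multiplier are hypothetical and do not come from Brent's post or from any real monitoring tool. It only contrasts two ways of ranking queries: by CPU wait time observed today, and by a naive projection of CPU demand under a future load spike.

```python
from dataclasses import dataclass


@dataclass
class QueryStats:
    name: str
    cpu_ms_per_exec: float      # CPU time one execution burns (hypothetical)
    execs_per_sec: float        # current call rate (hypothetical)
    cpu_wait_ms_per_sec: float  # CPU wait observed today, per second (hypothetical)


def rank_by_cpu_wait(queries):
    """Order queries the way a wait-stats view would: most CPU wait first."""
    return sorted(queries, key=lambda q: q.cpu_wait_ms_per_sec, reverse=True)


def projected_cpu_demand(query, load_multiplier=1.0):
    """Naive projection: CPU milliseconds needed per second at a scaled call rate."""
    return query.cpu_ms_per_exec * query.execs_per_sec * load_multiplier


def rank_by_projected_demand(queries, multipliers):
    """Order queries by projected CPU demand; unlisted queries keep today's rate."""
    return sorted(
        queries,
        key=lambda q: projected_cpu_demand(q, multipliers.get(q.name, 1.0)),
        reverse=True,
    )


# Made-up numbers: Query R is heavy and already waits on CPU,
# Query T is cheap per call and shows no CPU wait at today's rate.
queries = [
    QueryStats("Query R", cpu_ms_per_exec=2000, execs_per_sec=0.5, cpu_wait_ms_per_sec=300),
    QueryStats("Query T", cpu_ms_per_exec=50, execs_per_sec=4, cpu_wait_ms_per_sec=0),
]

# Assumed flash-sale scenario: checkout traffic (Query T) grows 100x.
flash_sale = {"Query T": 100.0}

if __name__ == "__main__":
    print("Top by CPU wait today:", rank_by_cpu_wait(queries)[0].name)
    print("Top by projected demand:", rank_by_projected_demand(queries, flash_sale)[0].name)
```

With these invented numbers, the wait-based ranking puts Query R on top, while the projection puts Query T on top. A linear projection like this is a deliberate simplification and ignores contention, plan changes, and everything else that happens under real load, but it shows why a ranking built only on current waits can miss the query that matters for tomorrow's traffic.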

Good food for thought.