Recently I had a client complain of chronic high CPU utilization. The performance of their SQL Server had degraded, and it appeared to be related to higher than normal CPU utilization in conjunction with symptoms of unresponsive user queries. The root cause was twofold—a third party hosting provider had overallocated virtual processors on the physical host where the virtual machine (VM) running SQL Server was residing, as well as a recent upgrade from a version of VMWare that was not patched for Spectre and Meltdown. The host had 16 physical cores and was hyperthreading (making it effectively 32 cores) until the hosting provider patched from VMWare 5.5 to a newer release (we believe 6.5) which was required for Meltdown and Spectre processor vulnerabilities. This patch disabled hyperthreading from the hypervisor to mitigate the security risk from speculative execution. Note, this patch is over a year old and a critical security risk; most software vendors (including VMWare) put this out as an immediate requirement after the announcement of the vulnerabilities.
Given this was a virtual machine, it shared a physical host with many other VMs; this is a very common configuration. However, this host was VERY overallocated. As mentioned above, there were 16 cores–however 61 additional vCPUs had been allocated to other machines. That’s 4.3 times the number of CPUs available for allocation. The screenshot below shows this singular Host, highlighting the vCPUs allocated.
So, uh, that’s a bad thing. Monica explains in detail why exactly it’s a bad thing, which is helpful when you’re trying to explain to the server admin why it’s a bad thing. CPU oversubscription can work for things like dev boxes or web servers, where they typically aren’t anywhere near 100% utilization. It does not work at all for busy database servers.