According to Pedro’s slide, but not the ENTIRELY MISSING DOCUMENTATION, this wait is the “safe” type of parallelism wait.
It’s a good thing Pedro is a dutiful blogger, so we don’t have to pull our hair out while unfurling these mysteries.
Speaking of documentation, our new CXCONSUMER friend isn’t mentioned in Query Store Wait Stats, either.
This is a very useful addition.
In SQL Server 2016 a new DMV is exposed, sys.dm_exec_session_wait_stats, which provides information about waits for an existing, active session. If you know the session_id, you can track waits for a query when it starts and when it completes (snapshot the information at the beginning and end of the query and then diff the information). The challenge is that you have to know the session_id for the query, and you have to set up data capture in advance – which isn’t trivial when you’re in the midst of a high priority issue.
Wait statistics information exists in an actual execution plan starting in SQL Server 2016 SP1. Only the top 10 waits are captured, and there are limitations in terms of what this data represents. For example, CXPACKET is ignored and not included in the output, but it will be included in 2016 SP2 and 2017 CU3 and up – where irrelevant parallelism waits are instead captured by CXCONSUMER (which will not be included in actual plan waits).
So how can we see what a specific query is truly waiting on? We can use Query Store! SQL Server 2017 includes the capture of wait statistics information in Query Store, and this functionality is also available in Azure SQL Database. Wait statistics are tied to a query plan, and are captured over time, just like the runtime statistics. The addition of wait statistics information in Query Store was the number one feature request after its initial release, and all that information together creates powerful troubleshooting capabilities.
Read on for a demo-filled post. It’s hard to overstate just how useful this information is for query tuning.
It’s important to mention a couple notes before getting into details:
- Parallelism use is intrinsically a benefit for costly queries, bound by the Cost Threshold for Parallelism and Max Degree of Parallelismserver configurations among other considerations. See the Query Processing Guide for more in-depth information on how SQL Server implements parallelism. Also refer to Craig Freedman’s blog series on the topic.
- Waits are a normal part of the waits and queues model, allowing SQL Server to concurrently execute many more requests than there are schedulers available. See more about waits and queues here,
It’s also important to understand that parallelism is implemented as if it were two operators. There’s the producers that push data to consumers, and the consumers that may have to wait for data from producers.
And so it becomes easier to understand how producer waits are the ones that may require attention, while consumer waits are inevitable as a passive consequence of longer running producers.
I actually would have preferred the rename go the other way and create a CXPRODUCER which is bad, rather than a CXCONSUMER which is benign. The problem is that if you’re running in a heterogeneous environment, you have to remember that CXPACKET on Server A might indicate a problem, whereas it’s much fuzzier on Server B because you haven’t upgraded Server B to the latest CU yet. Regardless, I’m happy that we’re going to be able to have a better idea when parallelism is actually a problem.
Microsoft’s Joe Sack & Pedro Lopes held a forward-looking session for performance tuners at the PASS Summit and dropped some awesome bombshells.
Pedro’s Big Deal: there’s a new CXPACKET wait in town: CXCONSUMER. In the past, when queries went parallel, we couldn’t differentiate harmless waits incurred by the consumer thread (coordinator, or teacher from my CXPACKET video) from painful waits incurred by the producers. Starting with SQL Server 2016 SP2 and 2017 CU3, we’ll have a new CXCONSUMER wait type to track the harmless ones. That means CXPACKET will really finally mean something.
Read on to see what Joe has for us.
RESOURCE_SEMAPHORE_QUERY_COMPILE – this means a query came in, and SQL Server didn’t have an execution plan cached for it. In order to build an execution plan, SQL Server needs a little memory – not a lot, just a little – but that memory wasn’t available. SQL Server had to wait for memory to become available before it could even build an execution plan. For more details and a reproduction script, check out my Bad Idea Jeans: Dynamically Generating Ugly Queries post. In this scenario, cached query plans (and small ones) may be able to proceed just fine (depending on how much pressure the server is under), but the ugly ones will feel frozen.
Read on for a couple more examples of poison wait types.
That sounds amazing! Our server only had 609 milliseconds of wait time altogether, none of which was spent waiting on storage! Our storage must be super-blazing fast, and there’s practically no way to tune it to wait less than 609 milliseconds, right?
Spoilers: it’s not that fast.
Specifically, I was concerned about SOS_SCHEDULER_YIELD waits. This is a special wait type that occurs when a thread is able to run for 4ms of CPU time (called the thread quantum) without needing to get suspended waiting for an unavailable resource. In a nutshell, a thread must call into the SQLOS layer every so often to see whether it has exhausted its thread quantum, and if so it must voluntarily yield the processor. When that happens, a context switch occurs, and so a wait type must be registered: SOS_SCHEDULER_YIELD. A deeper explanation of this wait type is in my waits library here.
My theory was this: if a VM is prevented from running for a few milliseconds or more, that could mean that a thread that’s executing might exhaust its thread quantum without actually getting 4ms of CPU time, and so yield the processor causing an SOS_SCHEDULER_YIELD wait to be registered. If this happened a lot, it could produce a set of wait statistics for a virtualized workload that appears to have lots of SOS_SCHEDULER_YIELDs, when in fact it’s actually a VM performance problem and the SOS_SCHEDULER_YIELD waits are really ‘fake’.
Read on for more details, and definitely check out the link. It was an eye-opener when I learned that SOS_SCHEDULER_YIELD didn’t mean “need more/more powerful CPUs.”
Lonny Niederstadt has dropped a boatload of information on us, with a four-part series looking at the OLEDB wait type when running DBCC CHECKTABLE. Part 1 sets up the problem:
So… yeah. Definitely a relationship between the number of pages in the heap and the number of OLEDB waits. Something(s) increase the number of OLEDB waits beyond the number of data pages, the number of used pages and the number of reserved pages in a way that I haven’t deciphered yet, though.
On this very simple heap, with nearly the same number of rows as pages in the heap, the number of OLEDB wait events is nearly the same – only increased by one – when “upgrading” the checktable operation from physical_only to a checktable with logical checks.
So how do the results match with expectations? Whoa!! 25745 OLEDB wait events as the starting point for MAXDOP 1 and 2. Then single-stepping up to 25752 OLEDB wait events for MAXDOP 8. 25752 is exactly 4 times the used page count of 6438 for the clustered index. So the OLEDB wait events went from a small number more than the count of pages/rows with the heap… to almost 4 times as many with the clustered index. *That* was a bit unexpected.
All right. What if we use a heap with a nonclustered include index? Based on the table and index definition below, I expect the nonclustered include index nci_OLEDB_TEST_HEAP_NCI_INCLUDE to be remarkably similar to the clustered index we previously used.
Interesting reading so far in this series.
I once had an extremely busy system where we had growing CMEMTHREAD waits. This is an unusual wait, and our question was: is this being caused by a single query, just a few queries, or all queries? We were able to answer this by setting up an extended events trace and looking at waits by query, but this had some downsides:
- Extended Events has no GUI in 2008R2, and setting up and testing the scripts took a bit of time (more minor issue)
- Generating wait information by query on a very busy system generates a lot of output, so we had to be careful to set up sampling and filtering so we didn’t impact performance (more major issue)
But we were able to use this to figure out that the wait was associated with all queries, not a few queries, which helped us down our troubleshooting path.
Kendra answers this (mostly) in the context of SQL Server 2008 R2, as that was the version the questioner had, but she does mention where later versions make life easier.
Paul’s process gives you info on every instance of a wait the query experienced and it’s very easy to aggregate those results to see the top waits and their total effect on a query. Quite often though, you don’t need a lot of detail. You don’t need to know every wait, just the top several. If you are already generating the actual query plan to have a in-depth look at the plan, wouldn’t it be nice if the query wait stats were already in there for you?
Now they are. In SQL Server 2016 (I’m told it came in SP1, but I don’t have a non-SP1 instance to verify that), the actual execution plan includes the top waits for the query execution in the plan. You can see them by clicking on the left-most (first) operator in the plan and viewing the Properties (shortcut F4). It will list the top waits right there in the properties dialog for you.
Getting single-query wait stats in the execution plan makes life so much simpler.