Inserts and other modifications to table variables can’t be parallelized. This is a product limitation, and the XML warns us about it.
The select could go parallel if the cardinality estimate were more accurate. This could potentially be addressed with a recompile hint, or with Trace Flag 2453.
Click through to see an example of what Erik means.
It’s not entirely uncommon to want to group by a computed expression in an aggregation query. The trouble is, whenever you group by a computed expression, SQL Server considers the ordering of the data to be lost, and this will turn your buttery-smooth Stream Aggregate operation into a Hash Match (aggregate) or create a corrective Sort operation, both of which are blocking.
Is there anything we can do about this? Yes, sometimes, like when those computed expressions are YEAR() and MONTH(), there is. But you should probably get your nerd on for this one.
There are many ways to solve a problem, and sometimes the best method is indirect.
Let’s try and figure out the costing formula for the Sort operator. Remember, our focus is the estimated cost and scaling because our ultimate goal is to figure out optimization thresholds where the optimizer changes its choices from one strategy to another.
The I/O cost estimate seems to be fixed: 0.0112613. I get the same I/O cost irrespective of factors like number of rows, number of sort columns, data type, and so on. This is probably to account for some anticipated I/O work.
As for the CPU cost, alas, Microsoft doesn’t publicly expose the exact algorithms that they use for sorting. However, among the common algorithms used for sorting by database engines in general are different implementations of merge sort and quicksort. Thanks to efforts made by Paul White, who’s fond of looking at Windows debugger stack traces (not all of us have the stomach for this), we have a bit more insight into the topic, published in his series “Internals of the Seven SQL Server Sorts.” According to Paul’s findings, the general sort class (used in the above plan) uses merge sort (first internal, then transitioning to external). In average, this algorithm requires n log n comparisons to sort n items. With this in mind, it’s probably a safe bet as a starting point to assume that the CPU part of the operator’s cost is based on a formula such as:Operator CPU cost = <startup cost> + @numrows * LOG(@numrows) * <comparison cost>
Of course, this could be an oversimplification of the actual costing formula that Microsoft uses, but absent any documentation on the matter, this is an initial best guess.
It’s interesting to see how the calculation changes in form with larger numbers of rows.
The plan is still showing us a warning, even though we see a literal in the cache.
This is obviously wrong. And very confusing.
Read on for the answer.
As implied by its logical operation, “Create Bitmap”, the Bitmap operator creates a bitmap. (I assume that this is the only logical operation that the Bitmap operator supports, since I have never seen a Bitmap operator with a different logical operation). A bitmap is a structure that stores Boolean values for a consecutive range of values in a small amount of memory. E.g. the range from 1 to 8000 has 8000 possible values. These can be represented as 8000 bits in just 1000 bytes. For each value, the location of the corresponding bit can be computed by dividing the value by 8; the dividend is the location and the remainder determines which of the bits to use. The value of that specific bit can then be tested, or it can be set to false (zero) or true (one).
The bitmap is named Opt_Bitmap1005 in this case. This name is not exposed in the quick property popup, but you can find in in the full property sheet as shown here, in the Defined Values property. You will also note that this bitmap is not included in the Output List property. That’s because this bitmap is not created for each individual row; there is a single bitmap that accumulates information from all rows. Other operators in the execution plan can reference this bitmap by name, even though it is not passed to them in their input rows.
Hugo goes into great detail on the operator, so you’ll want to set aside some time to read this.
The query plan shows us that we got a Trivial Plan, and that Simple Parameterization was attempted (as shown by the 100000 literal turning into the @1 parameter.)
Simple Parameterization is only attempted in a Trivial Plan.
The key word here is attempted.
Normally, you spot the change to your query string, you go in to the properties and you see both (and it has to be both) a Parameter Compiled Value and a Parameter Runtime Value, you’ve got simple parameterization going on.
Or do you?
Notice the final property on the sheet, StatementParameterizationType. Honestly, I never really paid attention to that property. I knew what kind of parameterization I was seeing. I’m not running Forced Parameterization. This isn’t a parameterized query. It’s Simple Parameterization. Of course it is. All the keys are there. Change to the code. Parameter List values. Done.
Narrator voice: it wasn’t done.
The purpose is to minimize random access impact. If we perform an Index Seek (with a partial scan, probably) we read the entries in the index order, in our case, in the order of CustomerID, which is clearly seen on the first result set. The index on CustomerID does not cover our query, so we have to ask the clustered index for the column SomeData, and actually, we perform one another seek, seeking by the SalesOrderID column. This is a random seek, so what if, before searching by the SalesOrderID we will sort by that key, and then issue an ordered sequence of Index Seeks, turning the random acces into the sequential one, wouldn’t it be more effective?
Yes, it would in some cases, and that is what “optimized” property tells us about. However, we remember, that it is not necessarily leads to the real reordering. As for comparing the real impact, I will refer you to the actual Craig’s post or leave it as a homework.
Read the whole thing. This is one reason why it’s important to emphasize that in SQL, you can only assume order if you have an explicit ORDER BY clause.
SQL is a declarative language: you write code that specifies *what* data to get, not *how* to get it.
Basically, the SQL Server query optimizer takes your SQL query and decides on its own how it thinks it should get the data.
It does this by using precalculated statistics on your table sizes and data contents in order to be able to pick a “good enough” plan quickly.
I like this post. It also lets me push one of my favorite old-time performance tuning books, SQL Tuning by Dan Tow. 95+ percent of the time, you don’t need to think about join order. But when you do, you want to have a systematic method of figuring the ideal join order out.
Recently we were doing a project that heavily focused on query tuning and many tables had various outer joins. My co-worker pointed out that many of these could be converted to an apply rather than a join.
Apply gives you both CROSS and OUTER. Think of CROSS APPLY like an INNER JOIN and OUTER APPLY like an OUTER JOIN.
Let’s compare some code to see how APPLY stacks up.
I like the APPLY operator so much that I created an entire presentation on it. It’s not a cure-all by any means, but if you understand the intent, you can find places where it improves your code significantly.
Here we see a Seek Predicate that looks for OrderDate values between two values that have been worked out elsewhere in the plan, but creating a range in which the right values must exist. This isn’t >= 20110805 00:00 and < 20110806 00:00 (which is what I would’ve made it), it’s something else. The value for start of this range must be smaller than 20110805 00:00, because it’s >, not >=. All we can really say is that when someone within Microsoft implemented how the QO should respond to this kind of predicate, they gave it enough information to come up with what I call a “helper predicate.”
Now, I would love Microsoft to make more functions sargable, but that particular request was Closed long before they retired Connect.
But maybe what I mean is for them to make more helper predicates.
The problem with helper predicates is that they almost certainly read more rows than you want. But it’s still way better than looking through the whole index.
Read the whole thing.