The Bitmap Operator

Hugo Kornelis describes a new operator:

The Bitmap operator is used to build a bitmap that, based on a hash, represents which values may be present in a data flow. Due to the chance of hash collisions in the hash function used, the Bitmap process can produce false positives but not false negatives – so a match based on a bitmap is not guaranteed to be a match to the actual data, but a non-match based on a bitmap is guaranteed to not be a match in the actual data.
The generated bitmap is typically used in other operators to remove rows for which there is no match in the bitmap, and hence guaranteed no match in the original set of data processed by the Bitmap operator. The use of Bitmap operators is most common in execution plans for star join queries in large data warehouses. An example can be seen here.

Click through for details on how it works and plenty of good information on it.

Visualizing Merge Joins

Bert Wagner continues his series on physical join operators:

The base algorithm works as follows: SQL Server compares the first rows from both sorted inputs.  It then continues comparing the next rows from the second input as long as the values match the first input’s value.
Once the values no longer match, SQL Server increments the row of whichever input has the smaller value – it then continues performing comparisons and outputting any joined records. (For more detailed information, be sure to check out Craig Freedman’s post on merge joins.)

Beware:  Bert has animations which might cause you to space out for a few hours.

Understanding Analysis Services Memory Behavior

Shabnam Watson walks us through SQL Server Analysis Services memory settings and application behavior under memory stress:

If memory consumption is below the Low limit everything is fine and it is free to stay in memory. Once the consumption passes the Low limit a cleaner thread wakes up and tries to clean up memory. At this point price of memory is no longer zero. It starts from 2 at the Low limit and goes as high as 1000 when memory consumption reaches the Total limit. The higher the memory pressure the more aggressive the cleaner gets. Once memory consumption reaches the Hard limit all connections/sessions are closed and queries are cancelled with an out of memory error.

This is a thorough explanation with some good demos and terrible queries.  Give it a read.

How SQL Server Implements Index Spooling

Hugo Kornelis has a long article which dives into the way SQL Server handles index spooling:

A Table Spool operator stores its data in a worktable that is structured as a clustered index. The index is not built on any of the columns in the data, nor on any artificially added columns. It is structured on zero columns. As is normal for a clustered index on a set of columns that is not unique for the set, a 4-byte uniqueifier is then added to the data to give each row a unique internal address.

The worktable for an Index Spool operator is also structured as a clustered index. However, this operator does actually index actual columns from its data instead of just relying on a uniqueifier. The indexed columns are chosen to effectively satisfy the Seek Predicate property. The statement in the Microsoft’s documentation that a nonclustered index is used for Index Seek is not correct.

A stack spool is represented in execution plans as a combination of an Index Spool and a Table Spool, both with the With Stack property present and set to True. This is misleading because it is actually a different type of spool. The worktable it uses is built as a clustered index on a single column, representing the nesting level. Because this is not unique, a uniqueifier is added where needed.

This is a deep look at some operators which people tend to gloss over but can have huge performance impacts.

Understanding Hash Match Aggregates

Itzik Ben-Gan continues his series on grouping and aggregating data by looking at the hash match aggregation process:

The estimated CPU cost for the Hash Aggregate in the plan for Query 8 is 0.166344, and in Query 9 is 0.16903.

It could be an interesting exercise to try and figure out exactly in what way the cardinality of the grouping set, the data types, and aggregate function used affect the cost; I just didn’t pursue this aspect of the costing. So, after making a choice of the grouping set and aggregate function for your query, you can reverse engineer the costing formula. For example, let’s reverse engineer the CPU costing formula for the Hash Aggregate operator when grouping by a single integer column and returning the MAX(orderdate) aggregate. The formula should be:

Operator CPU cost = <startup cost> + @numrows * <cost per row> + @numgroups * <cost per group>

Using the techniques that I demonstrated in the previous articles in the series, I got the following reverse engineered formula:

Operator CPU cost = 0.017749 + @numrows * 0.00000667857 + @numgroups * 0.0000177087

Definitely worth reading in detail.

Row Width And Snapshot Isolation

Kendra Little shows us the impact that row width has on snapshot isolation:

So I went to work to demonstrate row width impact on the version store — when only a tiny bit column is changed in the row.

Here’s how I did the test:

  • I created two tables, dbo.Narrow and dbo.Wide. They each each have a bit column named bitsy, along with some other columns.
  • I inserted one row in each table, but I put a lot more data into the row in dbo.Wide.
  • I allowed snapshot isolation on the database
  • I began a transaction in another session under snapshot isolation and left the transaction open (so version store cleanup wouldn’t kick in while I looked around)
  • I updated the bit column named bitsy for the single row in each table, thereby generating a row-version in tempdb for each table

The code I ran to test this is here, if you’d like to play around with it.

Read on for the results.

Digging Into The SQL Compute Context With R Services

Niels Berglund dives into how the SQL Compute Context works with R Services:

In the code above we use the RxInSqlServer() function to indicate we want to execute in a SQL context. The connectionString property defines where we execute, and the numTasks property sets the number of tasks (processes) to run for each computation, in Code Snippet 4 it is set to 1 which from a processing perspective should match what we do in Code Snippet 3. Before we execute the code in Code Snippet 4 we do what we did before we ran the code in Code Snippet 3:

  • Run Process Explorer as admin.
  • Navigate to the devenv.exe process in Process Explorer.
  • In addition, also look at the Launchpad.exe process in Process Explorer.

When we execute we see that the BxlServer.exe processes under the Microsoft.R.Host.exe processes are idling, but when we look at the Launchpad.exe process we see this:

This is a bit deep but interesting reading.

Nested Loops And Implicit Reordering

Dmitry Piliugin shows how the SQL Server optimizer can end up reordering data in a nested loops join to improve performance:

The purpose is to minimize random access impact. If we perform an Index Seek (with a partial scan, probably) we read the entries in the index order, in our case, in the order of CustomerID, which is clearly seen on the first result set. The index on CustomerID does not cover our query, so we have to ask the clustered index for the column SomeData, and actually, we perform one another seek, seeking by the SalesOrderID column. This is a random seek, so what if, before searching by the SalesOrderID we will sort by that key, and then issue an ordered sequence of Index Seeks, turning the random acces into the sequential one, wouldn’t it be more effective?

Yes, it would in some cases, and that is what “optimized” property tells us about. However, we remember, that it is not necessarily leads to the real reordering. As for comparing the real impact, I will refer you to the actual Craig’s post or leave it as a homework.

Read the whole thing.  This is one reason why it’s important to emphasize that in SQL, you can only assume order if you have an explicit ORDER BY clause.

What Read Committed Isolation Level Gets You

Paul Randal explains the answer, which is “not much”:

The ‘weird’ behavior is that when the “Batch 2” select completes, after having been blocked by the “Batch 1” transaction, it doesn’t return all 1,000 rows (even though “Batch 1” has completed). Furthermore, depending on when the “Batch 2” select is started, during the 10-seconds that “Batch 1” executes, “Batch 2” returns different numbers of rows. This behavior had also been reported on earlier versions of SQL Server as well. It’s easy to reproduce on SQL Server 2016/2017 and can be reproduced in all earlier versions with a single configuration change (more details in a moment).

Additionally, if the table has a clustered index created, 1,000 rows are returned every time, on all versions of SQL Server.

So why is this weird? Many people expect that all 1,000 rows will be returned every time AND that the structure of the table or the version of SQL Server should not make any difference.

Unfortunately, that assumption is not correct when using read committed.

Read Committed is a trade-off, not an ideal.

How The Hash Match Join Operation Works

Dmitry Piliugin explains what happens when SQL Server calls for a hash match to join two tables together:

Hash Match in the join mode consumes two inputs, as we are joining two tables. The main idea is to build the hash table using the first “build” input, and then apply the same approach hash the second “probe” input to see if there will be matches of hashed values.

Query Processor (QP) is doing many efforts while building the plan to choose the correct join order. From the Hash Match prospective, it means that QP should choose what table is on the Build side and what is on the Probe side. The Build size should be smaller as it will be stored in memory when building a hash table.

Building a hash table begins with hashing join key values of the build table and placing them to one or another bucket depending on the hash value. Then QP starts processing the probe side, it applies the same hash function to the probe values, determining the bucket and compares the values inside of the bucket. If there is a match – the row is returned.

That would be the whole story if we had infinite memory, but in the real world, it is not true. More to the point, SQL Server allocates memory to the query before the execution starts and does not change it during the execution. That means that if the allocated memory amount is much less than the data size came during the execution, a Hash Match should be able to partition the joining data, and process it in portions that fit allocated memory, while the rest of the data is spilled to the disk waiting to be processed. Here is where the dancing begins.

Read on to learn more about the details of this operation.


January 2019
« Dec