Comparing Query Store Plans

Arun Sirpal explains how to compare two plans in the Query Store:

A small but nice little feature I have been using recently can be found within Query Store.

Let’s say you have 2 Plan IDs for a query, naturally you want to view the execution plan for the different plans. In the past I did it a manual way, by that I mean by individually clicking on the Plan ID to see the plan then moving on to the next one.

It is much easier to explain with some images.

Click through to see those images.

More With Adaptive Joins

Erik Darling continues his adaptive joins exploration with two more posts.  First, how local variables can affect the query plan:

The easiest way to look at this is to compare Adaptive Joins with literal values to the same ones using local variables. The results are a little… complicated.

Here are three queries with three literal values. In my copy of the Super User database (the largest Stack Overflow sub-site), I’ve made copies of all the tables and added Clustered ColumnStore indexes to them. That’s the only way to get Adaptive Joins at this point — Column Store has to be involved somewhere along the line.

The last day of data in this dump is from December 11. When I query the data, I’m looking at the last 11 days of data, the last day of data, and then a day where there isn’t any data.

Then Erik takes on non-SARGable queries:

The queries with non-SARGable predicates on the Users table used Adaptive Joins.

The queries with non-SARGable predicates on the Posts table did not.

Now, there is an Extended Events… er… event to track this, called adaptive_join_skipped, however it didn’t seem to log any information for the queries that didn’t get Adaptive Joins.

Bummer! But, if I had to wager a guess, it would be that this happens because there’s no alternative Index Seek plan for the Posts table with those predicates. Their non-SARGableness takes that choice away from the optimizer, and so Adaptive Joins aren’t a choice. The Users table is going to get scanned either way — that’s the nature of ColumnStore indexes, so it can withstand the misery of non-SARGable predicates in this case and use the Adaptive Join.

Two more good posts in Erik’s series, and both definitely worth reading.

Handling Expiring Encryption Keys

Ed Leighton-Dick explains how safely to replace a SQL Server certificate which is about to expire:

So, now that we know what we need to rotate, how do we do it?

First, obtain a new certificate. SQL Server has the capability to generate its own certificates. For many purposes, that’s enough. However, if your company has to comply with auditing or regulatory requirements, you may need to obtain the new certificate from an outside source. Often, this is a third-party certificate authority. Some companies use a system called Encryption Key Management (EKM, also known as a Hardware Security Module, or HSA, after the device used to store the master key). (Obtaining an external certificate is a subject for an upcoming post.)

However you obtained the certificate, install it. Make sure to back it up securely, including the private key.

Next, add the new certificate to the symmetric key. The ALTER CERTIFICATE command has a clause that does just that – ADD ENCRYPTION BY.

Finally, remove the old certificate from the symmetric key. You’ll again use ALTER CERTIFICATE, but this time with the DROP ENCRYPTION BY clause.

Click through for instructions, including scripts.  Ed also explains how to update the certificate used with Transparent Data Encryption.

The Value Of Log Shipping

Robert Davis explains that log shipping can be better than mirroring for database migrations:

This topic has come up several times recently, so I feel the need to blog on it. As the person who wrote the book on Database Mirroring, it will probably come as a surprise to many of you that I believe that log shipping is a much better tool for database migrations than database mirroring.

I’m not just talking about the fact that database mirroring is deprecated (since SQL Server 2012) and log shipping is not. Both are still in SQL Server to this day. Because database mirroring is deprecated, it is no longer receiving bug fixes (except maybe critical security bugs) and no work is being done to make sure that it works with new features in current and future versions. Log shipping is still receiving both of these things. I will lay out the real reasons below.

Robert makes two compelling arguments in favor of log shipping.

The Bayesian Trap

David Smith links to a video describing an application of Bayes’s Theorem and gives the example of medical tests:

If you get a blood test to diagnose a rare disease, and the test (which is very accurate) comes back positive, what’s the chance you have the disease? Well if “rare” means only 1 in a thousand people have the disease, and “very accurate” means the test returns the correct result 99% of the time, the answer is … just 9%. There’s less than a 1 in 10 chance you actually have the disease (which is why doctor will likely have you tested a second time).

Now that result might seem surprising, but it makes sense if you apply Bayes Theorem. (A simple way to think of it is that in a population of 1000 people, 10 people will have a positive test result, plus the one who actually has the disease. One in eleven of the positive results, or 9%, actually detect a true disease.)

This goes to sensitivity/recall (in the medical field, they call it sensitivity; in the documents world and in the Microsoft ML space, they call it recall):  True positives / (True positives + False negatives).  Supposing a million people, 1000 will have the disease.  Of those 1000, we expect the test to find 990 (99%).  Of the 999,000 people who don’t have the disease, we expect the test to produce 9990 false negatives (1%).  990 / (990 + 9990) = 9%.

SOS_UnfairMutexPair

Ewald Cress talks about SOS_UnfairMutexPair:

The focal point of the mutex’s state – in fact one might say the mutex itself – is the single Spinlock bit within the 32-bit lock member. Anybody who finds it zero, and manages to set it to one atomically, becomes the owner.

Additionally, if you express an interest in acquiring the lock, you need to increment the WaiterCount, whether or not you managed to achieve ownership at the first try. Finally, to release the lock, atomically set the spinlock to zero and decrement the WaiterCount; if the resultant WaiterCount is nonzero, wake up all waiters.

Now one hallmark of a light-footed synchronisation object is that it is very cheap to acquire in the non-contended case, and this class checks that box. If not owned, taking ownership (the method SOS_UnfairMutexPair::AcquirePair()) requires just a handful of instructions, and no looping. The synchronisation doesn’t get in the way until it is needed.

However, if the lock is currently owned, we enter a more complicated world within the SOS_UnfairMutexPair::LongWait() method.

I love the statement that “This is not a very British class at all.”  Read the whole thing.

Selecting Into A Filegroup

Denis Gobo points out a new feature in SQL Server 2017:

With the CTP2 build of SQL Server 2017, you now have the ability of doing a SELECT INTO operation into a specific filegroup.

The syntax looks like this

SELECT * INTO TableName ON FileGroup FROM SomeQuery

What is the use of this you might ask? Well maybe you have some ad-hoc queries where you save some data but you don’t want it sitting on your expensive SAN. Or maybe you populate staging tables on the fly and you want it to end up on a specific SSD because you need the speed of the SSD disk for these operations.

This might also be helpful in migrating tables to different storage.

Monthly Job Run Time Averages

Tywan Terrell has a script to see how his monthly SQL Agent jobs are performing in terms of average run time:

Sometime as a ETL developer or Database Administrator you will need to gain insight into SQL Agent job executions times. This  insight can be used to proactively monitor the processing times of the various jobs running within your data environment.

Information about jobs execution times is stored in the MSDB database in table sysjobhistory. This table has the start time and the run duration times which I have used to create a report that will show the average job start and end times by month for all jobs running on a instance of SQL Server.

This is a very useful start.  If I start counting on this data, I’d do two things:  first, save it somewhere else permanently (because you want to clear out SQL Agent job history occasionally so the GUI doesn’t choke when you try to view job history); and second, look more at percentiles, particularly 95th and 99th percentiles for frequently-running jobs.

The Continued Importance Of ETL

Kevin Feasel

2017-05-08

ETL

Andy Leonard explains that good old ETL remains vital to an organization:

A Problem

As Jen points out earlier in her Analytics Market Commoditization and Consolidation post (you should read it all – it’s awesome – like all of Jen’s posts!) many analytics solution providers share the “Same look, same marketing story, same saves time and allows users [to] avoid evil IT.”

I can hear some of you thinking, “Are you telling us analytics doesn’t work, Andy?” Goodness no. I’m telling you hype and sales strategy work in the analytics market as well as anywhere. When asked why a solution may not perform to expectations, the #1 response is “your data is not clean.”

Data engineering (think ETL specifically designed for analytics and “big data”) is the backbone behind data science.  To Andy’s point, the data engineer’s job is to get clean, context-heavy data in front of a data scientist, the same way a “classical” Business Intelligence specialist works with analysts.

Categories