There’s a significant performance difference, so if you work frequently with regular expressions, check this out.
Very often when I mention testing before an upgrade, I’m told that there is no environment in which to do the testing. I know some of you have a Test environment. Some of you have Test, Dev, QA, UAT and who knows what else. You’re lucky.
For those of you that state you have no test environment at all in which to test, I give you
DBCC CLONEDATABASE. With this command, you have no excuse to not run the most frequently-executed queries and the heavy-hitters against a clone of your database. Even if you don’t have a test environment, you have your own machine. Backup the clone database from production, drop the clone, restore the backup to your local instance, and then test. The clone database takes up very little space on disk and you won’t incur memory or I/O contention as there’s no data. You will be able to validate query plans from the clone against those from your production database. Further, if you restore on SQL Server 2016 you can incorporate Query Store into your testing! Enable Query Store, run through your testing in the original compatibility mode, then upgrade the compatibility mode and test again. You can use Query Store to compare queries side by side! (Can you tell I’m dancing in my chair right now?)
Erin’s discovery makes CLONEDATABASE go from being an interesting tool to being outright powerful for handling upgrades.
So 23 milliseconds for the batch version and 850 milliseconds for RBAR. What a difference.
Now in this case the code for the RBAR is also a lot more complicated. But that isn’t always the case. It also isn’t always the case that RBAR is slower. But it’s almost always a lot slower than batch.
So, while the code for RBAR is often easier to write, even though it might be physically longer, it’s probably going to be slower too.
Well-written, set-based solutions aren’t always guaranteed to be faster, but that’s one of the safest bets to make with T-SQL.
Start writing queries that demonstrate TSQL anti patterns – and make them slow
You know how people say that the best way to learn something is to teach it?
The best way to learn to speed up queries is to write slow ones.
The best way to get a job speeding up queries is to write a blog about the queries you’ve sped up.
This is a long-term learning process, but is absolutely a worthwhile skill for any database professional.
When I work with SQL Server batch-controlled workflows, I use the theory “feed the CPUs”. That’s the simplest positive adaptation I could come up with of Kevin Closson’s paradigm “Everything is a CPU problem” 🙂
What I mean by “Feed the CPUs” is that memory and disk response times are primary factors determining the maximum rate for the CPUs to process the data. Nuts & bolts of such a model for SQL Server are slightly different than a similar model for Oracle. SQL Server access to persistent data is always through database cache, while Oracle uses shared access to database cache in SGA and private access to persistent data through direct access in PGA.
Click through for more details.
LOB values are especially problematic when a trace captures the RPC completed event of a TVP query. Tracing uses memory from the OBJECTSTORE_LBSS memory pool to build trace records that contain TVP LOB values. From my observations of the sys.dm_os_memory_clerks DMV, each LOB cell of a TVP requires about 8K during tracing regardless of the actual value length. This memory adds up very quickly when many rows and lob columns are passed via a TVP with a trace running. For example, the 10,000 row TVP with 10 LOB columns used in the earlier test required over 800MB memory for a single trace record. Consider that a large number of TVP LOB cells and/or concurrent TVP queries can cause queries to fail with insufficient memory errors. In extreme cases, the entire instance can become unstable and even crash under due to tracing of TVP queries.
This is a must-read if you use TVPs in your environment.
For row store targets, it is important to note that the presence of a clustered index or any additional non-clustered indexes on the target table will disable the parallel INSERT behavior. For example, here is the query plan on the same table with an additional non-clustered index present. The same query takes 287 seconds without a TABLOCK hint and the execution plan is as follows
This post goes into detail on when you can expect parallelism in rowstore and columnstore insertions. I highly recommend reading it.
As can be seen in the logs, the localityWaitFactor value is 1, but the delay that this code causes grows linearly with the number of required containers. Since our DFSIO-large benchmark creates 1,024 files, each 1 GB in size, it requests 1,024 YARN containers. Therefore, the code has to miss at least 1,024 scheduling opportunities until it schedules containers on this (wrongly assumed) OFF_SWITCH node.
But why is this delay enforced? This idea falls into a big area of scheduling research. The Delay Scheduling algorithm was introduced by Matei Zaharia’s EuroSys ’10 paper titled “Delay Scheduling: A Simple Technique for Achieving Locality and Fairness in Cluster Scheduling”.
That post is a bit deeper than my Hadoop administration comfort level, but if you’re given the task of performance tuning a cluster, this might be one place to look.
It’s not enough to know that you have a slow query or queries. You need to know exactly how slow they are. You must measure. You need to know how long they take to run and you need to know how many resources are used while they run. You need to know these numbers in order to be able to determine if, after you do something to try to help the query, you’ll know whether or not you’ve improved performance. To measure the performance of queries, you have a number of choices. Each choice has positives and negatives associated with them. I’m going to run through my preferred mechanisms for measuring query performance and outline why. I’ll also list some of the other mechanisms you have available and tell you why I don’t like them. Let’s get started.
This is an intro-level blog post, so Grant doesn’t go into much detail, but he does provide some good links for getting started.
The cluster’s servers and SQL Server configurations were built to be as close to identical as possible to the previous instance (memory, cores, disk, maxdop, CTP, etc).
After the migration, I noticed that CPU utilization jumped from the normal 25% to a consistent 75%.
I did several other migrations with similar server loads with no issues, so I’m a bit puzzled as to what might be going on here. Could the upgrade from SQL Server 2008 R2 to SQL Server 2012 simply be exposing bad queries that 2008 was handling differently?
Kendra goes through a number of reasons, building a troubleshooting guide in the process. This is a great read.