Press "Enter" to skip to content

Category: Performance Tuning

Explaining RBAR

Kenneth Fisher explains RBAR with the help of an animated GIF:

So 23 milliseconds for the batch version and 850 milliseconds for RBAR. What a difference.

Now in this case the code for the RBAR is also a lot more complicated. But that isn’t always the case. It also isn’t always the case that RBAR is slower. But it’s almost always a lot slower than batch.

So, while the code for RBAR is often easier to write, even though it might be physically longer, it’s probably going to be slower too.

Well-written, set-based solutions aren’t always guaranteed to be faster, but that’s one of the safest bets to make with T-SQL.

Comments closed

Learning Performance Tuning

Kendra Little gives some tips on how to gather performance tuning skills:

Start writing queries that demonstrate TSQL anti patterns – and make them slow

You know how people say that the best way to learn something is to teach it?

The best way to learn to speed up queries is to write slow ones.

The best way to get a job speeding up queries is to write a blog about the queries you’ve sped up.

This is a long-term learning process, but is absolutely a worthwhile skill for any database professional.

Comments closed

Feed The CPUs

SQL Sasquatch is starting a new series on optimizing disk write to maximize CPU throughput:

When I work with SQL Server batch-controlled workflows, I use the theory “feed the CPUs”.  That’s the simplest positive adaptation I could come up with of Kevin Closson’s paradigm “Everything is a CPU problem” 🙂

What I mean by “Feed the CPUs” is that memory and disk response times are primary factors determining the maximum rate for the CPUs to process the data.  Nuts & bolts of such a model for SQL Server are slightly different than a similar model for Oracle.  SQL Server access to persistent data is always through database cache, while Oracle uses shared access to database cache in SGA and private access to persistent data through direct access in PGA.

Click through for more details.

Comments closed

Table-Valued Parameter Performance

Dan Guzman shows that setting TVP column sizes correctly can have a major performance impact:

LOB values are especially problematic when a trace captures the RPC completed event of a TVP query. Tracing uses memory from the OBJECTSTORE_LBSS memory pool to build trace records that contain TVP LOB values. From my observations of the sys.dm_os_memory_clerks DMV, each LOB cell of a TVP requires about 8K during tracing regardless of the actual value length. This memory adds up very quickly when many rows and lob columns are passed via a TVP with a trace running. For example, the 10,000 row TVP with 10 LOB columns used in the earlier test required over 800MB memory for a single trace record. Consider that a large number of TVP LOB cells and/or concurrent TVP queries can cause queries to fail with insufficient memory errors. In extreme cases, the entire instance can become unstable and even crash under due to tracing of TVP queries.

This is a must-read if you use TVPs in your environment.

Comments closed

Parallel Insert-Select

Arvind Shyamsundar looks at parallel insertion using the INSERT SELECT pattern:

For row store targets, it is important to note that the presence of a clustered index or any additional non-clustered indexes on the target table will disable the parallel INSERT behavior. For example, here is the query plan on the same table with an additional non-clustered index present. The same query takes 287 seconds without a TABLOCK hint and the execution plan is as follows

This post goes into detail on when you can expect parallelism in rowstore and columnstore insertions.  I highly recommend reading it.

Comments closed

NodeGroup Performance Issues

Babak Behzad explains potential Hadoop NodeGroup performance bottlenecks:

As can be seen in the logs, the localityWaitFactor value is 1, but the delay that this code causes grows linearly with the number of required containers. Since our DFSIO-large benchmark creates 1,024 files, each 1 GB in size, it requests 1,024 YARN containers. Therefore, the code has to miss at least 1,024 scheduling opportunities until it schedules containers on this (wrongly assumed) OFF_SWITCH node.

But why is this delay enforced? This idea falls into a big area of scheduling research. The Delay Scheduling algorithm was introduced by Matei Zaharia’s EuroSys ’10 paper titled “Delay Scheduling: A Simple Technique for Achieving Locality and Fairness in Cluster Scheduling”.

That post is a bit deeper than my Hadoop administration comfort level, but if you’re given the task of performance tuning a cluster, this might be one place to look.

Comments closed

Built-In Query Monitoring Tools

Grant Fritchey describes a couple built-in options for monitoring query performance:

It’s not enough to know that you have a slow query or queries. You need to know exactly how slow they are. You must measure. You need to know how long they take to run and you need to know how many resources are used while they run. You need to know these numbers in order to be able to determine if, after you do something to try to help the query, you’ll know whether or not you’ve improved performance. To measure the performance of queries, you have a number of choices. Each choice has positives and negatives associated with them. I’m going to run through my preferred mechanisms for measuring query performance and outline why. I’ll also list some of the other mechanisms you have available and tell you why I don’t like them. Let’s get started.

This is an intro-level blog post, so Grant doesn’t go into much detail, but he does provide some good links for getting started.

Comments closed

Upgrades And Regressions

Kendra Little explains when upgrades can cause performance to suffer:

The cluster’s servers and SQL Server configurations were built to be as close to identical as possible to the previous instance (memory, cores, disk, maxdop, CTP, etc).

After the migration, I noticed that CPU utilization jumped from the normal 25% to a consistent 75%.

I did several other migrations with similar server loads with no issues, so I’m a bit puzzled as to what might be going on here. Could the upgrade from SQL Server 2008 R2 to SQL Server 2012 simply be exposing bad queries that 2008 was handling differently?

Kendra goes through a number of reasons, building a troubleshooting guide in the process.  This is a great read.

Comments closed

Indexes And Stats

Brent Ozar looks at a case when adding a suggested index monkeys with stats:

The query runs faster, make no mistake – but check out the estimates:

  • Estimated number of rows = 1
  • Actual number of rows = 165,367

Those estimates are built by SQL Server’s cardinality estimator (CE), and there have been major changes to it over the last couple of versions. You can control which CE you’re using by changing the database’s compatibility level. This particular StackOverflow database is running in 2016 compat mode – so what happens if we switch it back to 2012 compat mode?

Based on this result, there might be further optimizations available.  Read on for more of Brent’s thoughts.

Comments closed

When Servers Behave Differently

Grant Fritchey tells a tale of two servers which behave quite differently when running the same query:

Now what?

Well obviously something somewhere is different. Start by comparing everything on both servers and both databases down to… hang on, here, we’ll write a PowerShell script and then….

Wait, wait, wait!

You have the execution plans? Before we start digging through all the properties everywhere and comparing everything to everything, updating statistics 14 times, and all the rest, what if we look at the execution plans. They’re different, so we should start looking at scans & indexes & statistics &….

Wait!

Sometimes, it’s the little things that matter.

Comments closed