Category: Performance Tuning

sp_executesql Performance

Published 2016-11-08 by Kevin Feasel

Grant Fritchey takes a look at the performance of sp_executesql:

The results are fun.

Execution Type Average Duration

sp_executesql AVG: 57946.03187251

Ad Hoc AVG: 14788.8924302789

What’s going on? Is the conclusion that, in fact, ad hoc queries are faster than sp_executesql?

Absolutely not.

I cheated.

Using sp_executesql is about protecting yourself when executing dynamic SQL, not about performance gains.

Comments closed

Wait Stats

Published 2016-11-03 by Kevin Feasel

David Alcock provides an introduction to wait stats and why they’re useful for performance tuning:

So here are two different ways that we can use SQL Servers wait statistics for troubleshooting purposes. Both views give us really useful information but both have different purposes. If we wanted to look back over time then the sys.dm_os_wait_stats will give us a view of wait time totals. Typically we would capture the information via a scheduled job and analyse the data for spikes during periods where issues might be suspected.

For performing real-time analysis of wait statistics then we should base queries on the sys.dm_os_waiting_tasks view where we can see accurate wait duration values as they are happening within our instance.

In my opinion wait statistics are the most important piece of information when troubleshooting SQL Server so learning about the different types is vital for anyone using SQL. Thankfully there is a wealth of really useful information about wait statistics out there; I’ve listed some of my favourite posts below.

Click through for an example, as well as links to more resources.

Comments closed

Latency Vs Throughput

Published 2016-11-02 by Kevin Feasel

I have a post up on understanding latency versus throughput:

The primary method in which data moves from one process to another is through buffers. We break up data into smaller portions and push them to their destination. In Integration Services, we have buffers. When passing data through TCP, we use packets.

Okay, so what’s the trade-off? The trade-off is between latency and throughput. Let’s take TCP packets as an example. Say you have a series of 50-byte messages you want to send from a source to a destination. We have two primary options: push messages as fast as possible, or hold off until you have the most data you can store in a packet and send it along. For simplicity’s sake, we’ll say that we can fit about 1350 bytes in a packet, so we can store 27 messages in a packet. We’ll also assume that it takes 10 milliseconds to send a packet from the source to the destination (regardless of packet size, as we’re using powerful connections) and 1 millisecond to produce a message.

We use pipes as metaphors in IT, especially around data transfer, and I think it’s a solid metaphor because it intuitively includes most of the important concepts we need to worry about with data. We have latency (how long it takes something to go from one end of the pipe to the other), throughput (how much we can move at any point in time, which is determined by things like the diameter of the pipe), back pressure (in the pipe scenario, resistance caused by pipe directional changes; in the data world, when downstream operators are slower than upstream operators), etc.

Comments closed

View Performance

Published 2016-11-02 by Kevin Feasel

Grant Fritchey looks at view performance vis-a-vis stored procedures:

The difference in the performance including compile time for the procedure alone is 700mc better on average than the view. That’s an 8% difference. It was almost that high for the view that used the procedure at 7%.

If we’re just talking compile time then, there is a significant win if we avoid the view. This is no doubt because of the extra work involved in unpacking the view and going through the simplification process within the optimizer. Plus, the view alone in our query was parameterized by the optimizer in order to assist it’s performance over time (as we saw in the average results without the recompile). All that extra work explains the 8% difference.

Read the whole thing.

Comments closed

Subqueries And Performance

Published 2016-10-25 by Kevin Feasel

Grant Fritchey busts a myth:

I’ve written before about the concept of cargo cult data professionals. They see one issue, one time, and consequently extrapolate that to all issues, all the time. It’s the best explanation I have for why someone would suggest that a sub-query is flat out wrong and will hurt performance.

Let me put a caveat up front (which I will reiterate in the conclusion, just so we’re clear), there’s nothing magically good about sub-queries just like there is nothing magically evil about sub-queries. You can absolutely write a sub-query that performs horribly, does horrible things, runs badly, and therefore absolutely screws up your system. Just as you can with any kind of query. I am addressing the bad advice that a sub-query is to be avoided because they will inherently lead to poor performance.

There are times not to use subqueries, but this post is absolutely correct: understand the reasons why things may or may not perform well, and don’t be afraid to try things out.

Comments closed

Text-Based Execution Plans

Published 2016-10-13 by Kevin Feasel

Erik Darling looks at the old SET STATISTICS PROFILE command:

Before you think this is to perf tuning what boxed wine is to pest extermination; it’s not. It’s another tool that has pros and cons. The plan cache is cool too, but cached plans don’t have all the information that actual plans do. You can run Traces or Profiler or Extended Events, but they all sort of have their own caveats, gotchas, and overhead. If you don’t have a monitoring tool, though, what are you left with?

Let’s take a look at what you can do with STATISTICS PROFILE, and then the (rather obvious) limitations. Here’s the setup and a simple query.

I’ll admit that outside of learning what they are, I’ve never used text execution plans. I’ll read the XML, view the graphical results, pipe them out to SentryOne Plan Explorer (formerly SQL Sentry), etc. But the text plans never held much allure for me.

Comments closed

Sequentially Increasing Indexes

Published 2016-10-10 by Kevin Feasel

Joe Chang discusses benchmarking and looks at a particular scenario around maximizing insert performance:

The test environment here is a single socket Xeon E3 v3, quad-core, hyper-threading enabled. Turbo-boost is disabled for consistency. The software stack is Windows Server 2016 TP5, and SQL Server 2016 cu2 (build 2164). Some tests were conducted on a single socket Xeon E5 v4 with 10 cores, but most are on the E3 system. In the past, I used to maintain two-socket systems for investigating issues, but only up to the Core2 processor, which were not NUMA.

The test table has 8 fixed length not null columns, 4 bigint, 2 guids, 1 int, and a 3-byte date. This adds up to 70 bytes. With file and row pointer overhead, this works out to 100 rows per page at 100% fill-factor.

Both heap and clustered index organized tables were tested. The indexes tested were 1) single column key sequentially increasing and 2) two column key leading with a grouping value followed by a sequentially increasing value. The grouping value was chosen so that inserts go to many different pages.

The test was for a client to insert a single row per call. Note that the recommended practice is to consolidate multiple SQL statements into a single RPC, aka network roundtrip, and if appropriate, bracket multiple Insert, Update and Delete statements with a BEGIN and COMMIT TRAN. This test was contrived to determine the worst case insert scenario.

With that setup in mind, click through to learn his results.

Comments closed

Comparing File Formats In Hadoop

Published 2016-10-03 by Kevin Feasel

Andrew Peterson points out performance comparisons for various Hadoop file formats:

According to a posting on the Hortonworks site, both the compression and the performance for ORC files are vastly superior to both plain text Hive tables and RCfile tables. For compression, ORC files are listed as 78% smaller than plain text files. And for performance, ORC files support predicate pushdown and improved indexing that can result in a 44x (4,400%) improvement. Needless to say, for Hive, ORC files will gain in popularity. (you can read the posting here: ORC File in HDP 2: Better Compression, Better Performance).

There are several considerations around picking the correct file format, and it’s probably best to experiment with them in your specific environment.

Comments closed

SQL Server 2016 CU2 And Spinlocks

Published 2016-09-30 by Kevin Feasel

Arvind Shyamsundar has a nice post on SQL Server 2016’s Cumulative Update 2 and how it reduces spinlock contention:

The developers then took a long hard look at how to make this more efficient on such large systems. As described before, since operations on the cache are read-intensive, there was a thought to leverage reader-writer primitives to optimize locking. However, any changes to this spinlock had to be validated extensively before releasing publicly as they may have a drastic impact if incorrectly implemented.

The implementation of the reader / writer version of this spinlock was an intricate effort and was done carefully to ensure that we do not accidentally affect any other functionality. We are glad to say that the final outcome, of what started as a late night investigation in the SQLCAT lab, has finally landed as an improvement which you can use! If you download and install Cumulative Update 2 for SQL Server 2016 RTM, you will observe two new spinlocks in the sys.dm_os_spinlock_stats view:

LOCK_RW_CMED_HASH_SET

LOCK_RW_SECURITY_CACHE

These are improved reader/writer versions of the original spinlocks. For example, LOCK_RW_CMED_HASH_SET is basically the replacement for CMED_HASH_SET, the spinlock which was the bottleneck in the above case.

Click through for the full story.

Comments closed

SSAS And Power BI Performance Issue

Published 2016-09-27 by Kevin Feasel

Chris Webb describes an issue with SSAS Multidimensional and Power BI-generated DAX causing a performance problem:

This query has something in it – I don’t know what – that means that it cannot make use of the Analysis Services Storage Engine cache. Every time you run it SSAS will go to disk, read the data that it needs and then aggregate it, which means you’ll get cold-cache performance all the time. On a big cube this can be a big problem. This is very similar to problems I’ve seen with MDX queries on Multidimensional and which I blogged about here; it’s the first time I’ve seen this happen with a DAX query though. I suspect a lot of people using Power BI on SSAS Multidimensional will have this problem without realising it.

This problem does not occur for all tables – as far as I can see it only happens with tables that have a large number of rows and two or more hierarchies in. The easy way to check whether you have this problem is to refresh your report, run a Profiler trace that includes the Progress Report Begin/End and Query Subcube Verbose events (and any others you find useful) and then refresh the report again by pressing the Refresh button in Power BI Desktop without changing it at all. In your trace, if you see any of the Progress Report events appear when that second refresh happens, as well as Query Subcube Verbose events with an Event Subclass of Non-cache data, then you know that the Storage Engine cache is not being used.

This doesn’t look to be a quick fix, so do read the whole thing to help figure out how to avoid this issue.

Comments closed

Execution Type	Average Duration
sp_executesql	AVG: 57946.03187251
Ad Hoc	AVG: 14788.8924302789