Diving Into Index Scans

Hugo Kornelis explains how index scans work in SQL Server:

The logic of the Index Scan operator itself is fairly simple, but the actual actions carried out can vary hugely depending on the type of index being scanned (as defined in the Storage and IndexKind properties). Most of this logic is carried out at the level of the storage engine. Since an understanding of this is important to get a proper understanding of the performance of this operator, the actual actions carried out at the level of the storage engine will be described on this page as well.

The current version of SQL Server (2017) supports four types of index storage. The Storage property distinguishes between RowStore, ColumnStore, and MemoryOptimized; for the latter type only IndexKind further differentiates this into NonClustered and NonClusteredHash.

Scans are an important part of the database engine and knowing how they work helps us understand when they’re the right choice for the job.

Parameter Sniffing in the Wild

Erik Darling is a parameter sniffing anthropologist:

A while back, I put together a pretty good rundown of this on the DBA Stack Exchange site.

In the plan cache, it’s really hard to tell if a query is suffering from parameter sniffing in isolation.

By that I mean, if someone sends you a cached plan that’s slow, how can you tell if it’s because of parameter sniffing?

Read on to see what Erik does to discover parameter sniffing problems.

Power BI Performance Analyzer

Marco Russo takes us through the Power BI Performance Analyzer:

The Power BI Performance Analyzer is a feature included in the May 2019 release of Power BI Desktop that simplifies the way you can collect the DAX queries generated by Power BI. You can use DAX Studio to capture them (as described in Capturing Power BI queries using DAX Studio), but the Performance Analyzer integrated in Power BI is simpler and provides a few insights about the time consumed in other activities, such as the rendering time of any visuals.

You can enable the Power BI Performance Analyzer by clicking the Performance Analyzer checkbox in the View ribbon of Power BI Desktop.

Read the whole thing.

The Cost of Ad Hoc Queries

Erin Stellato gives us a simple demonstration of why parameterization is important for performance:

From this screenshot you can see that we have about 3GB total dedicated to the plan cache, and of that 1.7GB is for the plans of over 158,000 adhoc queries. Of that 1.7GB, approximately 500MB is used for 125,000 plans that execute ONE time only. About 1GB of the plan cache is for prepared and procedure plans, and they only take up about 300MB worth of space. But note the average use count – well over 1 million for procedures. In looking at this output, I would categorize this workload as mixed – some parameterized queries, some adhoc.

Kimberly’s blog post discusses options for managing a plan cache filled with a lot of adhoc queries. Plan cache bloat is just one problem you have to contend with when you have an adhoc workload, and in this post I want to explore the effect it can have on CPU as a result of all the compilations that have to occur. When a query executes in SQL Server, it goes through compilation and optimization, and there is overhead associated with this process, which frequently manifests as CPU cost. Once a query plan is in cache, it can be re-used. Queries that are parameterized can end up re-using a plan that’s already in cache, because the query text is exactly the same. When an adhoc query executes it will only re-use the plan in cache if it has the exact same text and input value(s).

Read on to see an example of how long it takes a set of ad hoc queries to finish versus their parameterized equivalents. Erin’s test is at the behavioral extreme (100% parameterized versus 100% ad hoc) so real-world results won’t be quite this good.

Minimal Logging into Empty Clustered Indexes

Paul White explains how to perform minimal logging when using the INSERT..SELECT pattern to insert into an empty table with a clustered index:

The summary top row suggests that all inserts to an empty clustered index will be minimally logged as long as TABLOCK and ORDER hints are specified. The TABLOCK hint is required to enable the RowSetBulk facility as used for heap table bulk loads. An ORDER hint is required to ensure rows arrive at the Clustered Index Insert plan operator in target index key order. Without this guarantee, SQL Server might add index rows that are not sorted correctly, which would not be good.

Unlike other bulk loading methods, it is not possible to specify the required ORDER hint on an INSERT...SELECT statement. This hint is not the same as using an ORDER BY clause on the INSERT...SELECT statement. An ORDER BY clause on an INSERTonly guarantees the way any identity values are assigned, not row insert order.

Read on to see what you can do.

Blocked Queries and the Plan Cache

Erik Darling lays out an argument why execution time isn’t always a great measure for finding bad queries:

Most scripts (even ones I’ve worked on!) that look at the plan cache, have had the ability to sort it by different metrics: CPU, reads, writes, duration, etc.

A lot of people are very interested in long running queries — and I am too!

Heck, they’re how I make money. Blogging pays like crap.

But there’s a slight problem with only looking at query duration.

Read on for an example. Erik loses money on every blog post but he makes up for it in volume.

Aggregate Pushdown with GROUP BY

Paul White takes us through several performance improvements around aggregate pushdown:

SQL Server 2016 introduced serial batch mode processing and aggregate pushdown. When pushdown is successful, aggregation is performed within the Columnstore Scan operator itself, possibly operating directly on compressed data, and taking advantage of SIMD CPU instructions.

The performance improvements possible with aggregate pushdown can be very substantial. The documentation lists some of the conditions required to achieve pushdown, but there are cases where the lack of ‘locally aggregated rows’ cannot be fully explained from those details alone.

This article covers additional factors that affect aggregate pushdown for GROUP BY queries onlyScalar aggregate pushdown (aggregation without a GROUP BY clause), filter pushdown, and expression pushdown may be covered in a future post.

Read the whole thing.

Power Query Container Size and Performance

Chris Web looks into what changing the Power BI Dataflow container size does for us:

Currently there is no way to change this 256MB in Power BI Desktop or Excel although someone has already posted a suggestion on the Ideas site to allow us to change it. How much of an impact does this actually have on refresh performance though? Without the ability to change this setting it’s hard to say, but I suspect it could be significant and that a lot of Power Query performance problems could be explained by this behaviour.

The situation is different in the Power BI service, where I understand there is a limit on the overall amount of memory that a single Power Query query evaluation can use.

Read on to understand the differences here between running on Power BI Desktop and running in the Power BI service, as well as a bit of testing on Chris’s part.

Load Testing Tools For SQL Server

Brent Ozar shares a list of load testing tools for SQL Server:

One thing I need you to understand first: you have to provide the database and the queries. Almost all of the tools in this post, except the last one, are designed to help you run queries, but they don’t include the queries. The whole idea with load testing is that you’re trying to mimic your own workloads. If you’re just trying to test a server with generic workloads, start with my post, “How to Check Performance on a New SQL Server.”

Click through for a list of tools. I’d also throw in Pigdog from Mark Wilkinson (one of my co-workers). This helped replicate a few issues in SQL Server 2017 around tempdb performance.

Troubleshooting Spark Performance

Bikas Saha and Mridul Murlidharan explain some of the basics of performance tuning with Apache Spark:

Our objective was to build a system that would provide an intuitive insight into Spark jobs that not just provides visibility but also codifies the best practices and deep experience we have gained after years of debugging and optimizing Spark jobs. The main design objectives were to be
– Intuitive and easy – Big data practitioners should be able to navigate and ramp quickly
– Concise and focused – Hide the complexity and scale but present all necessary information in a way that does not overwhelm the end user
– Batteries included – Provide actionable recommendations for a self service experience, especially for users who are less familiar with Spark
– Extensible – To enable additions of deep dives for the most common and difficult scenarios as we come across them

The tool looks pretty interesting and I’m hoping it will be part of the open source suite at Cloudera.


August 2019
« Jul