
Category: Performance Tuning

Inlining Scalar Functions

Brent Ozar covers a Microsoft paper which looks exciting:

Froid replaces the scalar UDF operator in the calling query with the newly constructed relational expression as a scalar sub-query.

That one statement comes with a lot of interesting gotchas that they discuss throughout the paper:

  • You won’t see the scalar function in the plan (just like we don’t currently see single-statement inline table-valued functions by name in the plan – we just see their effects, kinda like views)

  • The function and the underlying tables might have different permissions (you could have permissions on the function but not the tables, or vice versa, which makes compilation & execution a little trickier)

  • Code that doesn’t get used can just get removed outright (just like SQL Server can do join elimination)

  • Costs and row estimates are now useful inside the plan

To the extent that this works (and I hope it does), it helps fulfill the promise of SQL Server 2000: encapsulation of code. Today, one of the easiest big performance gains I can give is to strip something out of a user-defined function and inline it.
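
To make the before-and-after concrete, here is a minimal sketch of the kind of rewrite Froid performs, using a hypothetical dbo.TotalOrders function (the table and function names are my own, not from the paper):

    -- A typical scalar UDF: evaluated once per outer row, opaque to the optimizer.
    CREATE FUNCTION dbo.TotalOrders (@CustomerID int)
    RETURNS int
    AS
    BEGIN
        RETURN (SELECT COUNT(*) FROM dbo.Orders WHERE CustomerID = @CustomerID);
    END;
    GO

    -- The calling query, as most people write it:
    SELECT c.CustomerName, dbo.TotalOrders(c.CustomerID) AS OrderCount
    FROM dbo.Customers AS c;

    -- The hand-inlined equivalent: roughly what Froid-style inlining
    -- substitutes into the plan as a scalar sub-query.
    SELECT c.CustomerName,
           (SELECT COUNT(*)
            FROM dbo.Orders AS o
            WHERE o.CustomerID = c.CustomerID) AS OrderCount
    FROM dbo.Customers AS c;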


Avoid Scalar Functions In Computed Columns

Daniel Hutmacher shows why you should not include scalar functions inside computed column definitions:

Scalar functions can be a real headache when you’re performance tuning. For one, they don’t parallelize. In fact, if you use a scalar function in a computed column, it will prevent any query that uses that table from going parallel – even if you don’t reference that column at all!

Read on for a demonstration.
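
As a rough illustration of the failure mode (the table, column, and function names here are hypothetical):

    -- A scalar UDF wrapped into a computed column definition:
    CREATE FUNCTION dbo.TaxAmount (@Price money)
    RETURNS money
    WITH SCHEMABINDING
    AS
    BEGIN
        RETURN @Price * 0.08;
    END;
    GO

    ALTER TABLE dbo.Sales
        ADD TaxDue AS dbo.TaxAmount(Price);

    -- This query never touches TaxDue, yet it is now locked into a serial plan:
    SELECT SaleID, Price
    FROM dbo.Sales
    WHERE Price > 100;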


Using Computed Columns For Pattern Matching

Erik Darling shows how you can use a computed column to speed up leading-wildcard (LIKE '%something') searches:

We were talking about computed columns, and one of our students mentioned that he uses computed columns that run the REVERSE() function on a column for easier back-searching.

What’s back-searching? It’s a word I just made up.

The easiest example to think about and demo is Social Security Numbers.

One security requirement is often to show only the last four digits.

Obviously, running a search like WHERE ssn LIKE '%0000' would perform badly over large data sets.

One of the trickiest parts of performance tuning is understanding that the way people want to see data is not necessarily the way you should store the data.
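
The trick looks roughly like this (a sketch with assumed table and column names; real SSNs would of course be masked or encrypted rather than stored in plain text):

    -- Persist the reversed value and index it:
    ALTER TABLE dbo.People
        ADD ssn_reversed AS REVERSE(ssn) PERSISTED;

    CREATE INDEX IX_People_ssn_reversed
        ON dbo.People (ssn_reversed);

    -- The leading-wildcard search becomes a trailing-wildcard search,
    -- which can seek on the new index:
    SELECT ssn
    FROM dbo.People
    WHERE ssn_reversed LIKE '0000%';   -- equivalent to ssn LIKE '%0000'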


Using Powershell To Deploy Perfmon Collectors

Raul Gonzalez has a bonus post in his Perfmon data series:

As I said, when it's time to deploy the solution explained in my previous posts to a number of servers, it might get very tedious, especially if we have servers running multiple instances, since each has different counter names (the instance name is part of the counter name). If we create one template, it won't apply to all cases, so there would be a lot of manual intervention.

So I decided to do what I like the most and wrote some queries that, combined with some PowerShell, will do the job for me.

Read on for the script and more information.
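
The SQL half of that idea might look something like this sketch, which pulls instance-aware counter names straight from the server itself (the filter is just an example, not Raul's actual query):

    -- object_name embeds the instance name (e.g. MSSQL$INSTANCE1:Buffer Manager),
    -- so the results are correct per instance with no manual editing.
    SELECT DISTINCT
        RTRIM(object_name) + '\' + RTRIM(counter_name) AS counter_path
    FROM sys.dm_os_performance_counters
    WHERE object_name LIKE '%:Buffer Manager%'
       OR object_name LIKE '%:SQL Statistics%';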


Row Goals In SQL Server 2017

Erik Darling points out a new bonus when you upgrade to SQL Server 2017 CU3:

Don’t go looking in SSMS just yet. If you get an actual or estimated plan from a query in SSMS, it’s not in the XML.

However, if you get them from the plan cache later, you can see them in the XML.

According to People Much Smarter Than Me®, SSMS strips out XML that it doesn’t recognize, so we’ll have to wait for the next version to drop before we can access it easily.

Erik also has links to get more information.
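
The attribute to look for is EstimateRowsWithoutRowGoal. Until SSMS catches up, something like this sketch can dig it out of the plan cache (a blunt string match, purely illustrative):

    -- Hunt the plan cache for plans carrying the new row-goal attribute:
    SELECT TOP (20)
           qs.total_worker_time,
           qp.query_plan
    FROM sys.dm_exec_query_stats AS qs
    CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) AS qp
    WHERE CAST(qp.query_plan AS nvarchar(max)) LIKE N'%EstimateRowsWithoutRowGoal%';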


Performance Testing Post-Updates

Joe Chang has some quick and dirty performance tests comparing SQL Server 2016 SP1 to SQL Server 2016 SP1 CU7 (the first post-Meltdown/Spectre release):

Linear sum, SQL Server 2016 SP1 CU7 (build 4466) plus OS patches vs. the SP1 base:
9% faster, 12% more CPU-efficient.
Individual queries range from 24% faster to 0.3% slower.
There is probably a penalty in the recent fixes, but fixes since SP1 also made improvements?

Click through for more details. We'll need to see a lot more testing to know for sure, but that's certainly not awful.


Using Relog To Load Perfmon Counters Into SQL Server

Raul Gonzalez shows how to use the Relog executable to bring Perfmon counters into SQL Server:

Relog is an application, and as such it can connect to SQL Server. In this case, the connection is made via an ODBC driver, so we have to create a new DSN on the computer where we want to run Relog.exe, which does not necessarily have to be the same server where SQL Server is installed.

If we click Start and start typing ODBC, it will show the two versions available in modern OSes, 32-bit and 64-bit; we need to choose the 64-bit version to make it work.

Relog.exe being a bit old, it seems the only driver that actually works is the oldest one, but it works, so we choose the SQL Server 10.00.xxx driver.

He also provides a bonus script at the end, which builds a dynamic pivot table of your counter information.
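
For reference, Relog creates three tables in the target database (CounterData, CounterDetails, and DisplayToID). A minimal sketch of reading the imported counters back, separate from Raul's pivot script:

    -- Joining CounterData to CounterDetails gives readable counter rows:
    SELECT cd.MachineName,
           cd.ObjectName,
           cd.CounterName,
           dat.CounterDateTime,
           dat.CounterValue
    FROM dbo.CounterData AS dat
    JOIN dbo.CounterDetails AS cd
        ON cd.CounterID = dat.CounterID;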


Automatic Tuning In SQL Server

Grant Fritchey is a fan of SQL Server 2017’s automatic tuning feature:

The core of automatic tuning at this point in time (because I’m sure it’s going to evolve) is the ability of the query engine to spot when a query has generated a new plan and that new plan is causing performance to degrade. This is known as a regression in the plan. It comes from bad parameter sniffing, changes in statistics, cumulative updates, or the big notorious one, the cardinality estimator introduced in SQL Server 2014 (it’s been almost four years, I’m not calling it new any more). Spotting a plan regression prior to SQL Server 2016 and the introduction of the Query Store was a major pain. However, now it’s easy. You can spot them by reading the data collected. Further, Microsoft can spot them by reading the data collected, and it does.

If the engine sees that a plan is causing a regression (you have to have Query Store enabled for this), then it will write out a suggestion for fixing it to the new DMV, sys.dm_db_tuning_recommendations. If you further enable automatic tuning on your database, then SQL Server can automatically substitute the last good plan for you, fixing the problem. Further, SQL Server monitors this fix and if, over time, it’s clear that the forced plan is causing problems, it will unforce the plan, again automagically.

Click through for more information, including a query to read from the tuning recommendations DMV.
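
A quick sketch of the moving parts, per the SQL Server 2017 syntax (run against a user database):

    -- Query Store has to be on for regression detection:
    ALTER DATABASE CURRENT SET QUERY_STORE = ON;

    -- Let the engine force the last good plan on its own:
    ALTER DATABASE CURRENT
        SET AUTOMATIC_TUNING (FORCE_LAST_GOOD_PLAN = ON);

    -- See what it is recommending (or has already done):
    SELECT reason, score, state, details
    FROM sys.dm_db_tuning_recommendations;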


Data Skip Techniques In Impala

Mostafa Mokhtar, et al, explain a few methods for skipping unneeded data in Impala queries:

Each Apache Parquet file contains a footer where metadata can be stored including information like the minimum and maximum value for each column. Starting in v2.9, Impala populates the min_value and max_value fields for each column when writing Parquet files for all data types and leverages data skipping when those files are read. This approach significantly speeds up selective queries by further eliminating data beyond what static partitioning alone can do. For files written by Hive / Spark, Impala only reads the deprecated min and max fields.

The effectiveness of the Parquet min_value/max_value column statistics for data skipping can be increased by ordering (or clustering) data when it is written, reducing the range of values that fall between the minimum and maximum value for any given file. It was for this reason that Impala 2.9 added the SORT BY clause to table DDL, which directs Impala to sort data locally during an INSERT before writing the data to files.

Even if your answer is “throw more hardware at it,” there eventually comes a point where you run out of hardware (or budget).
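
The DDL side of that advice looks something like this sketch in Impala SQL (table and column names are assumed):

    -- Impala 2.9+: SORT BY clusters rows within each written file, tightening
    -- the per-file min/max column stats that drive data skipping.
    CREATE TABLE events_sorted (
        event_id BIGINT,
        event_ts TIMESTAMP,
        payload  STRING
    )
    PARTITIONED BY (event_date STRING)
    SORT BY (event_ts)
    STORED AS PARQUET;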


Fragmentation Can Affect Execution Plans

Jonathan Kehayias explains how index fragmentation can potentially affect execution plans:

Index fragmentation removal and prevention has long been a part of normal database maintenance operations, not only in SQL Server, but across many platforms. Index fragmentation affects performance for a lot of reasons, and most people talk about the effects of random small blocks of I/O that can happen physically to disk based storage as something to be avoided. The general concern around index fragmentation is that it affects the performance of scans through limiting the size of read-ahead I/Os. It’s based on this limited understanding of the problems that index fragmentation cause that some people have begun circulating the idea that index fragmentation doesn’t matter with Solid State Storage devices (SSDs) and that you can just ignore index fragmentation going forward.

However, that is not the case for a number of reasons. This article will explain and demonstrate one of those reasons: that index fragmentation can adversely impact execution plan choice for queries. This occurs because index fragmentation generally leads to an index having more pages (these extra pages come from page split operations, as described in this post on this site), and so the use of that index is deemed to have a higher cost by SQL Server’s query optimizer.

Let’s look at an example.

Check out the example, but definitely read the comments as there are some good conversations in there.
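
If you want to eyeball the page-count side of the argument on your own system, a quick sketch:

    -- Fragmentation and page counts side by side; extra pages from splits
    -- raise the optimizer's estimated cost of using the index.
    SELECT OBJECT_NAME(ips.object_id) AS table_name,
           i.name AS index_name,
           ips.avg_fragmentation_in_percent,
           ips.page_count
    FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ips
    JOIN sys.indexes AS i
        ON i.object_id = ips.object_id
       AND i.index_id = ips.index_id
    ORDER BY ips.page_count DESC;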
