Category: Performance Tuning

Each Apache Parquet file contains a footer where metadata can be stored including information like the minimum and maximum value for each column. Starting in v2.9, Impala populates the min_value and max_value fields for each column when writing Parquet files for all data types and leverages data skipping when those files are read. This approach significantly speeds up selective queries by further eliminating data beyond what static partitioning alone can do. For files written by Hive / Spark, Impala only reads the deprecated min and max fields.

The effectiveness of the Parquet min_value/max_value column statistics for data skipping can be increased by ordering (or clustering¹) data when it is written by reducing the range of values that fall between the minimum and maximum value for any given file. It was for this reason that Impala 2.9 added the SORT BY clause to table DDL which directs Impala to sort data locally during an INSERT before writing the data to files.

Even if your answer is “throw more hardware at it,” there eventually comes a point where you run out of hardware (or budget).

Comments closed

Fragmentation Can Affect Execution Plans

Published 2017-12-20 by Kevin Feasel

Jonathan Kehayias explains how index fragmentation can potentially affect execution plans:

Index fragmentation removal and prevention has long been a part of normal database maintenance operations, not only in SQL Server, but across many platforms. Index fragmentation affects performance for a lot of reasons, and most people talk about the effects of random small blocks of I/O that can happen physically to disk based storage as something to be avoided. The general concern around index fragmentation is that it affects the performance of scans through limiting the size of read-ahead I/Os. It’s based on this limited understanding of the problems that index fragmentation cause that some people have begun circulating the idea that index fragmentation doesn’t matter with Solid State Storage devices (SSDs) and that you can just ignore index fragmentation going forward.

However, that is not the case for a number of reasons. This article will explain and demonstrate one of those reasons: that index fragmentation can adversely impact execution plan choice for queries. This occurs because index fragmentation generally leads to an index having more pages (these extra pages come from page split operations, as described in this post on this site), and so the use of that index is deemed to have a higher cost by SQL Server’s query optimizer.

Let’s look at an example.

Check out the example, but definitely read the comments as there are some good conversations in there.

Comments closed

JDBC Roundtrip Costs

Published 2017-12-18 by Kevin Feasel

Lukas Eder shows the performance cost of doing row-by-row data retrieval:

The obvious difference between the JDBC benchmark and the PL/SQL one is the fact that the JDBC call has to traverse a vast amount of logic, APIs, “barriers” between the JVM and the Oracle kernel before it can actually invoke the really interesting part. This includes:

JVM overhead

JDBC logic

Network overhead

Various “outer” layers inside the Oracle database

Oracle’s API layers to get into the SQL and PL/SQL execution engines

The actual code running in the PL/SQL engine

In Toon’s talk (which again, you should definitely watch), the examples are running SQL code, not PL/SQL code, but the results are the same. The actual logic is relatively cheap inside of the database (as we’ve seen in the PL/SQL only benchmark), but the overhead is significant when calling database logic from outside the database.

Thus: It is very important to minimise that overhead

This particular example focuses on Oracle and JDBC, but it certainly applies to other database platforms and distributed architectures.

Comments closed

Reducing Reads In Queries

Published 2017-12-15 by Kevin Feasel

Bert Wagner has a few tips for improving query performance by reducing the number of reads:

If SQL Server thinks it only is going to read 1 row of data, but instead needs to read way more rows of data, it might choose a poor execution plan which results in more reads.

You might get a suboptimal execution plan like above for a variety of reasons, but here are the most common ones I see:

Parameter sniffing

Use of table variables (via Brent Ozar)

Outdated statistics (via Kimberly L. Tripp)

If you had a query that previously ran fine but doesn’t anymore, you might be able to utilize Query Store to help identify why SQL Server started generating suboptimal plans.

Click through for a few more ideas as well.

Comments closed

Powershell Speed Testing

Published 2017-11-30 by Kevin Feasel

Shane O’Neill shows off a Powershell script which allows you to simplify performance testing:

Apart from catching up on news during my commute I only really use notifications for a certain number of hashtags i.e. #SqlServer, #tsql2sday, #sqlhelp, and #PowerShell.

So during work, every so often a little notification will pop up on the bottom right of my window and I can quickly glance down and decide whether to ignore it or check it out.

That’s what happened with the following tweet:

Click through for Shane’s demo.

Comments closed

Legacy Cardinality Estimation In SQL Server

Published 2017-11-29 by Kevin Feasel

Kellyn Pot’vin-Gorman explains what the Legacy Cardinality Estimation setting does in SQL Server:

Oracle DBAs have used the CARDINALITY hint for some time and it should be understood that this may appear to be similar, but is actually quite different. As hinting in TSQL is a bit different than PL/SQL, we can compare similar queries to assist:

TSQL
SELECT CustomerId, OrderAddedDate 
FROM OrderTable 
WHERE OrderAddedDate >= '2016-05-01';
OPTION (USE HINT ('FORCE_LEGACY_CARDINALITY_ESTIMATION'));
go
PL/SQL

Where you might first mistake the CE hint for the following CARDINALITY hint in Oracle:
SELECT /*+ CARDINALITY(ORD,15000) */ ORD.CUSTOMER_ID, ORD.ORDER_DATE 
FROM ORDERS ORD WHERE ORD.ORDER_DATE >= '2016-05-01';
This would be incorrect and the closest hint in Oracle to SQL Server’s legacy CE hint would be the optimizer feature hint:
SELECT /*+ optimizer_features_enable('9.2.0.7') */ ORD.CUSTOMER_ID, ORD.ORDER_DATE FROM ORDERS ORD 
WHERE ORD.ORDER_DATE >= '2016-05-01';
If you’re wondering why I chose a 9i version to force the optimizer to, keep reading and you’ll come to understand.

Read on for the comparative explanation as well as more details on SQL Server’s legacy cardinality estimator hint and database-scoped configuration setting.

Comments closed

Finding The Right Batch Size For Bulk Loads

Published 2017-11-15 by Kevin Feasel

Dan Guzman has some bulk load batch size considerations:

Bulk load has long been the fastest way to mass insert rows into a SQL Server table, providing orders of magnitude better performance compared to traditional INSERTs. SQL Server database engine bulk load capabilities are leveraged by T-SQL BULK INSERT, INSERT…SELECT, and MERGE statements as well as by SQL Server client APIs like ODBC, OLE DB, ADO.NET, and JDBC. SQL Server tools like BCP and components like SSIS leverage these client APIs to optimize insert performance.

SQL Server 2016 and later improves performance further by turning on bulk load context and minimal logging by default when bulk loading into SIMPLE and BULK LOGGED recovery model databases, which previously required turning on trace flags as detailed in this blog post by Parikshit Savjani of the MSSQL Tiger team. That post also includes links to other great resources that thoroughly cover minimal logging and data loading performance, which I recommend you peruse if you use bulk load often. I won’t repeat all that information here but do want to call attention to the fact that these new bulk load optimizations can result in much more unused space when a small batch size is used compared to SQL Server 2014 and older versions.

Click through for some tips.

Comments closed

Actual Execution Plan Enhancements

Published 2017-11-13 by Kevin Feasel

Pedro Lopes points out some additional data available in the properties section when you generate an actual execution plan:

Looking at the actual execution plan is one of the most used performance troubleshooting techniques. Having information on elapsed CPU time and overall execution time, together with session wait information in an actual execution plan allows a DBA to use showplan to troubleshoot issues away from the server, and be able to correlate and compare different types of waits that result from query or schema changes.

A few months ago we had introduced exposed in SSMS some of the per-operator statistics, such as CPU and elapsed time per thread. More recently, we have introduced overall query CPU and elapsed time tracking for statistics showplan xml (both in ms). These can be found in the root node of an actual plan. Available using the latest versions of SSMS v17, when used with SQL Server 2012 SP4, SQL Server 2016 SP1 and SQL Server 2017. For SQL Server 2014 it will become available in a future Service Pack.

Also be sure to check out Geoff Patterson’s Connect item asking that the execution plan results show the top ten waits in descending order rather than ascending order. That’s the appropriate ordering in my mind: show me the most important things first.

Comments closed

Soft-NUMA Doesn’t Limit MAXDOP

Published 2017-11-07 by Kevin Feasel

Lonny Niederstadt tests whether soft-NUMA forces MAXDOP = 1:

I mentioned that I was planning to set up a soft-NUMA node for each vcpu on a 16 vcpu VM, to evenly distribute incoming connections and thus DOP 1 queries over vcpus. Thomas Kejser et al used this strategy to good effect in “The Data Loading Performance Guide”, which used SQL Server 2008 as a base.
https://technet.microsoft.com/en-us/library/dd425070(v=sql.100).aspx

My conversation partner cautioned me that leaving this soft-NUMA configuration in place after the specialized workload would result in DOP 1 queries whether I wanted them or not. The claim was, effectively, a parallel query plan generated by a connection within a soft-NUMA node would have its MAXDOP restricted by the scheduler count (if lower than other MAXDOP contributing factors). Though I wasn’t able to test at the time, I was skeptical: I’d always thought that soft-NUMA was consequential to connection placement, but not to MAXDOP nor to where parallel query workers would be assigned.

I’m back home now… time to test!!

Read on for the test.

Comments closed

CXCONSUMER Waits And More From PASS Summit

Published 2017-11-06 by Kevin Feasel

Brent Ozar relays a couple exciting announcements from PASS Summit:

Microsoft’s Joe Sack & Pedro Lopes held a forward-looking session for performance tuners at the PASS Summit and dropped some awesome bombshells.

Pedro’s Big Deal: there’s a new CXPACKET wait in town: CXCONSUMER. In the past, when queries went parallel, we couldn’t differentiate harmless waits incurred by the consumer thread (coordinator, or teacher from my CXPACKET video) from painful waits incurred by the producers. Starting with SQL Server 2016 SP2 and 2017 CU3, we’ll have a new CXCONSUMER wait type to track the harmless ones. That means CXPACKET will really finally mean something.

Read on to see what Joe has for us.

Comments closed