Press "Enter" to skip to content

Category: Performance Tuning

Adaptive Query Processing In CTP 2.0

Joe Sack has a couple blog posts on adaptive query processing enhancements in SQL Server 2017 CTP 2.0.  First, Batch Mode Adaptive Joins:

We have seen numerous cases where providing a specific join hint solved query performance issues for our customers.  However, the drawback of adding a hint is that we remove join algorithm decisions from the optimizer for that statement. While fixing a short-term issue, the hard-coded hint may not be the optimal decision as data distributions shift over time.

Another scenario is where we do not know up front what the optimal join should be, for example, with a parameter sensitive query where a low or high number of rows may flow through the plan based on the actual parameter value.

With these scenarios in mind, the Query Processing team introduced the ability to sense a bad join choice in a plan and then dynamically switch to a better join strategy during execution.

That one’s really cool.  Joe also talks about interleaved execution for multi-statement TVFs:

SQL Server has historically used a unidirectional “pipeline” for optimizing and executing queries.  During optimization, the cardinality estimation process is responsible for providing row count estimates for operators in order to derive estimated costs.  The estimated costs help determine which plan gets selected for use in execution.  If cardinality estimates are incorrect, we will still end up using the original plan despite the poor original assumptions.

Interleaved execution changes the unidirectional boundary between the optimization and execution phases for a single-query execution and enables plans to adapt based on the revised estimates. During optimization if we encounter a candidate for interleaved execution, which for this first version will be multi-statement table valued functions (MSTVFs), we will pause optimization, execute the applicable subtree, capture accurate cardinality estimates and then resume optimization for downstream operations.

The goal here is to make Table-Valued Functions viable from a performance perspective.  The team has a long way to go there, but they’re working on it.  Also, Joe gives a shout out to Arun Sirpal, frequent curatee.

Comments closed

Transactional Replication Procedures

Drew Furgiuele offers up warnings when thinking about rolling your own transactional replication stored procedures:

In the above picture, we can see that it did replicate the execute statement, and that it affected 19,972 rows on the replica, and it only took 67ms! Sounds awesome, doesn’t it? Here’s a way to handle large batch updates at your publishers without overwhelming your replication setup. But before you go changing everything, you should probably understand that this has some really, really bad side effects if you’re not careful. Let’s look at three really big ones.

All in all, it’s a fairly risky move but might be worth the performance improvements.

Comments closed

Entity Framework Slow, News At 11

Jovan Popovic shows that Entity Framework is slow and Dapper is fast:

To setup test, you can go to StackExchange/Dapper GitHub an download source code. Tests are created as C# solution (Dapper.sln). When you open this solution you can find Dapper.Tests project. You might need to change two things:

  1. Connection strings are hardcoded in Tests.cs file with values like “Server=(local)\SQL2014;Database=tempdb;User ID=sa;Password=Password12!”. You might need to change this and put your connection info.
  2. Project is compiled using dotnet sdk 1.0.0-preview2-003121, so you might get compilation errors if you don’t have a matching framework. I have removed line: “sdk”: { “version”: “1.0.0-preview2-003121” } from global.json to fix this.

Now you will be able to build project and run tests.

Nothing’s going to be faster than hand-crafted, well-tuned statements from people who know what they’re doing.  Micro-ORMs like Dapper and FSharp.Data.SqlClient will trade a little bit of a speed hit for developer niceties.  Heavier frameworks like Entity Framework and NHibernate add a lot more, but tend to be significantly slower.

Comments closed

Data Type Mismatches

Kendra Little gets into why certain data type mismatches force scans of tables while others can still allow seeks:

Sometimes we get lucky comparing a literal value to a column of a different type.

But this is very complicated, and joining on two columns of different types in the same family without explicitly converting the type of one of the columns resulted in worse performance in Paul White’s tests, when the columns allowed NULLs! (Note: I haven’t rerun those tests on 2016, but I think the general advice below still applies.)

General advice: don’t rely on being lucky. Pay attention to your data types, and compare values of the same data type wherever possible.

That’s great advice.

Comments closed

Dr. Elephant: Where Does My Hadoop Cluster Hurt?

Carl Steinbach looks back at Dr. Elephant one year later:

What we needed to introduce to the job-tuning equation was a series of questions like those asked by a physician making a diagnosis: a step-by-step process that guides the user through the problem-solving process, while also educating them at the same time.

So we created Dr. Elephant, a system that automatically detects under-performing jobs, diagnoses the root cause, and guides the owner of the job through the treatment process. Dr. Elephant makes it easy to identify jobs that are wasting resources, as well as jobs that can achieve better performance without sacrificing efficiency. Perhaps most importantly, Dr. Elephant makes it easy to act on these insights by making job-level performance tuning accessible to users regardless of their previous skill level. In the process, Dr. Elephant has helped to ease the tension that previously existed between user productivity on one side and cluster efficiency on the other.

LinkedIn has made this project open source if you want to check it out in your environment.

Comments closed

Unnecessary, Mandatory Work

Lukas Eder lays out one of the biggest performance drains today:

We’re using 8x as much memory in the database when doing SELECT * rather than SELECT film, rating. That’s not really surprising though, is it? We knew that. Yet we accepted it in many many of our queries where we simply didn’t need all that data. We generated needless, mandatory work for the database, and it does sum up. We’re using 8x too much memory (the number will differ, of course).

Now, all the other steps (disk I/O, wire transfer, client memory consumption) are also affected in the same way, but I’m skipping those.

This article is absolutely worth reading and sharing with developers.

Comments closed

Batch Requests Per Second

Tara Kizer explains the Batch Requests/sec metric:

WHAT IS BATCH REQUESTS/SEC?

Batch Requests/sec is a performance counter that tells us the number of T-SQL command batches received by the server per second. It is in the SQLServer:SQL Statistics performance object for a default instance or MSSQL$InstanceName:SQL Statistics for a named instance.

WHAT COMPRISES A BATCH?

If I execute a stored procedure that has multiple queries and calls to other stored procedures, what will Batch Requests/sec show? Let’s test it to find out.

Click through for the answer.

Comments closed

Performance Tuning A Streaming Application

Mathieu Dumoulin explains how he was able to get 10x performance out of a streaming application built around Kafka, Spark Streaming, and Apache Ignite:

The main issues for these applications were caused by trying to run a development system’s code, tested on AWS instances on a physical, on-premise cluster running on real data. The original developer was never given access to the production cluster or the real data.

Apache Ignite was a huge source of problems, principally because it is such a new project that nobody had any real experience with it and also because it is not a very mature project yet.

I found this article fascinating, particularly because the answer was a lot more than just “throw some more hardware at the problem.”

Comments closed

The Cost Of Compression

Erin Stellato looks at the COMPRESS function:

While this data is interesting, I’m more curious about how compression affects my everyday SELECT queries. I have a set of three stored procedures that each have one SELECT query, so that each index is used. I created these procedures for each table, and then wrote a script to pull values for first and last names to use for testing. Here is the script to create the procedures.

Once we have the stored procedures created, we can run the script below to call them. Kick this off and then wait a couple minutes…

To me, the COMPRESS function is most useful for compressing information you tend not to search through but need to keep the in the database, like HTML markup or long descriptions.

Comments closed