Category: Performance Tuning

Execution Plan Training, in Video Form

Published 2020-04-03 by Kevin Feasel

As those who have been to my full-day precon on execution plans know, I believe that learning to understand execution plans does not start with dozens of examples. It starts with an explanation of the basics, followed by an overview of operators. Just like learning Russian doesn’t start with reading Tolstoy’s Война и мир (War and Peace), but with learning the grammar rules and the vocabulary.
Once you know the grammar of a language, and enough of its vocabulary, you can then pick up any book. And the more you do that, the easier it becomes. Eventually, one day, you will be able to read Война и мир in its original language.
And once you know the basics of reading execution plans, and are familiar with most of the operators, you will be able to tackle any execution plan you find on your servers, no matter how complex.

And, at least for now, this is free. So check out what Hugo has already and pass along a “thank you” if you like what you see there.

Comments closed

Parameterization and Enumerable.Contains() with EF Core 3

Published 2020-03-31 by Kevin Feasel

Erik Ejlskov Jensen explains how we can prevent Entity Framework Core 3 from polluting the plan cache if we use Enumerable.Contains():

One of the many advantages of using a tool like Entity Framework Core is, that you are sure that the framework will generate properly parameterized SQL for you. This helps avoid SQL injection issues and avoids plan cache pollution. Unfortunately, EF Core currently falls short on that promise, when translating queries, where you supply a list of values to be matched against a column – Enumerable.Contains method – this is translated to a SQL Server IN operator

Click through for a setup of the problem as well as the solution.

Comments closed

Visualizing Power BI Query Parallelism

Published 2020-03-17 by Kevin Feasel

Chris Webb gives us a way to visualize Power BI Premium and Azure Analysis Services query parallelism:

Why is this important? When you’re tuning the performance of a Power BI report the first thing to do is to look at the performance of the individual DAX queries run and make them run as fast as possible. However when a Power BI report is rendered any one query is likely to be run at the same time as several other queries run for the same report, and this will have an impact on its performance. How much of an impact there is will depend on how many queries need to be run and the number of back-end v-cores available on your Premium capacity, or the number of QPUs available on your Azure Analysis Services instance if you’re using a Live connection to AAS. The more v-cores/QPUs you have available, the more of the work needed for a query that can be run in parallel; you can see a table listing the number of v-cores for each Premium SKU here, and the number of QPUs for each Azure Analysis Services SKU here. As a result of this if you have reports with a large number of visuals that generate slow DAX queries, scaling up your Power BI Premium capacity or AAS instance may improve overall report performance. Reducing the number of visuals on your report and/or reducing the number of visuals needed to display the same information will also reduce the number of queries that need to be run and therefore improve overall performance.

Read on for the demo.

Comments closed

Tuning EMR Performance with Dr. Elephant and Sparklens

Published 2020-03-16 by Kevin Feasel

Nivas Shankar and Mert Hocanin show us how to use a couple of products to tune Hive and Spark jobs:

Data engineers and ETL developers often spend a significant amount of time running and tuning Apache Spark jobs with different parameters to evaluate performance, which can be challenging and time-consuming. Dr. Elephant and Sparklens help you tune your Spark and Hive applications by monitoring your workloads and providing suggested changes to optimize performance parameters, like required Executor nodes, Core nodes, Driver Memory and Hive (Tez or MapReduce) jobs on Mapper, Reducer, Memory, Data Skew configurations. Dr. Elephant gathers job metrics, runs analysis on them, and presents optimization recommendations in a simple way for easy consumption and corrective actions. Similarly, Sparklens makes it easy to understand the scalability limits of Spark applications and compute resources, and runs efficiently with well-defined methods instead of leaning by trial and error, which saves both developer and compute time.
This post demonstrates how to install Dr. Elephant and Sparklens on an Amazon EMR cluster and run workloads to demonstrate these tools’ capabilities. Amazon EMR is a managed Hadoop service offered by AWS to easily and cost-effectively run Hadoop and other open-source frameworks on AWS.

Even if you aren’t using ElasticMapReduce, Dr. Elephant and Sparklens are quite useful products.

Comments closed

Finding Query Store’s Server Impact

Published 2020-03-11 by Kevin Feasel

Tracy Boggiano shows us how to track the performance impact of Query Store on an environment:

This month’s T-SQL Tuesday blogging party is brought to you well by me and I wanted to talk more about Query Store. I did write a book on it but there if still more to know about that is not in the book. I am sure the rest of everyone’s posts will prove enlightening and provide valuable content for folks using or looking to implement Query Store. Someone should have told Grant to hold off a week on his post about DROP / CREATE of procedures and what happens with plan forcing so it could officially be part of the party.
I frequently get asked while presenting about the impact of running Query Store on the instance and one thing that was not in the book was the performance counters that were added to help track just that.

You should probably buy a bunch of copies of Tracy’s book. Just in case.

Comments closed

Managing Performance on Azure SQL Managed Instances

Published 2020-02-26 by Kevin Feasel

Tim Radney has a few considerations for you if you want to start using Azure SQL Managed Instances:

Storage is a bit more difficult to plan and make considerations for, due to having to consider multiple factors. For storage you need to account for the overall storage requirement for both storage size, and I/O needs. How many GBs or TBs are needed for the SQL Server instance and how fast does the storage need to be? How many IOPS and how much throughput is the on-premises instance using? For that, you must baseline your current workload using perfmon to capture average and max MB/s and/or taking snapshots of sys.dm_io_virtual_file_stats to capture throughput utilization. This will give you an idea of what type of I/O and throughput you need in the new environment. Several customers I’ve worked with have missed this vital part of migration planning and have encountered performance issues due to selecting an instance level that didn’t support their workload.

Tim has a lot of good advice in here, so read the whole thing.

Comments closed

High-Throughput REST APIs with Dapper and Azure SQL DB

Published 2020-02-24 by Kevin Feasel

Davide Mauri builds out an example of a WebAPI project using Dapper to query Azure SQL Database:

I was able to execute 1100 Requests Per Seconds with a median response time of 20msec. If you can accept a bit higher latency, you can also reach 1500 RPS but the median response time becomes 40msec and the 95 percentile is set at 95msec. Database usage never goes above 20% in such cases…and in fact the bottleneck is the Web App (better, the Web App Plan) and more specifically the CPU. Time to scale up or out the Web App Plan.
By scaling up and out a bit, I was able to reach almost 10.000 request per second with just an HS_Gen5_4. Quite impressive.

I like Dapper as a micro-ORM. Products like it and FSharp.Data.SqlClient are good examples of how you can remove a lot of middleware goop without taking on the performance burdens of Entity Framework and Hibernate.

Comments closed

Sorting When Columns are Ordered the Same Way

Published 2020-02-21 by Kevin Feasel

Erik Darling thinks outside the column:

Sorts aren’t just for the column(s) in your order by — if you SELECT *, you need to all the columns in the * by all the columns in the order by.
I know I basically repeated myself. That’s for emphasis. It’s something professional writers do.
Dig it.

The principle Erik is talking about is having a monotonic relationship (that is, as one column increases, the other always increases; and as one column decreases, the other always decreases). If you can guarantee that, and if one of those happens to be indexed already, you can get a nice performance boost.

But wait, I was told there would be no math.

Comments closed

Two Performance Tricks for Spark SQL

Published 2020-02-20 by Kevin Feasel

Divyansh Jain shares a couple of tips when optimizing Apache Spark code:

1. Avoid UDFs. But why..?
Because internally, Catalyst doesn’t optimize and process UDFs at all, which results in losing the optimization level. Instead, try using SparkSql API to develop your application.

Click through for a demo and for the second tip.

Comments closed

Reducing Visual Count to Improve Performance

Published 2020-02-18 by Kevin Feasel

Chris Webb explains that you might get better performance in Power BI with fewer visuals:

Before we go any further, I don’t want you to go and change your reports if you’re not going to get any benefit from doing so. Use Performance Analyzer (as shown here) to determine which visuals on your report are the cause of slow performance – there’s no point redesigning visuals that are fast anyway.
As a general rule the more visuals you put on a report page the slower it’s going to get. It’s logical if you think about it: the more visuals there are, the more queries have to be run against your dataset and the more work Power BI has to do to render the report. I know there is a tendency to try to pack as much information onto a page as possible and this often happens when someone else has designed the report you’re trying to build, but you should always try to resist this. Splitting a single large page into multiple smaller pages, using slicers or filters to reduce the amount of data shown at any one time and avoiding gigantic Excel-like tables are a good idea.

It certainly doesn’t mean “get rid of all of your visuals;” after all, speed is only one part of the story. Read the whole thing.

Comments closed