Category: Performance Tuning

Foreign Keys and Non-Changing Updates

Published 2020-04-30 by Kevin Feasel

If you update a row without actually changing its contents, does it still hurt?
Paul White wrote in detail about the impact of non-updating updates, proving that SQL Server works hard to avoid doing extra work where it can. That’s a great post, and you should read it.
But foreign keys add another level of complexity. If you have foreign keys, and you tell SQL Server that you’re updating the contents of those related columns, SQL Server will check ’em even when the data isn’t changing.

Click through for the demonstration. I don’t think I agree with Brent’s dichotomy as laid out at the end of the post—the back-and-forth about removing keys would only make sense if you’re on the edge of the database equivalent of the production possibility frontier and expecting to move well beyond that point very soon. I’m not sure how well that describes the average company, but it’s a side quibble.
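
One common way to sidestep the situation quoted above is to leave the key column alone when its value would not change. What follows is a minimal sketch of that guard pattern, using Python's built-in sqlite3 module as a stand-in. It is not SQL Server, it does not reproduce or measure SQL Server's foreign key checks, and the table and function names are hypothetical.

```python
import sqlite3

# In-memory SQLite database used purely as a stand-in for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE parent (id INTEGER PRIMARY KEY);
    CREATE TABLE child (
        id        INTEGER PRIMARY KEY,
        parent_id INTEGER NOT NULL REFERENCES parent(id)
    );
    INSERT INTO parent (id) VALUES (1), (2);
    INSERT INTO child (id, parent_id) VALUES (10, 1);
""")


def set_parent_blind(conn, child_id, parent_id):
    """Always assign the key column, even if the value is unchanged."""
    cur = conn.execute(
        "UPDATE child SET parent_id = ? WHERE id = ?",
        (parent_id, child_id),
    )
    return cur.rowcount  # rows the statement touched


def set_parent_if_changed(conn, child_id, parent_id):
    """Only touch the row when the key value would actually change."""
    cur = conn.execute(
        "UPDATE child SET parent_id = ? WHERE id = ? AND parent_id <> ?",
        (parent_id, child_id, parent_id),
    )
    return cur.rowcount
```

With the sample row above, `set_parent_blind(conn, 10, 1)` still touches one row, while `set_parent_if_changed(conn, 10, 1)` touches none, so the engine is never asked to update the key column. Whether this is worth the extra predicate depends on your workload.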

Comments closed

Aggregations in Power BI Desktop

Published 2020-04-27 by Kevin Feasel

Jeroen ter Heerdt tries out aggregations in Power BI Desktop:

Aggregations bring me back to the good old SSAS Multidimensional days. The days that I invariably built the aggregations tree the wrong way around, SSAS would complain, I would scream and eventually give in.
You can imagine that I was curious but skeptical when I tried aggregations in Power BI Desktop. I was afraid of ending up in the same hate-but-need relationship that I had with SSAS multidimensional when it came to aggregations.
The good news? It is not like that. At all. Once you have aggregations working, they are great.
The not so good news? It took me longer than I am willing to admit to get them working – primarily due to data types and creation of the aggregated table. More details in this blog post.

Read on for the full story.

Tempdb Issues You Might Have

Published 2020-04-22 by Kevin Feasel

Fabiano Amorim walks through a few of the tricky issues around tempdb:

One of many performance improvements that came with SQL Server 2014 is that it doesn’t flush dirty pages created in a minimally logged operation on tempdb. This gives you the benefit of having faster (compared to prior versions) inserts, but it caused another problem, as those allocated pages may take a lot of time to be removed from the buffer pool data cache. Before discussing the problem, quickly look at the benefit and then understand some important concepts of flushing dirty pages on tempdb.

Click through to learn more about the fix, and then a fix to the fix…which opened up a new avenue to fix. This kind of thing is why operating at scale is so difficult: the solution to one problem often becomes the avenue to a new problem.

Serialization in Apache Flink

Published 2020-04-15 by Kevin Feasel

Nico Kruber walks us through the viable set of serializers in Apache Flink:

Flink handles data types and serialization with its own type descriptors, generic type extraction, and type serialization framework. We recommend reading through the documentation first in order to be able to follow the arguments we present below. In essence, Flink tries to infer information about your job’s data types for wire and state serialization, and to be able to use grouping, joining, and aggregation operations by referring to individual field names, e.g. stream.keyBy("ruleId") or dataSet.join(another).where("name").equalTo("personName"). It also allows optimizations in the serialization format as well as reducing unnecessary de/serializations (mainly in certain Batch operations as well as in the SQL/Table APIs).

Click through for notes on each serializer and a graph which shows how the choice of a serializer can make a huge difference.

Execution Plan Training, in Video Form

Published 2020-04-03 by Kevin Feasel

Hugo Kornelis makes an announcement:

As those who have been to my full-day precon on execution plans know, I believe that learning to understand execution plans does not start with dozens of examples. It starts with an explanation of the basics, followed by an overview of operators. Just like learning Russian doesn’t start with reading Tolstoy’s Война и мир (War and Peace), but with learning the grammar rules and the vocabulary.
Once you know the grammar of a language, and enough of its vocabulary, you can then pick up any book. And the more you do that, the easier it becomes. Eventually, one day, you will be able to read Война и мир in its original language.
And once you know the basics of reading execution plans, and are familiar with most of the operators, you will be able to tackle any execution plan you find on your servers, no matter how complex.

And, at least for now, this is free. So check out what Hugo has so far and pass along a “thank you” if you like what you see there.

Parameterization and Enumerable.Contains() with EF Core 3

Published 2020-03-31 by Kevin Feasel

Erik Ejlskov Jensen explains how we can prevent Entity Framework Core 3 from polluting the plan cache if we use Enumerable.Contains():

One of the many advantages of using a tool like Entity Framework Core is that you can be sure the framework will generate properly parameterized SQL for you. This helps avoid SQL injection issues and avoids plan cache pollution. Unfortunately, EF Core currently falls short on that promise when translating queries where you supply a list of values to be matched against a column (the Enumerable.Contains method); this is translated to a SQL Server IN operator.

Click through for a setup of the problem as well as the solution.
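
To make the plan cache concern concrete, here is a minimal sketch in Python rather than C#. It only compares the SQL text each approach would send and is not necessarily the fix described in the linked post. The table name, column name, and both function names are hypothetical.

```python
def inlined_in_query(ids):
    """Bake the values into the SQL text as literals.

    Every distinct list of values produces a distinct statement text,
    which is what leads to one cached plan per list.
    """
    values = ", ".join(str(int(i)) for i in ids)
    return f"SELECT * FROM Posts WHERE Id IN ({values})"


def parameterized_in_query(ids):
    """Emit one named placeholder per value and pass the values separately.

    Lists of the same length share the same statement text, so they can
    share a cached plan. Lists of different lengths still differ.
    """
    placeholders = ", ".join(f"@p{n}" for n in range(len(ids)))
    sql = f"SELECT * FROM Posts WHERE Id IN ({placeholders})"
    params = {f"@p{n}": int(v) for n, v in enumerate(ids)}
    return sql, params


first_sql, first_params = parameterized_in_query([1, 2, 3])
second_sql, second_params = parameterized_in_query([4, 5, 6])

# Same text for both parameterized calls; different text for inlined calls.
same_text_when_parameterized = first_sql == second_sql
same_text_when_inlined = inlined_in_query([1, 2, 3]) == inlined_in_query([4, 5, 6])
```

Note that one placeholder per value still yields a different statement for each list length, so this reduces the number of distinct statements rather than eliminating the variation entirely.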

Visualizing Power BI Query Parallelism

Published 2020-03-17 by Kevin Feasel

Chris Webb gives us a way to visualize Power BI Premium and Azure Analysis Services query parallelism:

Why is this important? When you’re tuning the performance of a Power BI report, the first thing to do is to look at the performance of the individual DAX queries and make them run as fast as possible. However, when a Power BI report is rendered, any one query is likely to run at the same time as several other queries for the same report, and this will have an impact on its performance. How much of an impact depends on how many queries need to be run and the number of back-end v-cores available on your Premium capacity, or the number of QPUs available on your Azure Analysis Services instance if you’re using a Live connection to AAS. The more v-cores/QPUs you have available, the more of the work needed for a query can be run in parallel; Microsoft publishes tables listing the number of v-cores for each Premium SKU and the number of QPUs for each Azure Analysis Services SKU. As a result, if you have reports with a large number of visuals that generate slow DAX queries, scaling up your Power BI Premium capacity or AAS instance may improve overall report performance. Reducing the number of visuals on your report and/or reducing the number of visuals needed to display the same information will also reduce the number of queries that need to be run and therefore improve overall performance.

Read on for the demo.

Tuning EMR Performance with Dr. Elephant and Sparklens

Published 2020-03-16 by Kevin Feasel

Nivas Shankar and Mert Hocanin show us how to use a couple of products to tune Hive and Spark jobs:

Data engineers and ETL developers often spend a significant amount of time running and tuning Apache Spark jobs with different parameters to evaluate performance, which can be challenging and time-consuming. Dr. Elephant and Sparklens help you tune your Spark and Hive applications by monitoring your workloads and providing suggested changes to optimize performance parameters, like required Executor nodes, Core nodes, Driver Memory and Hive (Tez or MapReduce) jobs on Mapper, Reducer, Memory, Data Skew configurations. Dr. Elephant gathers job metrics, runs analysis on them, and presents optimization recommendations in a simple way for easy consumption and corrective actions. Similarly, Sparklens makes it easy to understand the scalability limits of Spark applications and compute resources, and runs efficiently with well-defined methods instead of learning by trial and error, which saves both developer and compute time.
This post demonstrates how to install Dr. Elephant and Sparklens on an Amazon EMR cluster and run workloads to demonstrate these tools’ capabilities. Amazon EMR is a managed Hadoop service offered by AWS to easily and cost-effectively run Hadoop and other open-source frameworks on AWS.

Even if you aren’t using Elastic MapReduce, Dr. Elephant and Sparklens are quite useful products.

Finding Query Store’s Server Impact

Published 2020-03-11 by Kevin Feasel

Tracy Boggiano shows us how to track the performance impact of Query Store on an environment:

This month’s T-SQL Tuesday blogging party is brought to you, well, by me, and I wanted to talk more about Query Store. I did write a book on it, but there is still more to know that is not in the book. I am sure everyone’s posts will prove enlightening and provide valuable content for folks using or looking to implement Query Store. Someone should have told Grant to hold off a week on his post about DROP / CREATE of procedures and what happens with plan forcing so it could officially be part of the party.
I frequently get asked while presenting about the impact of running Query Store on the instance and one thing that was not in the book was the performance counters that were added to help track just that.

You should probably buy a bunch of copies of Tracy’s book. Just in case.

Managing Performance on Azure SQL Managed Instances

Published 2020-02-26 by Kevin Feasel

Tim Radney has a few considerations for you if you want to start using Azure SQL Managed Instances:

Storage is a bit more difficult to plan for because you have to consider multiple factors. For storage, you need to account for both storage size and I/O needs. How many GBs or TBs are needed for the SQL Server instance and how fast does the storage need to be? How many IOPS and how much throughput is the on-premises instance using? For that, you must baseline your current workload using perfmon to capture average and max MB/s and/or taking snapshots of sys.dm_io_virtual_file_stats to capture throughput utilization. This will give you an idea of what type of I/O and throughput you need in the new environment. Several customers I’ve worked with have missed this vital part of migration planning and have encountered performance issues due to selecting an instance level that didn’t support their workload.

Tim has a lot of good advice in here, so read the whole thing.
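
The snapshot approach mentioned in the quote comes down to simple arithmetic: the counters in sys.dm_io_virtual_file_stats are cumulative, so the rate is the difference between two snapshots divided by the elapsed time. A minimal sketch of that calculation follows, assuming you have already collected two snapshots for a file. The field names mirror the view's columns, but the class and function names are hypothetical and the collection step is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileStatsSnapshot:
    """One row captured from sys.dm_io_virtual_file_stats at a point in time."""
    sample_ms: int             # milliseconds since the machine started
    num_of_reads: int          # cumulative read operations
    num_of_writes: int         # cumulative write operations
    num_of_bytes_read: int     # cumulative bytes read
    num_of_bytes_written: int  # cumulative bytes written


def io_rates(first, second):
    """Return average IOPS and MB/s between two cumulative snapshots."""
    elapsed_s = (second.sample_ms - first.sample_ms) / 1000.0
    if elapsed_s <= 0:
        raise ValueError("second snapshot must be taken after the first")

    operations = (
        (second.num_of_reads - first.num_of_reads)
        + (second.num_of_writes - first.num_of_writes)
    )
    bytes_moved = (
        (second.num_of_bytes_read - first.num_of_bytes_read)
        + (second.num_of_bytes_written - first.num_of_bytes_written)
    )
    return {
        "iops": operations / elapsed_s,
        "mb_per_s": bytes_moved / elapsed_s / (1024 * 1024),
    }
```

This gives an average over the interval only. Short sampling intervals taken across a representative period are needed to see peaks, which is why the quote also mentions capturing max MB/s with perfmon.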
