Query Tuning – Page 27

Improving Join Performance with Skewed Datasets in Spark

Published 2022-06-22 by Kevin Feasel

Ajay Gupta gets into the topic of join performance:

Performing Joins on Skewed Datasets: A Dataset is considered to be skewed for a Join operation when the distribution of join keys across the records in the dataset is skewed towards a small subset of keys. For example, 80% of the records in the dataset might map to only 20% of the Join keys.
Implications of Skewed Datasets for Join: Skewed Datasets, if not handled appropriately, can lead to stragglers in the Join stage (Read this linked story to know more about Stragglers). This brings down the overall execution efficiency of the Spark job. Also, skewed datasets can cause memory overruns on certain executors leading to the failure of the Spark job. Therefore, it is important to identify and address Join-based stages where large skewed datasets are involved.
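To make the straggler effect concrete, here is a minimal Python sketch. It is not Spark code and is not taken from Ajay's article. It partitions a synthetic skewed key set by join key and compares the largest partition with the average one. The integer keys, the 80/20 split, the partition count, and the modulo stand-in for a hash partitioner are all assumptions chosen for illustration.

```python
def build_skewed_keys():
    """Return 1,000 join keys where 2 of 10 distinct keys hold 80% of the records."""
    keys = []
    for key in range(10):
        count = 400 if key < 2 else 25  # keys 0 and 1 are the "hot" keys
        keys.extend([key] * count)
    return keys


def partition_sizes(keys, num_partitions):
    """Count records per partition; key % num_partitions stands in for a hash partitioner."""
    sizes = [0] * num_partitions
    for key in keys:
        sizes[key % num_partitions] += 1
    return sizes


def skew_ratio(sizes):
    """Largest partition divided by the mean partition size (1.0 means perfectly even)."""
    return max(sizes) / (sum(sizes) / len(sizes))


keys = build_skewed_keys()
sizes = partition_sizes(keys, num_partitions=5)
print(sizes)              # [425, 425, 50, 50, 50]
print(skew_ratio(sizes))  # 2.125
```

In this toy setup, the two partitions that receive a hot key each hold more than eight times as many records as the others, so the tasks processing them would finish last and need the most memory.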

Read on for five techniques which may help you out.

Lack of Fun with Scalar Functions

Published 2022-06-22 by Kevin Feasel

Tom Zika takes away the scalars:

I’m still surprised many people don’t realise how lousy Scalar functions are. So because it’s my current focus in work and this Stack Overflow question, I’ll be revisiting this topic.
The focus of part one is parallelism. Unfortunately, parallelism often gets a bad rep because of the prominent wait stats. Also, if there is a skew, it can run slow. But for the most part, it’s advantageous.
Whether or not you want parallelism should be an informed choice. But Scalar functions will force the query to run serially, even if you are unaware. That’s why I want to shine a light on this.

Read on for a demo of how even a no-op scalar function can affect query performance. Given the mess we normally see in scalar functions, it’s all downhill from there.

Why that Plan Didn’t Go Parallel

Published 2022-06-20 by Kevin Feasel

Erik Darling looks at another update in SQL Server 2022:

The thing is, the reason always seemed to be “Could Not Generate Valid Parallel Plan” for most of them, even though more explicit reasons were available.
They started cropping up, as things do, in Azure SQL DB, and have finally made it to the box product that we all know and mostly love.
Let’s explore some of them! Because that’s what we do.

Also check out Rob Volk’s comment, as he lists out all the ones he could find, noting that most of these do exist in SQL Server 2019.

Query Store Hints in SQL Server 2022

Published 2022-06-17 by Kevin Feasel

Erik Darling has thoughts:

When you’re dealing with untouchable vendor code full of mistakes, ORM queries that God has turned away from, and other queries that for some reason can’t be tinkered with, we used to not have a lot of options.
In SQL Server 2022, Query Store gains a new super power: you can add hints to queries without intercepting the code in some other manner.

There are a couple of useful hints which won’t be available, but Erik seems mostly upbeat about what is there.

Continuing a Dive into Simple Parameterization

Published 2022-06-10 by Kevin Feasel

Paul White shows how not-simple simple parameterization really is:

The output of parsing is a logical representation of the statement called a parse tree. This tree does not contain SQL language elements. It’s an abstraction of the logical elements of the original query specification.
It’s frequently possible to write the same logical requirement in different ways. Using a SQL analogy, x IN (4, 8) is logically the same as writing x = 4 OR x = 8. This flexibility can be useful when writing queries, but it makes implementing an optimizer more difficult.
In general terms, normalization is an attempt to standardize. It recognises common variations expressing the same logic, and rewrites them in a standard way. For example, normalization is responsible for turning x BETWEEN y AND z into x >= y AND x <= z. The normalized, or standardized, form chosen is one the query processor finds convenient to work with internally.
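The two rewrites mentioned in the excerpt can be sketched over a toy expression tree. This is only an illustration of the idea of normalization, not SQL Server's internal representation: the tuple-based node shapes and the `normalize` function are hypothetical, and the sketch assumes a non-empty IN list.

```python
def normalize(expr):
    """Rewrite IN and BETWEEN nodes into a standard form built from plain comparisons."""
    op = expr[0]
    if op == "IN":
        # ("IN", column, [v1, v2, ...]) becomes column = v1 OR column = v2 OR ...
        _, column, values = expr
        result = ("=", column, values[0])
        for value in values[1:]:
            result = ("OR", result, ("=", column, value))
        return result
    if op == "BETWEEN":
        # ("BETWEEN", x, low, high) becomes x >= low AND x <= high
        _, column, low, high = expr
        return ("AND", (">=", column, low), ("<=", column, high))
    if op in ("AND", "OR"):
        # Recurse so nested IN/BETWEEN nodes are rewritten as well
        _, left, right = expr
        return (op, normalize(left), normalize(right))
    return expr  # simple comparisons are already in standard form


in_form = ("IN", "x", [4, 8])
or_form = ("OR", ("=", "x", 4), ("=", "x", 8))
print(normalize(in_form) == normalize(or_form))  # True
print(normalize(("BETWEEN", "x", "y", "z")))
```

Once both spellings collapse to the same tree, later stages only have to recognize one shape instead of every variation a query author might write.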

This has been a very interesting series and Paul does promise one more article.

Understanding Missing Index Impact

Published 2022-06-03 by Kevin Feasel

Erik Darling delves into the depths of missing indexes:

Breaking each of those down, the only one that has a concrete meaning is Uses, but that of course doesn’t mean that a query took a long time or is even terribly inefficient.
That leaves us with Average Query Cost, which is the sum of each operator’s estimated cost in the query plan, and Impact.
But where does Impact come from?

Read on to learn where, as well as why you shouldn’t blindly trust that number.

Scalar UDFs Execute Once Per Row

Published 2022-06-02 by Kevin Feasel

Erik Darling proves that which we take for granted:

For the select list, T-SQL scalar UDFs will execute once per row projected by the query, e.g. the final resulting row count, under… Every circumstance I’ve ever seen.
In SQL Server. Of course.

Read on for situations like having the function execute more than once per query, using functions for predicate evaluation, etc.

Fun with Nested Loops

Published 2022-05-31 by Kevin Feasel

Jared Poche explains my favorite type of join:

Nested loops joins are the join operator you are likely to see the most often. It tends to operate best on smaller data sets, especially when the first of the two tables being joined has a small data set.
In row mode, the first table returns rows one at a time to the join operator. The join operator then performs a seek\scan against the second table for each row passed in from the first table. It searches that table based on the data provided by the first table, and the columns defined in our ON or WHERE clauses.
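The row-at-a-time behavior described in the excerpt can be sketched in a few lines of Python. This is a conceptual model only, not how SQL Server implements the operator: the table contents, column names, and the dictionary standing in for an index are all hypothetical.

```python
def nested_loops_scan(outer_rows, inner_rows, outer_key, inner_key):
    """For each outer row, scan the whole inner input looking for matches."""
    for outer in outer_rows:
        for inner in inner_rows:
            if outer[outer_key] == inner[inner_key]:
                yield {**outer, **inner}


def nested_loops_seek(outer_rows, inner_index, outer_key):
    """For each outer row, seek into a prebuilt index on the inner input."""
    for outer in outer_rows:
        for inner in inner_index.get(outer[outer_key], []):
            yield {**outer, **inner}


# Small outer input, as the excerpt recommends
customers = [{"customer_id": 1, "name": "Ann"}, {"customer_id": 2, "name": "Bo"}]
orders = [
    {"order_id": 10, "cust_id": 1},
    {"order_id": 11, "cust_id": 1},
    {"order_id": 12, "cust_id": 3},
]

# A dict keyed on the join column plays the role of an index on the inner table
orders_by_customer = {}
for order in orders:
    orders_by_customer.setdefault(order["cust_id"], []).append(order)

scan_result = list(nested_loops_scan(customers, orders, "customer_id", "cust_id"))
seek_result = list(nested_loops_seek(customers, orders_by_customer, "customer_id"))
print(scan_result == seek_result)  # True
print(len(seek_result))            # 2
```

Both variants return the same rows. The difference is the work done per outer row: the scan version touches every inner row each time, while the seek version touches only the matches, which is why a small outer input paired with a useful index on the inner input suits this join type.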

Read on for more information about nested loop joins.

IF Branches and Dynamic SQL

Published 2022-05-30 by Kevin Feasel

Erik Darling takes us through the scenic route:

I’m going to use the example from yesterday’s post to show you what you can do to further optimize queries like this.
To make the code fit in the post a little better, I’m going to skip the IF branch for the Posts table and go straight to Votes. Using dynamic SQL here will get you the same behavior as stored procedures, though.

Read on for more detail and a wrap-up of Erik’s series on conditional branching logic and performance tuning.

IF Branching, Local Variables, and Stored Procedures

Published 2022-05-27 by Kevin Feasel

Erik Darling continues a quest. Part 3 involves local variables:

What never seems to get a bad name, despite numerical supremacy in producing terrible results, are local variables.
In this particular scenario, I see developers use them to try to beat “parameter sniffing” to no avail.
A chorus of “it seemed to work at the time”, “I think it made things a little better”, “it worked on my machine”, and all that will ensue.
But we know the truth.

The next part is around stored procedures:

You know and I know and everyone knows that stored procedures are wonderful things that let you tune queries in magickal ways that stupid ORMs and ad hoc queries don’t really allow for.
Sorry about your incessant need to use lesser ways to manifest queries. They just don’t stack up.
But since we’re going to go high brow together, we need to learn how to make sure we don’t do anything to tarnish the image of our beloved stored procedures.

Erik notes that stored procedures are part of the solution but there’s a bit more that we need.
