Trivial Plans And Columnstore Indexes

Erik Darling warns us that trivial plans against clustered columnstore indexes could lead to row execution rather than batch execution:

Let’s look at one query with a few variations.

SELECT COUNT(*) AS [Records], SUM(CONVERT(BIGINT, t.Amount)) AS [Total]
FROM dbo.t1 AS t
WHERE t.Id > 0 AND t.Id < 3;

The plan for it is alright. It’s fairly straightforward and the query finishes in about 170ms.

We can see from the graphical execution plan that it’s been Simple Parameterized. SQL Server does this to make plan caching more efficient.

Check out the entire post.

Using WITH With OPENJSON

Jovan Popovic points out the performance difference in using the WITH clause in an OPENJSON query:

 

Here are results of the queries:

SQL Server Execution Times: CPU time = 656 ms, elapsed time = 651 ms.
SQL Server Execution Times: CPU time = 204 ms, elapsed time = 197 ms.

As you can see, WITH clause specify that OPENJSON should immediately return properties from the JSON array without second parsing. Performance of the queries might be increased 3 times if you avoid double parsing.

That’s a pretty big difference when you specify the relevant data model elements.

Using Plan Guides

Kendra Little has a video showing how to use plan guides:

If you need to add, remove, or replace hints from ad-hoc queries where you can’t change the code, plan guides can help. See a demo of removing a query hint from parameterized TSQL run from an application, and get tips on how to make your plan guides work in SQL Server.

The code from the demo is here. Links for more info are below the video. Have fun!

Click through to watch the video, or you can catch the podcast version.

More On Adaptive Joins

Erik Darling has a couple more posts on adaptive joins in SQL Server 2017.  First, he wonders what happens when you add scalar functions to the mix:

See, this isn’t SARGable either (and no, SCHEMABINDING doesn’t change this). When a predicate isn’t SARGable, you take away an index seek as an access choice. You don’t see too many Nested Loops with an index scan on the other end, do you?

No.

So there you go. It’s not the function itself that bops our Adaptive Join on the head, but the lack of SARGability.

Erik also looks at using APPLY and EXISTS to make sure you can use adaptive joins with that dynamic duo:

I do have to point out that Cross Apply used to only be implemented as a Nested Loops Join. I learned that many years ago from one of the best articles written about Cross Apply by Paul White. That changed recently — it’s possible to see it implemented with a Hash Join in at least 2016. I’ve seen it crop up in Cross Apply queries without a TOP operator.

The latter results are a bit surprising.

Getting Execution Plans In Spark

Anubhav Tarar shows how to get an execution plan for a Spark job:

There are three types of logical plans:

  1. Parsed logical plan.
  2. Analyzed logical plan.
  3. Optimized logical plan.

Analyzed logical plans go through a series of rules to resolve. Then, the optimized logical plan is produced. The optimized logical plan normally allows Spark to plug in a set of optimization rules. You can plug in your own rules for the optimized logical plan.

Click through for the details.

Multi-Statement Functions

Erik Darling has started looking at interleaved execution of multi-statement table-valued functions in SQL Server 2017.  First, he gives an intro:

In the first plan, the optimizer chooses the ColumnStore index over the nonclustered index that it chose in compat level 130.

This plan is back to where it was before, and I’m totally cool with that. Avoiding bad choices is just as good as making good choices.

I think. I never took an ethics class, so whatever.

In part deux, Erik compares interleaved multi-statement functions to in-line table-valued functions:

In this case, the inline table valued function wiped the floor with the MSTVF, even with Interleaved Execution.

Obviously there’s overhead dumping that many rows into a table variable prior to performing the join, but hey, if you’re dumping enough rows in a MSTVF to care about enhanced cardinality estimation…

Just like Global Thermonuclear War, I believe the best way to win mutli-statement versus inline TVFs is not to play at all.

Adaptive Join Internals

Dmitry Pilugin digs deep into how adaptive joins work in SQL Server 2017:

We have three types of physical join algorithms in SQL Server: hash, nested loops and merge. Adaptive join allows SQL Server automatically choose an actual physical algorithm on the fly between the first two – hash (HM) and nested loops (NL).

NL has two join strategiesnaive nested loops join (inner loop scans the whole inner table or index) and index nested loops join (index on the join column of the inner table is used to find necessary rows and then those rows are applied to the outer row, also called Nested Loops Apply). Typically, the second one performs very well if you have rather small input on the outer side and indexed rather big input on the inner side.

HM is more universal and uses hash algorithms to match rows, so no indexes are necessary. You may refer to my blog post Hash Join Execution Internals for more details.

Adaptive Join starts execution as a Hash Join. It consumes all the input of the build phase and looks at the adaptive join threshold, if the number of rows is more or equal this threshold it will continue as a hash join. However, if the number of rows is less than this threshold, it will switch to a NL.

If you want to get a better understanding of how adaptive joins works, Dmitry’s post is a great start.

EF Core Merge Statements

Richie Rump looks at SQL that Entity Framework Core generates when inserting a batch of records:

If you’re an experienced SQL tuner, you’ll notice some issues with this statement. First off the query has not one but two table variables. It’s generally better to use temp tables because table variables don’t have good statistics by default. Secondly, the statement uses a MERGE statement. The MERGE statement has had more than it’s fair share of issues. See Aaron’s Bertrand’s post “Use Caution with SQL Server’s MERGE Statement” for more details on those issues.

But that got me wondering, why would the EF team use SQL features that perform so poorly? So I decided to take a closer look at the SQL statement. Just so you know the code that was used to generate the SQL saves three entities (Katana, Kama, and Tessen) to the database in batch. (Julie used a Samurai theme so I just continued with it.)

Yeah…I’m not liking the MERGE statement very much here.

More With Adaptive Joins

Erik Darling continues his adaptive joins exploration with two more posts.  First, how local variables can affect the query plan:

The easiest way to look at this is to compare Adaptive Joins with literal values to the same ones using local variables. The results are a little… complicated.

Here are three queries with three literal values. In my copy of the Super User database (the largest Stack Overflow sub-site), I’ve made copies of all the tables and added Clustered ColumnStore indexes to them. That’s the only way to get Adaptive Joins at this point — Column Store has to be involved somewhere along the line.

The last day of data in this dump is from December 11. When I query the data, I’m looking at the last 11 days of data, the last day of data, and then a day where there isn’t any data.

Then Erik takes on non-SARGable queries:

The queries with non-SARGable predicates on the Users table used Adaptive Joins.

The queries with non-SARGable predicates on the Posts table did not.

Now, there is an Extended Events… er… event to track this, called adaptive_join_skipped, however it didn’t seem to log any information for the queries that didn’t get Adaptive Joins.

Bummer! But, if I had to wager a guess, it would be that this happens because there’s no alternative Index Seek plan for the Posts table with those predicates. Their non-SARGableness takes that choice away from the optimizer, and so Adaptive Joins aren’t a choice. The Users table is going to get scanned either way — that’s the nature of ColumnStore indexes, so it can withstand the misery of non-SARGable predicates in this case and use the Adaptive Join.

Two more good posts in Erik’s series, and both definitely worth reading.

Interleaved Execution

Arun Sirpal looks at how Interleaved Execution affects table cardinality estimates with multi-statement table-valued functions in SQL Server 2017:

Joe states in the article “MSTVFs have a fixed cardinality guess of “100” in SQL Server 2014 and SQL Server 2016, and “1” for earlier versions. Interleaved execution will help workload performance issues that are due to these fixed cardinality estimates associated with multi-statement table valued functions.”

This is exactly what I saw where the below is just a basic screen shot of 1 of many tests that I carried out.

Read the whole thing for more details.

Categories

September 2017
MTWTFSS
« Aug  
 123
45678910
11121314151617
18192021222324
252627282930