Press "Enter" to skip to content

Category: Query Tuning

Data Types Matter, Even in the Serverless SQL Pool

Jovan Popovic has a public service announcement for us:

The serverless SQL pool is a distributed computing system that executes concurrent queries on a set of distributed compute nodes. Multiple compute nodes are running the parts of a distributed query plan that read the underlying files, join the data sets, group, and aggregate results. Different queries might try to use the same compute nodes to execute the parts of the queries.

Oversized column types like VARCHAR(MAX) might trick the compute node into allocating more resources than are needed. The allocation is based on the estimate, but these over-allocated resources will not be used in actual execution because they are not needed. If a compute node needs 100MB to sort the results, it will use those 100MB even though the query optimizer allocated 4GB of memory for the task on the compute node.

Read the whole thing.
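
As a rough illustration of the advice (the storage path and column names here are invented), sizing columns explicitly in the OPENROWSET WITH clause keeps the optimizer from assuming worst-case string widths:

-- A minimal sketch, assuming a hypothetical Parquet folder in ADLS.
-- Without the WITH clause, serverless SQL tends to infer wide string
-- types; right-sizing columns keeps memory estimates honest.
SELECT d.Region, d.SaleAmount
FROM OPENROWSET(
    BULK 'https://contoso.dfs.core.windows.net/lake/sales/*.parquet',
    FORMAT = 'PARQUET'
) WITH (
    Region     VARCHAR(20),      -- instead of a default wide VARCHAR
    SaleAmount DECIMAL(10, 2)
) AS d;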


LOBs and OPTION(RECOMPILE)

Paul White has a warning for us:

All that is fairly well-known. The point of this short post is to draw your attention to another side-effect of adding OPTION (RECOMPILE) — the parameter embedding optimization (PEO).

When PEO is used, SQL Server takes the value of any variables and parameters and embeds the runtime values in the query text, pretty much as if you had entered them by hand before compiling. This is often very useful for plan quality, but there is a potential drawback when large object types (LOBs) are in play.

Click through for the explanation and a simple demo.
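
The shape of the problem, as a minimal sketch (table and column names invented): a LOB variable combined with OPTION (RECOMPILE) means PEO has to embed the entire value into the query text.

DECLARE @Search NVARCHAR(MAX) = REPLICATE(CONVERT(NVARCHAR(MAX), N'x'), 1000000);

SELECT COUNT(*)
FROM dbo.Documents AS d    -- hypothetical table
WHERE d.Body = @Search
OPTION (RECOMPILE);        -- PEO embeds all one million characters as a literal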


When MAX becomes TOP

Forrest McDaniel tries out a few aggregations:

I can’t tell you how tempted I was to call this MAXTOP. Thinking about it still causes stray giggles to bubble up, as I gleefully envision sharing this info with eager young DBAs at a SQL Saturday, then ushering them off to a session on parallelism. Thankfully I’m not that evil.

The real conclusion is that SQL Server is programmed to be very clever. Despite the cleverness though, details matter, and the cleverness often falls short. What’s that warm glow I feel inside? Ah yes, job security.

Definitely worth a read, and Forrest also shares the repro scripts.
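
The cleverness in question is the optimizer's ability to rewrite an aggregate as a TOP. As a rough sketch (hypothetical table, and assuming a B-tree index on OrderDate), these two queries can compile to essentially the same single-row backward seek:

SELECT MAX(o.OrderDate)
FROM dbo.Orders AS o;

SELECT TOP (1) o.OrderDate
FROM dbo.Orders AS o
ORDER BY o.OrderDate DESC;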


Pokey Performance with EXISTS

Erik Darling reminds us that even good things can go bad:

Look, I really like EXISTS and NOT EXISTS. I do. They solve a lot of problems.

This post isn’t a criticism of them at all, nor do I want you to stop using them. I would encourage you to use them more, probably.

But there’s some stuff you need to be aware of when you use them, whether it’s in control-flow logic, or in queries.

Do read on to see a specific type of issue you can run into with a left semi join.
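
For reference, the two usages Erik calls out look roughly like this (hypothetical tables):

-- Control-flow logic:
IF EXISTS (SELECT 1 FROM dbo.Orders AS o WHERE o.CustomerID = 42)
    PRINT 'Customer has orders';

-- In a query, where it typically compiles to a left semi join:
SELECT c.CustomerID
FROM dbo.Customers AS c
WHERE EXISTS (SELECT 1 FROM dbo.Orders AS o WHERE o.CustomerID = c.CustomerID);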


Troubleshooting with sp_HumanEvents

Erik Darling shows off sp_HumanEvents:

With shorter procs you can probably just collect actual execution plans and slam F5 like a tall glass of gin at 6am.

But you don’t wanna do that with the larger procs, for a few practical reasons:

– Lots of little queries run quickly, and we don’t care about those

– Navigating through lots of plans in SSMS is tedious

– There’s no differentiation when other procedures, etc. are invoked

– You introduce a lot of overhead retrieving and rendering all those plans

– The full query text might not be captured, which is a limitation in many places

Let’s save the day with sp_HumanEvents, my stored procedure that makes using Extended Events really easy.

Read on to see how this all works.
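
As a taste of the syntax, a common invocation looks something like this (parameter names are from memory, so verify against the project's documentation):

EXEC dbo.sp_HumanEvents
    @event_type = N'query',       -- capture query-level events
    @query_duration_ms = 1000,    -- ignore queries faster than one second
    @seconds_sample = 30;         -- sample for 30 seconds, then report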


Calculating the Adaptive Join Threshold

Paul White breaks out the math books:

One thing I want you to bear in mind throughout this piece is an adaptive join always starts executing as a batch mode hash join. This is true even if the execution plan indicates the adaptive join expects to run as a row mode apply.

Like any hash join, an adaptive join reads all rows available on its build input and copies the required data into a hash table. The batch mode flavour of hash join stores these rows in an optimized format, and partitions them using one or more hash functions. Once the build input has been consumed, the hash table is fully populated and partitioned, ready for the hash join to start checking probe-side rows for matches.

This is the point where an adaptive join makes the decision to proceed with the batch mode hash join or to transition to a row mode apply. If the number of rows in the hash table is less than the threshold value, the join switches to an apply; otherwise, the join continues as a hash join by starting to read rows from the probe input.

Read the whole thing and learn more about the cutoffs for adaptive joins.
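
If you want to see the threshold yourself, the Adaptive Join operator in an actual execution plan exposes it as the Adaptive Threshold Rows property. A sketch of a query shape that can qualify (hypothetical tables; batch mode generally requires a columnstore index or, on newer versions, batch mode on rowstore):

SELECT p.ProductID, SUM(s.Quantity) AS TotalQuantity
FROM dbo.Sales AS s               -- assume a columnstore index here
JOIN dbo.Products AS p
    ON p.ProductID = s.ProductID
GROUP BY p.ProductID;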


Restrictions for Parallel Insertion

Erik Darling summarizes the main man:

I’d like to start this post off by thanking my co-blogger Joe Obbish for being lazy and not blogging about this when he first ran into it three years ago.

Now that we’re through with pleasantries, let’s talk turkey.

Over in this post, by Arvind Shyamsundar, which I’m sure Microsoft doesn’t consider official documentation since it lacks a GUID in the URL, there’s a list of… things about parallel inserts.

Read on for the summary of limitations.
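
The best-known restriction is worth showing: to get a parallel insert at all, the INSERT ... SELECT needs a TABLOCK hint on the target (hypothetical tables; heap or clustered columnstore targets, among other conditions, also matter):

INSERT INTO dbo.SalesArchive WITH (TABLOCK)   -- required for parallel insert
SELECT s.SaleID, s.SaleDate, s.Amount
FROM dbo.Sales AS s
WHERE s.SaleDate < '20200101';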


More Efficient Pivoting

Dave Mason is on the hunt:

While working with some poorly performing code in T-SQL that used a PIVOT operator, I wondered if there was a more efficient way to get a result set of pivoted data. It may have been a fool’s errand, but I still wanted to try. It dawned on me that I could use the STRING_AGG() function to build a delimited list of pivoted column names and values. From there, I’d have to “shred” the delimited data to rows and columns. But how?

Read on to see how.
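
This is not Dave's full solution, just a sketch of the STRING_AGG building block he starts from (hypothetical table): collapse each group's values into one delimited string, which then has to be shredded back out.

SELECT s.CustomerID,
       STRING_AGG(CONVERT(VARCHAR(12), s.SaleAmount), ',')
           WITHIN GROUP (ORDER BY s.SaleDate) AS AmountList
FROM dbo.Sales AS s
GROUP BY s.CustomerID;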


Comparing OPENJSON vs Other List-Parsing Methods in SQL Server

Aaron Bertrand splits a string:

I have long advocated avoiding splitting strings by using table-valued parameters (TVPs), but this is not always a valid option; for example, PHP drivers do not handle this feature yet. A new pattern I’ve seen emerging is to replace splitting comma-separated strings with the new OPENJSON functionality introduced in SQL Server 2016. I wanted to explore why this is not an improvement in the typical case, unless you are using a client application platform that doesn’t support TVPs and your application data already happens to be in JSON format.

Read on for the solution, which has a somewhat surprising outcome.
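
The pattern under test looks roughly like this: wrap the comma-separated list in brackets so OPENJSON can shred it to rows (assumes SQL Server 2016 or later and database compatibility level 130 or higher):

DECLARE @List NVARCHAR(MAX) = N'1,5,7,42';

SELECT j.[value] AS Id
FROM OPENJSON(N'[' + @List + N']') AS j;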
