My database has to be in 2019 compat mode to enable Froid, the function-inlining magic. Run the same query again, and the metrics are wildly different:
Runtime: 4 seconds
CPU time: 4 seconds
Logical reads: 3,247,991 (which still sounds bad, but bear with me)
My bias tells me that I still want to avoid scalar functions, but it’s no longer the automatic deal-killer it once was.
The Table Spool operator is one of the four spool operators that SQL Server supports. It retains a copy of all data it reads in a worktable (in tempdb) and can then later return extra copies of these rows without having to call its child operators to produce them again. These copies can be made available in the same part of the execution plans, or in another part.
Table Spool is probably the most basic of the spool operators. The Index Spool operator is very similar to it, but indexes its data to allow it to return only a subset of the stored rows. The Row Count Spool operator is optimized for specific cases where the rows to be returned are empty. And the Window Spool operator is used to support the
RANGEspecifications of windowing functions.
Typical use cases of a Table Spool are: to reproduce the same input multiple times without having to re-execute its child nodes (e.g. in the inner input of a Nested Loops); to make the same input available in multiple branches of an execution plan (e.g. in wide update plans); or to ensure that an original copy of the data is available after an insert, update, or delete operator changes the base data (“Halloween protection”).
Click through for a great deal more detail.
I then set up Extended Events to capture the query metrics and I executed each of the queries multiple times (also, just for the test, I discarded the results because I didn’t want that process mucking with my measurements). After executing both procedures 500 times, the results were quite simple. The average execution time with an explicit drop was 8,672 microseconds. Meanwhile, the average for not dropping the temporary table was 8,530 microseconds. That’s about a 1% difference across hundreds of executions. Reads were identical and so were writes.
In short, the behavior is the same.
What about the impact on the system? Could I see changes in memory or I/O as these different processes ran?
Grant didn’t notice any difference but check Allen White and Jay Robinson’s answers in the comments. Temp table reuse can happen (if you follow the rules) and can make a difference when a procedure is called frequently enough.
Built right into the Extended Events is an event that captures conversions that would affect execution plans, plan_affecting_convert. This event will show both CONVERT and CONVERT_IMPLICIT warnings that you would normally only see within an execution plan. You can capture this event with others. Capturing events together along with causality tracking makes it very easy to track queries that have the issue. Here’s one example of how you might capture implicit conversions along with the query:
This kind of event can make system-level performance tuning much easier.
As mentioned in the previous article, SQL Server 2019 cardinality estimations for a table variable are based on actual table variable row counts. Therefore, in SQL Server 2019, we should expect better estimations and better plans for queries that use table variables.
Which queries will benefit from this improvement? Generally, queries that use table variables with a lot of rows in them, which are not tuned yet. For table variables with a few rows, there will not be significant changes and you should expect the same execution plan and almost same execution parameters.
Queries whose execution was slow due to underestimation in table variables usually implement logical joins by using Nested Loop Join physical operator where a Hash or Merge Join operators would be more appropriate. In addition to this, underestimation of table variables participating in multiple joins could lead to issues with insufficient memory grants, and thus data spilling to tempdb .
Click through for the example. The next post in the series will be a case where it doesn’t work very well.
The main advantages of Batch Mode are:
- Algorithms optimized for the multi-core modern CPUs;
- Better CPU cache utilization and increased memory throughput;
- Reduced number of CPU instructions per processed row.
All these features make Batch Mode much faster than Row Mode (typically an order of magnitude, 10x-100x times faster) for analytical queries with CS indexes. One major condition for Batch Mode is a presence of a CS index. If you don’t have a CS index on a table involved in a query, you won’t get Batch Mode.
However, some analytical queries may benefit from Batch Mode without a CS index, or CS cannot be created due to some limitations.
There are a few tricks that allow you to enable Batch Mode on a Rowstore table for example with a dummy filtered CS index (see this post from Itzik Ben-Gan), but SQL Server 2019 may use Batch Mode on Rowstore without any extra efforts from your side.
Dmitry dives into the debugger and teases out the specific circumstances which can help get a query considered for rowstore. If you want a deep dive into what’s currently available, this is your post.
The Need to Vacuum
As Redshift does not reclaim free space automatically, updates and delete operations can frequently lead to table growth. Equally, it’s important as new entries are added, that the data is maintained in a sorted sequence.
The VACUUM command is used to re-sequence data, and reclaim disk space as a result of DELETE and UPDATE operations. Although it won’t block other processes, it can be a resource-intensive operation, especially for data stored using interleaved sort keys.
It should be run periodically to ensure consistent performance and to reduce disk usage.
Some of this is good Postgres advice; some of it is good MPP advice (and serves well, for example, when dealing with Azure SQL Data Warehouse); the rest is Redshift-specific.
One of the most popular posts on my blog last year was where I pretty much suggested that people not use table variables:
This wasn’t new information when I wrote it, but bad performance due to the use of table variables remained such a common anti-pattern that I thought it was worth stressing again.
So, when I saw the above 2019 feature I thought I’d better investigate and update what I’m telling people.
TL;DR It looks like table variables are no longer a problem.
Read the whole thing. This has the potential of changing long-standing advice going back a decade regarding table variables.
In the SQL Sever 2019 preview, we are further expanding query processing capabilities with several new features under the Intelligent Query Processing (QP) feature family. In this blog post we’ll discuss one of these Intelligent QP features that is now available in public preview, batch mode on rowstore. This feature unlocks the advantages of batch mode execution in cases where there is no columnstore participating in the query.
Batch mode is a different execution mode primarily targeted at analytics queries which are characterized as scanning many rows, and doing significant aggregations, sorts, and group-by operations across these rows. Batch mode has been reserved for queries which involve columnstore indexes until now.
Performing scans and calculations using batches of ~ 900 rows at a time rather than row by row is much more efficient for analytic-type queries. For queries that can take advantage of it, batch mode can easily make queries execute many times faster than the same query against the same data in row mode.
To date, the workaround you could use was to create an empty filtered columnstore index on a rowstore table. This solution is more architecturally pleasing and means one less hack to explain.
All it does is take the value of the Sales Amount measure at the lowest granularities of the Customer, Date and Product dimensions, multiply it by 0.08 to find a tax value, and because [Tax Amount] is a real, non-calculated measure, the result of the calculation aggregates up through the cube. [I know that I don’t have to aggregate the result of this specific calculation but remember that this is a simplified example – in the real case I did have to write the calculation using Scope statements – and anyway the best way of handling a basic multiplication like this would be with a measure expression]
The performance was sub-second for my test query and I was happy, but then I realised that the same tax rate was being used in other calculations and may change in the future, so I thought I would store the value 0.08 in a calculated measure:
Chris walks through several iterations of this before landing on a solution which is both reasonable and fast.