Since, in spite of our best efforts since SQL Server 2000, we can’t effectively stop scalar UDFs from being used, wouldn’t it be great to make SQL Server simply handle them better?
SQL Server 2019 introduces a new feature called Scalar UDF Inlining. Instead of keeping the function separate, it is incorporated into the overall plan. This leads to a much better execution plan and, in turn, better runtime performance.
Read the whole thing. Especially the part about how this changes the way one of the DMV counters behaves.
So according to this DMV, there’s only 3 queries with memory grants, while the remaining 5 queries have to wait for space in this semaphore. This is where the wait type comes in. When a query is sitting as a waiter, it will display the wait type RESOURCE_SEMAPHORE.
This is one of the biggest performance-killing waits you can find, and there are several ways to tackle it in SQL Server, as well as adding more hardware.
Last month, I covered a puzzle involving matching each row from one table with the closest match from another table. I got this puzzle from Karen Ly, a Jr. Fixed Income Analyst at RBC. I covered two main relational solutions that combined the APPLY operator with TOP-based subqueries. Solution 1 always had quadratic scaling. Solution 2 did quite well when provided with good supporting indexes, but without those indexes also had quadric scaling. In this article I cover iterative solutions, which despite being generally frowned upon by SQL pros, do provide much better scaling in our case even without optimal indexing.
Itzik has three separate solutions here, including one using the CLR.
The gist of this kit is that it is a database repository as part of the sp_BlitzFirst to collect monitoring alerting and performance metric data. Once you’ve set this up, then you can use a Power BI desktop dashboard as an interface for all that data.
Now this is an awesome way to introduce more DBAs to Power BI and it’s a great way to get more out of your metrics data. The challenge is, it’s a lot of data to be performing complete refreshes on and the natural life of a database like this is growth. The refresh on the static database I was sent by Tracy, once I connected my PBIX to the local db sources, took upwards of an hour to refresh. Keep in mind, I have 16G of memory, 32G of swap and have upped my data load options in Power BI quite high.
Kellyn walks through the things she did to improve performance as a starting point, so check it out and be aware that there’s even more that can be done.
This sort of difference, scalar oriented
C++being so much faster than scalar oriented
R, is often distorted into “
This is just not the case. If we adapt the algorithm to be vectorized we get an
Ralgorithm with performance comparable to the
Not all algorithms can be vectorized, but this one can, and in an incredibly simple way. The original algorithm itself (
xlin_fits_R()) is a bit complicated, but the vectorized version (
xlin_fits_V()) is literally derived from the earlier one by crossing out the indices. That is: in this case we can move from working over very many scalars (slow in
R) to working over a small number of vectors (fast in
This is akin to writing set-based SQL instead of cursor-based SQL: you’re thinking in terms which make it easier for the interpreter (or optimizer, in the case of a database engine) to operate quickly over your inputs. It’s also one of a few reasons why I think learning R makes a lot of sense when you have a SQL background.
We expected the Semi Join to turn into an Anti Semi Join, but the plan now also contains a Nested Loop branch with a Row Count Spool – what’s that about? Turns out the Row Count Spool, along with its index seek, has to do with the NOT IN() and the fact that we’re looking at a nullable column. Remember that…
x NOT IN (y, z, NULL)
… always returns false, because the NULL value could represent essentially anything, including the x. And so it is with the inner table, if there happens to be a NULL value among those rows.
Definitely worth the read. There are ways around this performance hit but design decisions have consequences.
I was really curious about the RETURNS NULL ON NULL INPUT function option so I decided to do some testing. I was very surprised to find out that it’s actually a form of scalar UDF optimization that has been in the product since at least SQL Server 2008 R2.
It turns out that if you know that a scalar UDF will always return a NULL result when a NULL input is provided then the UDF should ALWAYS be created with the RETURNS NULL ON NULL INPUT option, because then SQL Server doesn’t even run the function definition at all for any rows where the input is NULL – short-circuiting it in effect and avoiding the wasted execution of the function body.
The more often you pass in NULL to that function, the better your performance will be relative to the default case.
The performance benchmarks on CDH 6.0 show that enabling Parquet vectorization significantly improves performance for a typical ETL workload. In the test workload (TPC-DS), enabling parquet vectorization gave 26.5% performance improvement on average (geomean value of runtime for all the queries). Vectorization achieves these performance improvements by reducing the number of virtual function calls and leveraging the SIMD instructions on modern processors. A query is vectorized in Hive when certain conditions like supported column data-types and expressions are satisfied. However, if the query cannot be vectorized its execution falls back to a non-vectorized execution. Overall, for workloads which use the Parquet file format on most modern processors, enabling Parquet vectorization can lead to better query performance in CDH 6.0 and beyond.
This is worth looking into, especially if you are on the Cloudera stack.
There are workloads where frequent page splits are a problem. I thought I had a system like this many years ago, so I tested various fill factor settings for the culprit table’s clustered index. While insert performance improved by lowering the fill factor, read performance drastically got worse. Read performance was deemed much more critical than write performance on this system. I abandoned that change and instead recommended a table design change since it made sense for that particular table.
Click through for a demo.
The question is now how to perform a coarse estimation of the equijoin cardinality of the highlighted steps, using the information available.
The original cardinality estimator would have performed a fine-grained step-by-step histogram alignment using linear interpolation, assessed the join contribution of each step (much as we did for the minimum step value before), and summed each step contribution to acquire a full join estimate. While this procedure makes a lot of intuitive sense, practical experience was that this fine-grained approach added computational overhead and could produce results of variable quality.
The original estimator had another way to estimate join cardinality when histogram information was either not available, or heuristically assessed to be inferior. This is known as a frequency-based estimation[…]
It’s an interesting dive into one of the changes in 2014’s CE. The post is a little math-heavy but Paul does a great job keeping it interesting.