I recently published a post detailing the new Scalar UDF Inlining feature in SQL 2019 here. That post introduced the new feature in a way that I used to compare performance to the other function types, continuing the performance evaluation of functions that I had previously posted here and here. In the Scalar UDF Inlining post, I used a function to strip all non-numeral values from a string, and to return the result. This used the FOR XML output option.
In thinking about how scalar functions are commonly used, I’ve decided to revisit this feature with a simpler function. I will still compare it to all the other types of functions to see how Scalar UDF Inlining compares to the others.
Wayne’s results are music to the product team’s ears, I’m sure.
Two different T-SQL methods were used for testing. The FOR XML PATH solution incorporates the reverse and stuff functions along with FOR XML PATH. The String_agg solution is much simpler, but is only available on SQL Server 2017 currently.
The FOR XML PATH method has been documented thoroughly over the years in many articles, so these explanations will not be repeated here. However, two points are worth mentioning. The first one is that the @strSeparator (in this case ‘ or ’) string will reside at the end of the constructed string (in this case @strFilter). Since the last occurrence of @strSeparator must be removed to prevent a syntax error, characters must be dropped from the end of the string. Although it may seem obvious that four characters (the length of @strSeparator) should be removed, actually only three need be removed because the end of the constructed string is “r” not “ “. Therefore, the length argument of the left function specification is the length of @strFilter minus three.
STRING_AGG a lot, primarily because I can remember it and I can explain it to people relatively new to SQL Server. Read on to see how they both compare in terms of performance.
Usually, the added features of the CREATE TABLE syntax in new releases of SQL Server are esoteric, and unless you are dealing with memory-optimized tables or other esoteric stuff, they aren’t of great interest. However, the Inline INDEX for both a table and column index has just crept in quietly with SQL Server 2014 (12.x). This was interesting because the SQL Server team back-fitted it to all tables rather than just in-memory OLTP tables for which it was, at the time, found necessary. The new syntax was introduced which allows you to create certain index types inline with the table definition. These could be at column level, concerning just that column, or at the table level, with indexes containing several columns.
Why interesting? This affects multi-statement table functions, user-defined table types, table-valued parameters as well as table variables. It was considered a game-change for table variables because, for a start, it allowed non-unique indexes or explicit clustered indexes to be declared on columns for the first time because you can create indexes on table variables as part of the table definition. Of more significance were the table-level indexes that allowed you to specify multi-column indexes. Previous releases had allowed multi-column primary or unique constraints, but not explicitly named indexes. You still cannot declare an index after the table is created, which is a shame as there are good reasons for being able to do so after a table is stocked with data. Any sort of large import of data into a table that is over-indexed or prematurely-indexed is doomed to crawl rather than to run. I’ll show this later on in this article.
Click through for an analysis of inline indexes themselves as well as how they fit on table variables—something I tend not to do much.
With no clear indications of what the
UNCOMPRESSfunction does, we can at least pass in some simple values to see what comes back, and see if we can make sense of the output. For the following tests, please keep in mind that “8-bit” refers to the
TEXT(deprecated) datatypes. And, “16-bit” refers to the
Read on as Solomon figures out what it does and how non-useful it is for anybody nowadays.
One thing I see fairly often (and am occasionally guilty of myself) is using COUNT(DISTINCT) and DISTINCT interchangeably to get an idea of the number of unique values in a column.
While they will sometimes give you the same results, depending on the data the two methods may not always be interchangeable.
Click through for examples.
Or if you are Rob Volk (@sql_r on Twitter), and you want to create an annoying database on your best frenemy’s SQL Server that includes brackets in the name, like:
This [database] Is Awesome
You will need to do:
CREATE DATABASE [This [database]] Is Awesome];
I’m not saying you should do that, but I’m also not saying you shouldn’t.
I noted that one of the columns failed to convert VARCHAR to DECIMAL.
The error message is below, and it’s usually fairly easy to sort:
Error converting data type varchar to numeric
Normally, I’d use ISNUMERIC to identify the rows that fail to have a value in that column that could be converted to a number. Then, I could identify the value, and then I could replace or exclude it, as required.
However, on this occasion, using ISNUMERIC failed to identify any columns as being non-numeric.
Click through to see why Jen got this result.
If you don’t read the rest of this setup, I want you to take away one thing.
Always reference your tables with your columns when more than one table is involved in the query!
This post is made primarily with SQL Server in mind, but this behavior is actually ANSI SQL and can be replicated in PostgreSQL, MySQL, and Oracle.
Jon’s example is a case where perfectly valid ANSI SQL logic (which is why you can replicate this across platforms and why it’s not a bug) leads to an unexpected result.
Sometimes you need to find the previous value in a column. Easy enough, the LAG window function makes this a breeze (available since SQL Server 2012). But what if the previous value cannot be null? You can pass a default, but we actually need the previous value that was not null, even if it is a few rows back. This makes it a bit harder. T-SQL guru Itzik Ben-Gan has written about the solution to this problem: The Last non NULL Puzzle. It’s a bit of tricky solution.
Click through for the magic words and if you’re on the SQL Server side, upvote this issue to get that functionality in SQL Server too.
I am telling you personally that I hate the use of DISTINCT.
DISTINCT used by those people, who are not sure about their data set or SELECT statement or JOINS.
Whenever I get any query with DISTINCT, immediately I suggest to remove it.
I agree with this sentiment about 85% of the time. There are cases where I know l am working with data at a finer grain than I need and the counts aren’t important. But just tossing a
DISTINCT on a query to stop it from repeating rows is the wrong approach: figure out why that repetition happens and fix it.