In one of my sessions, Set Me Up: How to Think in Sets, I discuss a variety of performance-inhibiting query constructs, including scalar UDFs. I thought it would be interesting to take the simple scalar function that I use in the demo and see what kind of difference that scalar inlining might make.
First, I restored the CorpDB database that I use in the session to my SQL Server 2019 CTP 2.1 instance and initially set the compatibility level to 140. I also ran script 001 from the demo to create the needed database tables (no need to create the CLR objects for this test). I then ran script 030 to execute the scalar UDF test. In a nutshell, this script
creates a UDF
runs a query that calls the UDF about 13,000 times, capturing the time required to do so
repeated this test five times
discards the fastest and slowest tests
reports the average time for the remaining three tests
If I’m reading Brian’s notes right, it’s still slower than writing the set-based solution yourself, but a huge improvement over the prior scalar function performance.
A good habit to get into is to explicitly name your constraints. I try to do this when I create tables to be sure that a) I have a PK and b) it’s named the same for all environments.
I can create a PK inline, with a simple table like this:CREATE TABLE Batting ( BattingKey INT NOT NULL CONSTRAINT BattingPK PRIMARY KEY , PlayerID INT , BattingDate DATETIME , AB TINYINT , H TINYINT , HR tinyint ) ;
This gives a primary key, named “BattingPK, that I can easily see inline with the column.
Steve also gives an alternative formulation which works well for composite keys. You can additionally add constraints after the create statement, but if you are creating temp tables and want to take advantage of temp table reuse, constraints have to be created as part of the table (and cannot have names). For additional fun, since SQL Server 2014, you can create indexes as part of the CREATE TABLE statement as well—that was needed to create memory-optimized tables as back in that edition, you couldn’t add new indexes after the fact.
It looks so clean compared to the first example! but it wasn’t until I tested the second method out that I realised that the behaviour of the two CASE expressions are different as outlined on books online
The CASE expression has two formats:
The simple CASE expression compares an expression to a set of simple expressions to determine the result.
The searched CASE expression evaluates a set of Boolean expressions to determine the result.
Both formats support an optional ELSE argument.
I put together some examples to illustrate the difference when evaluating Null using the two Case expressions, the query returns the column ‘Databasename’ from the derived list values clause, example 1 has a Null value and example 2 has a value of ‘SQLUndercover’ which you will see below:
Adrian looks into a scenario in which the two CASE expressions return different results, and digs into execution plans to find out why.
The only functional difference between them is that the
DATEDIFF_BIG()returns values as a
BIGINT, for results that exceed the boundary of an
INT. Keep this in mind when deciding which one to use. For example, the maximum number of seconds an
INTcan hold is 68 years, while a
BIGINTcan comfortably store the number of seconds in 10,000 years. This becomes especially important when dealing with microseconds and nanoseconds.
The rest of the post will use
DATEDIFF()to refer to both functions.
I think this might be the first time I’d read about
DATEDIFF_BIG()and I’m not aware of ever having used it. But hey, it could make sense if you need to track more than 2 billion microseconds.
I ran into an interesting problem while working with the POWER() function. I was trying to do some binary conversions and had a statement like this to process powers of 2.SELECT POWER(2, n)
This was designed to take a value and return a power of 2. I then used a different value to determine if this was added to my conversion factor or not. In trying to work with some larger numbers, I ran into this error:
Msg 232, Level 16, State 3, Line 3
Arithmetic overflow error for type int, value = 2147483648.000000.
Click through for a description of the problem as well as Steve’s solution.
I started pondering it for a bit and began to wonder if I could use the new for SQL Server 2016 STRING_SPLIT function to convert a binary string to decimal. The thought process was to split the string into rows of CHAR(1) values, along with an in-string character position. Then I could take the “1” values, calculate the 2ᵡ value, and sum them up for a decimal value.
I quickly realized I’d need to reverse the binary string. I also discovered the STRING_SPLIT function eliminated 0’s adjacent to 1’s. I don’t have much use for this code at the moment, but maybe there’s something here that you can take away.
Given some of the issues Dave ran into, it seems that a tally table-based solution might be a better choice.
My database has to be in 2019 compat mode to enable Froid, the function-inlining magic. Run the same query again, and the metrics are wildly different:
Runtime: 4 seconds
CPU time: 4 seconds
Logical reads: 3,247,991 (which still sounds bad, but bear with me)
My bias tells me that I still want to avoid scalar functions, but it’s no longer the automatic deal-killer it once was.
The other day, Erin Stellato asked a question on twitter about the value of nested SPs. Here’s how I weighed in:
I’m not a fan of nested anything. Too much hidden complexity. Code reusability leads to queries that are jack of all trades, master of none.
“Don’t repeat yourself” doesn’t work as well in SQL as it does in other code.
— Michael J Swart (@MJSwart) October 19, 2018
Hidden complexity has given me many problems in the past. SQL Server really really likes things simple and so it’s nice to be able to uncover that complexity. Andy Yun has tackled this problem for nested views with his
Click through for a script which helps.
Table-valued parameters were introduced in SQL Server 2008. Before that, there were limited options to pass tabular data to stored procedures. Most developers used one of the following methods:
Data in multiple columns and rows was represented in the form of a series of parameters. However, the maximum number of parameters that can be passed to a SQL Server stored procedure is 2,100. Therefore, in the case of a large table, this method could not be used. Furthermore preprocessing is required on the server side in order to format the individual parameters into a tabular form.
Create multiple SQL statements that can affect multiple rows, such as UPDATE. The statements can be sent to the server individually or in the batched form. Even if they are sent in the batched form, the statements are executed individually on the server.
Another way is to use delimited strings or XML documents to bundle data from multiple rows and columns and then pass these text values to parameterized SQL statements or stored procedures. The drawback of this approach was that you needed to validate the data structure in order to unbundle the values.
The .NET framework then makes it easy to pass in an IEnumerable as a table-valued parameter.
-- note that this creates a single column of values -- which could be used in something like IN -- for example SELECT 1 WHERE 12 IN ( SELECT tempField FROM (VALUES(11),(12),(7)) tempTable(tempField)) -- I could rephrase this as: SELECT 1 WHERE 12 = ANY ( SELECT tempField FROM (VALUES(11),(12),(7)) tempTable(tempField))
I rarely see these operators in the wild and might have used them in production code a couple of times if that.