Another quick post of simple changes you can make to your code to create more optimal execution plans. This one is on implicit conversions. An implicit conversion is when SQL Server must automatically convert a data type from one type to another when comparing values, moving data or combining values with other values. When these values are converted, during the query process, it adds additional overhead and impacts performance.
Read on for more info, including a common scenario where implicit conversion causes performance degradation.
Let’s start with this example:
SELECT 4.4/CAST(.44 AS VARCHAR(5))
Ignoring for a moment that our denominator is of type VARCHAR, if we do some quick mental math or use a calculator, we can see that the answer should be 10:
Spoilers: it isn’t 10, and Bert explains why.
The OLTP table implements a
rowversioncolumn that is automatically updated whenever a row is updated or inserted. The rowversion number is unique at the database level, and increments monotonically for all transactions that take place within the context of that database. The
dbo.OLTP_Updatestable is used to store the minimum row version available inside the transaction used to copy data from the OLTP table into the OLAP table. Each time this code runs it captures incremental changes. This is far more efficient than comparing all the rows in both tables using a hashing function since this method doesn’t require reading any data other than the source data that is either new, or has changed.
I think this is the first time I’ve seen someone use
ROWVERSION types successfully.
That’s a lot of desired memory, 1,493,120 KB aka 1.4 GB, but there was only
25 MB used in the sort. So why was SQL Server so far off the right estimate? It’s the data types we picked.
That’s a lot of memory for a fairly simple query returning 300,000 rows, each containing a string and an int.
There are certain design patterns in T-SQL that give me pause. They may not be “code smells” per se, but when I encounter them I find myself thinking “there’s got to be a more sensible way to accomplish this”. WAITFOR DELAY is one example. I’ve used it a few times here and there, mostly in one-off scripts. If I wrote some code that used it regularly in production, I’d be paranoid about putting my SPID to sleep forever. Maybe a little bit of paranoia is a good thing. But I digress.
A recent task found its way to me, and I’ve decided to use WAITFOR DELAY as part of my solution. (It hasn’t been tested or implemented yet–perhaps more on this in another post.) My usage this time has been more complex than in the past. What I already knew is that you can use a string literal for the time_to_pass argument. For example, this will delay for 3½ seconds:
WAITFOR DELAY '00:00:03.500'
Click through for a bunch of testing.
Let’s start with what we are told about this new feature. According to the documentation, the new UTF-8 Collations:
can be used …
- as a database-level default Collation
- as a column-level Collation
- by appending “
_UTF8” to the end of any Supplementary Character-Aware Collation (i.e. either having “
_SC” in their name, or being of level
- with only the
(implied) have no effect on
NVARCHARdata (meaning: for these types, the UTF-8 Collations behave the same as their non-UTF-8 equivalents
“This feature may provide significant storage savings, depending on the character set in use.” (emphasis mine)
Solomon takes his normal, thorough approach to the problem and finds several issues.
Why are the salaries stored as nvarchar and formatted with commas, spaces, and periods?
Great question! Someone wanted to make sure these amounts would look good in the UI so storing the formatted values in the database would be the way to go…
Pretty for the UI, not really great for needing to do analysis on.
TRY_PARSE is slow, but for a one-time conversion as you try to fix a bad idea, it’s fine.
The time zone name is taken from a list maintained in the following Windows registry hive:
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Time Zones
Note: For SQL Server on Linux (and Docker containers), a registry shim performs the same function as the Windows registry by intercepting the API calls and returning the expected value(s).
We can also use a Transact-SQL (T-SQL) query against the system view
sys.time_zone_info, which uses the information from the registry hive. This is the recommended method if you do not have access to the registry hive.
Click through for a couple of examples.
This week’s Quizletter featured a quiz on using the AVG() function in SQL Server.
I was inspired to write this quiz because I’ve been teaching TSQL School each week. When we covered aggregate functions, I remembered just how tricksy and non-average that AVG() can be.
Want to take the quiz before you read the answers?
Head over here and take the quiz first.
Or read this post and then take the quiz, if you prefer. I’m not gonna call it cheating, because it’s learning either way.
Kendra explains each of the answers, so I’d recommend taking the quiz first.
Now here’s something we didn’t cover in the video and is especially helpful if something just WILL NOT work and you’ve spent all morning panic eating biscuits.
You can write checks to see if something is numeric, or an integer, with
The general “‘is.XXXXX()’” function will take many of the data types we cover here and more, and can be a real time/life saver.
We could also use
class()here and inspect the result.^[You might recall that
class(1)had the result of “numeric” – R was not by default considering 1 as an integer for the purpose of the
class()function. ### Special numbers As well as
ito denote imaginary numbers, there are some additional symbols you might encounter or want to use.
There’s a video as well as a full blog post.