Making plan choices with IF branches like this just plain doesn’t work.
The optimizer compiles a plan for both branches based on the initial compile value.
What you end up with is a stored proc that doesn’t do what it’s supposed to do, and parameter sniffing times two.
Read on to see an example of this. If you really, really want to use an IF block, you could separate the components out into individual stored procedures and call those stored procedures independently.
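As a rough sketch of that workaround (procedure, table, and column names here are hypothetical), each branch’s query lives in its own child procedure, so each one compiles its own plan the first time it runs:

CREATE OR ALTER PROCEDURE dbo.GetOrders_FewRows @CustomerID int
AS
    SELECT OrderID, OrderDate
    FROM dbo.Orders
    WHERE CustomerID = @CustomerID;
GO

CREATE OR ALTER PROCEDURE dbo.GetOrders_ManyRows @CustomerID int
AS
    SELECT OrderID, OrderDate
    FROM dbo.Orders
    WHERE CustomerID = @CustomerID
    OPTION (RECOMPILE);   -- or whatever treatment the big-customer path warrants
GO

CREATE OR ALTER PROCEDURE dbo.GetOrders @CustomerID int
AS
    IF @CustomerID = 1    -- say this is the one customer with millions of rows
        EXEC dbo.GetOrders_ManyRows @CustomerID = @CustomerID;
    ELSE
        EXEC dbo.GetOrders_FewRows @CustomerID = @CustomerID;
GO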
So, when specifying a “new_reseed_value”, the possible scenarios covered are:
1. Rows exist
2. No rows due to none inserted since the table was created
3. No rows due to the table having been truncated (TRUNCATE TABLE)
What’s missing? The following scenario:
No rows due to all rows having been deleted (DELETE)
Click through to see how
DBCC CHECKIDENT behaves differently depending upon the scenario.
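To make the difference concrete, here’s a small sketch against a throwaway table: after TRUNCATE TABLE, the next insert uses the reseed value itself, while after deleting every row, the next insert uses the reseed value plus the increment.

CREATE TABLE dbo.IdentityTest (ID int IDENTITY(1, 1), Val char(1));
INSERT INTO dbo.IdentityTest (Val) VALUES ('a'), ('b');

DELETE FROM dbo.IdentityTest;                      -- no rows left, but the identity history remains
DBCC CHECKIDENT ('dbo.IdentityTest', RESEED, 10);
INSERT INTO dbo.IdentityTest (Val) VALUES ('c');   -- this row gets ID = 11 (reseed + increment)

TRUNCATE TABLE dbo.IdentityTest;                   -- truncation resets the identity history
DBCC CHECKIDENT ('dbo.IdentityTest', RESEED, 10);
INSERT INTO dbo.IdentityTest (Val) VALUES ('d');   -- this row gets ID = 10 (the reseed value itself)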
-- so, before SWITCHOFFSET existed, …
SELECT
    SWITCHOFFSET(SYSDATETIMEOFFSET(), '-05:00') AS [EST the easy way],
    TODATETIMEOFFSET(DATEADD(HOUR, -5, SYSDATETIMEOFFSET()), '-05:00') AS [EST the hard way];
-- so, thinking of a DATETIMEOFFSET data type as a complex object
-- with many different parts: year, month, day, hour, time zone, etc.
-- it looks like SWITCHOFFSET changes two things: time zone and hour
This was an interesting video. I typically think entirely in UTC and let the calling application convert to time zones as needed, but if that’s not an option for you, knowing about
SWITCHOFFSET() is valuable.
SQL Server can be configured to provide certain behavior to client sessions, via the SET OPTIONS = command, or via the sys.sp_configure 'user options' system stored procedure. The SET OPTIONS = command only affects the current session, whereas the sys.sp_configure 'user options' system stored procedure configures the default values for all future user sessions.
Click through for the script, which even contains a quick description of each option.
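As a quick, minimal sketch of the two levels (using NOCOUNT, whose bit in the user options bitmask is 512):

-- Session level: affects only the current connection
SET NOCOUNT ON;

-- Server level: sets the default for all future user sessions (512 = NOCOUNT)
EXEC sys.sp_configure 'user options', 512;
RECONFIGURE;

-- The SET options in effect for the current session come back as a bitmask
SELECT @@OPTIONS AS CurrentUserOptions;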
Notice a difference? In the first query, I have SupplierID, but the second is SupplierId, with a lower case “d”. These keys are determined when you use sp_set_session_context, which takes a sysname value for the key. These are going to be case sensitive, as each one is a different identifier.
Click through for the full story. I’m not a fan of case-sensitivity in general and especially not in a system where, by default, most things are case-insensitive.
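Here’s a small sketch of the behavior described above; the two keys differ only in the case of the final letter, so they are stored and retrieved as separate entries:

EXEC sys.sp_set_session_context @key = N'SupplierID', @value = 42;
EXEC sys.sp_set_session_context @key = N'SupplierId', @value = 99;

-- Case-sensitive keys mean these are two different lookups
SELECT
    SESSION_CONTEXT(N'SupplierID') AS SupplierID,   -- 42
    SESSION_CONTEXT(N'SupplierId') AS SupplierId;   -- 99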
The next query, for the median calculation, was a window function query.
PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY c1)
OVER (PARTITION BY (SELECT 1)) AS MedianCont
To my surprise, the performance was even worse. I should mention that I was running this on SQL Server 2017 with CU7. Luckily, I also had SQL Server 2019 CTP 2.0 installed, and there, with no further optimization, the query ran in a little over 1 second.
I’ve warned people away from this function for all but tiny data sets because of how poorly it performs. With SQL Server 2019, I might be able to recommend it.
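For context, here’s roughly how that snippet fits into a complete query (the table and column names are placeholders); the window form returns the same median on every row, so DISTINCT collapses it to a single value:

SELECT DISTINCT
    PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY c1)
        OVER (PARTITION BY (SELECT 1)) AS MedianCont
FROM dbo.SomeTable;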
This actually happened to me in a previous job. We had a fairly complex ETL solution for the most critical part of our Data Warehouse. Many SSIS packages, views, and stored procedures queried the tables that were replicas of the source system tables. One day, we found out that the source system would be deploying a new version of their database the following day. In every single table, some columns were removed, others added, and many changed data types.
There was no way that we could manually update all our SSIS packages, views, and stored procedures in less than a day. Thousands of users depended on our solution. It was too late to pause the source system changes.
That story has a happy ending.
I have always used WHILE @@ROWCOUNT > 0, but you have to be careful: @@ROWCOUNT could already be 0 when your WHILE loop starts.
Let’s take a look at an example. This is a simplified example without a WHERE clause, but let’s say you have to delete several million rows from a table with many more millions of rows, and the table is replicated. In that case, you want to batch the deletes so that your log file doesn’t fill up, replication has a chance to catch up, and in general the deletes run faster.
Click through for a couple of issues you might run into other than the obvious one of “I’m scanning the entire table with every delete” if you don’t have indexing set up right.
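A minimal sketch of the batching pattern, with a hypothetical table, predicate, and batch size; seeding a counter variable (rather than testing @@ROWCOUNT directly) sidesteps the never-starts pitfall mentioned above:

DECLARE @rows int = 1;   -- seed the counter so the loop always starts

WHILE @rows > 0
BEGIN
    DELETE TOP (5000)
    FROM dbo.BigTable
    WHERE CreatedDate < '2015-01-01';

    SET @rows = @@ROWCOUNT;   -- rows deleted in this batch
END;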
You can probably see by this point how the pieces are coming together: each time frame has a starting point and an ending point. If there were no overlap at all, we’d see in the fourth column a number followed by a NULL, followed by a number followed by a NULL, etc. But we clearly don’t see that: we see that work item ordinals 3 and 4 share some overlap: item 3 started at 3:06:15 PM and ended after item 4’s start of 3:07:20 PM. This means that those two overlapped to some extent. Then we see two NULL values, which means they both ended before item 5 began. So far so good for our developers!
Click through for a bunch of T-SQL scripts, images, and important advice about always having interns around to take the blame.
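The post builds this up differently, but as a rough sketch of one way to flag overlap between consecutive work items (table and column names are hypothetical), LEAD can compare each item’s end time to the next item’s start time:

SELECT
    wi.WorkItemOrdinal,
    wi.StartTime,
    wi.EndTime,
    CASE
        WHEN LEAD(wi.StartTime) OVER (ORDER BY wi.StartTime) < wi.EndTime
            THEN LEAD(wi.WorkItemOrdinal) OVER (ORDER BY wi.StartTime)
    END AS OverlapsWith   -- NULL means this item ended before the next one began
FROM dbo.WorkItems AS wi
ORDER BY wi.StartTime;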
In order to show you the solution, I want to build up a reasonably sized sample. Any solution looks great when reading five records, but let’s kick that up a notch. Or, more specifically, a million notches: I’m going to use a CTE tally table and load 5 million rows.
I want some realistic looking data, so I’ve adapted Dallas Snider’s strategy to build a data set which approximates a normal distribution.
Because this is a little complicated, I wanted to take the time and explain the data load process in detail in its own post, and then apply it in the follow-up post. We’ll start with a relatively small number of records for this demonstration: 50,000. The reason is that you can generate 50K records almost instantaneously but once you start getting a couple orders of magnitude larger, things slow down some.
If you do custom data generation for lower environments, I’d recommend checking this out. Your production data probably doesn’t follow a normal distribution exactly, but a normal distribution is probably closer to reality than the uniform distribution you get with functions like RAND().
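This is not necessarily Dallas Snider’s exact technique, but as a sketch of the general idea, a tally CTE generates the rows and a Box-Muller transform turns uniform RAND() values into something approximately normal (the table name, row count, mean, and standard deviation are all placeholders):

DECLARE @mean float = 500.0, @stdev float = 100.0;

WITH
    L0 AS (SELECT 1 AS c FROM (VALUES (1), (1), (1), (1), (1), (1), (1), (1), (1), (1)) AS v(c)),
    L1 AS (SELECT 1 AS c FROM L0 AS a CROSS JOIN L0 AS b),   -- 100 rows
    L2 AS (SELECT 1 AS c FROM L1 AS a CROSS JOIN L1 AS b),   -- 10,000 rows
    Nums AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n FROM L2)
SELECT
    n,
    @mean + @stdev
        * SQRT(-2.0 * LOG(1.0 - RAND(CHECKSUM(NEWID()))))
        * COS(2.0 * PI() * RAND(CHECKSUM(NEWID()))) AS ApproxNormalValue
INTO dbo.SampleData
FROM Nums;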