This actually happened to me in a previous job. We had a fairly complex ETL solution for the most critical part of our Data Warehouse. Many SSIS packages, views, and stored procedures queried the tables that were replicas of the source system tables. One day, we found out that the source system would be deploying a new version of their database the following day. In every single table, some columns were removed, others added, and many changed data types.
There was no way that we could manually update all our SSIS packages, views, and stored procedures in less than a day. Thousands of users depended on our solution. It was too late to pause the source system changes.
That story ends up with a happy ending.
I have always used WHILE @@rowcount > 0 but you have to be careful because @@rowcount could be 0 when your while loop starts
Let’s take a look at an example. This is a simplified example without a where clause..but let’s say you have to delete several million rows from a table with many more millions of rows and the table is replicated… in that case you want to batch the deletes so that your log file doesn’t fill up, replication has a chance to catch up and in general the deletes should run faster
Click through for a couple of issues you might run into other than the obvious one of “I’m scanning the entire table with every delete” if you don’t have indexing set up right.
You can probably see by this point how the pieces are coming together: each time frame has a starting point and an ending point. If there were no overlap at all, we’d see in the fourth column a number followed by a NULL, followed by a number followed by a NULL, etc. But we clearly don’t see that: we see work item ordinals 3 and 4 share some overlap: item 3 started at 3:06:15 PM and ended after item 4’s start of 3:07:20 PM. This means that those two overlapped to some extent. Then we see two NULL values, which means they both ended before 5 began. So far so good for our developers!
Click through for a bunch of T-SQL scripts, images, and important advice about always having interns around to take the blame.
In order to show you the solution, I want to build up a reasonable sized sample. Any solution looks great when reading five records, but let’s kick that up a notch. Or, more specifically, a million notches: I’m going to use a CTE tally table and load 5 million rows.
I want some realistic looking data, so I’ve adapted Dallas Snider’s strategy to build a data set which approximates a normal distribution.
Because this is a little complicated, I wanted to take the time and explain the data load process in detail in its own post, and then apply it in the follow-up post. We’ll start with a relatively small number of records for this demonstration: 50,000. The reason is that you can generate 50K records almost instantaneously but once you start getting a couple orders of magnitude larger, things slow down some.
If you do custom data generation for lower environments, I’d recommend checking this out. Your production data probably doesn’t follow a normal distribution exactly, but a normal distribution is probably closer to reality than the uniform distribution you get with functions like RAND().
SQL Server will compile an execution plan specifically for the statement that the query hint is on. There’s some benefits, like something called “constant folding.” To us, that just means that the execution plan might be better than a normal execution plan compiled for the current statement.
It also means that the statement itself won’t be vulnerable to parameter sniffing from other queries in cache. In fact, the statement with option recompile won’t be stored in cache.
Click through for a couple of demos as well as a discussion of positives and negatives regarding its use.
There are times when I am working on a SQL script that really shouldn’t be executed all at once. Sometimes it’s a series of examples / demos for a presentation or forum answer. Other times it’s just a temporary need while I’m in the process of creating a complex script, but once the script is completed and tested then it should run all at once. In either case, I have accidentally hit F5 too many times when I thought that a certain section of code was highlighted (so only that section would execute) but in fact nothing was highlighted so the script started executing from the very top, and either ran until completion or until I was able to cancel it (if it ran long enough for me to have time to understand what was happening and hit the “cancel” button).
So I needed some way of ensuring that a script would not execute if no section was highlighted.
Read on to learn about
Now, looking at it a bit more closely you’ll see that this is a function call, not just a property. Now, in my research for this post I did find where I’d mentioned this function briefly in my somewhat comprehensive identity post. Technically I didn’t mention so much as it was mentioned to me in the comments so I added it to the list. I guess I either didn’t look at it closely enough at the time or it’s just one of those cases where I forgot. Either way, it’s worth highlighting now.
Click through to learn more.
There are times when we need to compare two tables and figure out if the data matches. I often see a number of ways of doing this suggested, most are quite slow and inefficient. I’d quite like to share a quick and slightly dirty way of doing this using the CHECKSUM and CHECKSUM_AGG functions.
Just a reminder that CHECKSUM() will generate a checksum for an entire row or selection of columns in the row.
Will generate a checksum for a dataset.
David then has a couple of examples showing these in action.
Since I was working on an example for my next performance class, I decided to use an example from that session. In the query below the qp.* part of the column list will return a column named query_plan which is an XML data type.
SELECT DISTINCT pa.,cp.,qp.*
FROM sys.dm_exec_cached_plans) cp
CROSS APPLY sys.dm_exec_query_plan (cp.plan_handle ) qp
CROSS APPLY sys.dm_exec_plan_attributes(cp.plan_handle) As pa
WHERE is_cache_key = 1 AND pa.attribute = ‘set_options’
However, when I add the DISTINCT keyword I get the error below.
Read on for a workaround for this.
As you can see, both T1 and T2 have a numeric column (INT type in this example) called val. The challenge is to match to each row from T1 the row from T2 where the absolute difference between T2.val and T1.val is the lowest. In case of ties (multiple matching rows in T2), match the top row based on val ascending, keycol ascending order. That is, the row with the lowest value in the val column, and if you still have ties, the row with the lowest keycol value. The tiebreaker is used to guarantee determinism.
Click through for the details as well as several solutions.