Press "Enter" to skip to content

Category: T-SQL

DATEADD With SYSUTCDATETIME

Aaron Bertrand blogs on an estimation failure with DATEADD and SYSDATETIME/SYSUTCDATETIME:

Essentially, the problem is that a poor estimate can be made not simply when SYSDATETIME() (or SYSUTCDATETIME()) appears, as Erland originally reported, but when any datetime2expression is involved in the predicate (and perhaps only when DATEADD() is also used). And it can go both ways – if we swap >= for <=, the estimate becomes the whole table, so it seems that the optimizer is looking at the SYSDATETIME() value as a constant, and completely ignoring any operations like DATEADD() that are performed against it.

Paul shared that the workaround is simply to use a datetime equivalent when calculating the date, before converting it to the proper data type. In this case, we can swap outSYSUTCDATETIME() and change it to GETUTCDATE()

I suppose switching to GETUTCDATE isn’t too much of a loss, but it looks like (according to Paul White in the second linked Connect item) this appears to have been fixed in SQL Server 2014.

Comments closed

Getting The Last Row Per Group

Daniel Hutmacher wants to get the last element in each group (for example, the current records in a type-two dimension):

The CROSS APPLY and the old-school solutions are by far the best choice for dense indexes, i.e. when the first column has a low degree of uniqueness. The old-school solution is only that fast because the optimizer short-circuits the query plan.

LEAD() and the old school strategy are best for selective indexes, i.e. when the first column is highly unique.

There’s a nice set of options available so if one doesn’t work well with your particular data set, try out some of the others and see if they work for you.

Comments closed

Keep Check Constraints Simple

Erik Darling shows performance implications around having scalar UDFs in check constraints:

Really. Every single time. It started off kind of funny. Scalar functions in queries: no parallelism. Scalar functions in computed columns: no parallelism, even if you’re not selecting the computed column. Every time I think of a place where someone could stick a scalar function into some SQL, it ends up killing parallelism. Now it’s just sad.

This is (hopefully. HOPEFULLY.) a less common scenario, since uh… I know most of you aren’t actually using any constraints. So there’s that! Developer laziness might be a saving grace here. But if you read the title, you know what’s coming. Here’s a quick example.

Yeah, UDFs in check constraints is a pretty bad idea most of the time.

Comments closed

Specify Schema

Kenneth Fisher warns that you should specify schemas in scripts:

But why? I mean the table gets created either way and since the default is dbothere is no real reason to name it.

Actually no. The default is not in fact dbo. It frequently is dbo but by no means always. The default schema is part of your USER information. Specifically theDEFAULT_SCHEMA option. Well, unless you are a sysadmin. Then it actually does always default to dbo.

Schemas are a very powerful grouping mechanism, and they’ve been around long enough that if you aren’t taking full advantage of them, you really should.

Comments closed

Views Aren’t Tables

Grant Fritchey looks at how the query optimizer treats views:

The important point to note is that the optimizer is absolutely not treating the view like a table. The optimizer is treating the view like a query, which is all it is. This has both positive and negative impacts when it comes to query performance tuning and this view. You could spend all sorts of time “tuning” the view, only to find all that tuning you’ve done tossed out the window when the query doesn’t reference a column in the view and that causes the optimizer to rearrange the plan. I don’t want to convey that this is an issue. It’s not. I’m just trying to emphasize the point that a view is just a query.

In a subsequent post, Grant promises to talk about the potential perils of nested views.  That’s where people start running into trouble, when a nested view gets to be so complex that the query optimizer gives up and takes it literally.

Comments closed

Conditional Ordering

Kenneth Fisher shows various methods for conditional ordering:

Simple sorts are pretty easy. In the ORDER BY clause you can list out the columns by name or by position in the field list and determine if you want them sorted ascending or descending. But how about going beyond that?

Every now and again you need a fairly complex sort order. Say for example you want to order your data differently depending on the Status (say DueDate if theStatus is 1, 2 or 3 and ShipDate otherwise). But you also want anyone with aTotalDue greater than $5000 sorted separately from those with a TotalDue less than $5000.

Conditional sorting isn’t something you should be doing in every query, but it’s important enough to know about.

Comments closed

Circular References

Daniel Hutmacher looks at circular references in recursive CTEs:

Suppose you have a tree hierarchy where (among other members) “3” is the parent of “8”, “8” is the parent of “B” and “B” is the parent of “E”. You could easily draw this as a branch structure where the members could be profit centers of a company, divisions of government, managers and employees, product lines, cell references in an Excel sheet or pretty much anything that can be described as a hierarchy.

3
--8
  --B
    --E

Now, if we say that “E” is the parent of “3”, we’ve created a circular reference, and we end up with an infinite recursion. What that means is that if you follow the tree from the root to the leaf level, you’ll end up going round and round in circles. In terms of a database query, that means that the query will go on forever until it either fills up your log file or tempdb, or until the maximum number of recursions (OPTION MAXRECURSION) is reached, whichever happens first.

I did something similar a while back with foreign key dependencies.  Because foreign key relationships are potentially cyclical graphs, they can easily fall into the same scenario.

Comments closed

Using The OUTPUT Clause

Steve Jones shows how to use the OUTPUT clause with an INSERT statement:

I got asked a question about the OUTPUT clause recently and realized I didn’t remember the syntax. I’ve rarely used this, so I had to look it up and thought this would be a good basic post.

The idea with OUTPUT is that the data from the inserted and deleted tables can be output from the INSERT statement, outside of your triggers. This is the same data, but you can access it in the insert.

The format is

INSERT xxx OUTPUT yyyy INTO @zzz VALUES (or SELECT) mmmm

If I had one thing I could change about OUTPUT, I’d like to be able to output directly into variables for those cases in which I know I’m only going to get one result (or maybe I only care about one arbitrary result in a set).

Comments closed

String_Split Performance

Aaron Bertrand looks into how STRING_SPLIT performs compared to other string splitting methods:

So with those limitations out in the open, we can move on to some performance testing. Given Microsoft’s track record with built-in functions that leverage CLR under the covers (coughFORMAT() cough), I was skeptical about whether this new function could come close to the fastest methods I’d tested to date.

Let’s use string splitters to separate comma-separated strings of numbers, this way our new friend JSON can come along and play too. And we’ll say that no list can exceed 8,000 characters, so no MAX types are required, and since they’re numbers, we don’t have to deal with anything exotic like Unicode.

The results are surprising.  I expected it to be somewhere around CLR-level, but not way better.

Comments closed