Press "Enter" to skip to content

Category: T-SQL

Modulo

Kenneth Fisher walks us through modulus division in SQL Server:

Now aside from the odd occasion when you actually need it for it’s simple purpose it’s a rather interesting way to get a rolling count. Basically you can use it to get a list back of 1,2,3,…n-1,0 where n is your divisor.

There are a few great use cases for modulo within SQL Server.  One not mentioned is building test data.  You can easily build a uniformly distributed set of randomized numeric values within a particular range using modulo math.

Comments closed

DATEADD With SYSUTCDATETIME

Aaron Bertrand blogs on an estimation failure with DATEADD and SYSDATETIME/SYSUTCDATETIME:

Essentially, the problem is that a poor estimate can be made not simply when SYSDATETIME() (or SYSUTCDATETIME()) appears, as Erland originally reported, but when any datetime2expression is involved in the predicate (and perhaps only when DATEADD() is also used). And it can go both ways – if we swap >= for <=, the estimate becomes the whole table, so it seems that the optimizer is looking at the SYSDATETIME() value as a constant, and completely ignoring any operations like DATEADD() that are performed against it.

Paul shared that the workaround is simply to use a datetime equivalent when calculating the date, before converting it to the proper data type. In this case, we can swap outSYSUTCDATETIME() and change it to GETUTCDATE()

I suppose switching to GETUTCDATE isn’t too much of a loss, but it looks like (according to Paul White in the second linked Connect item) this appears to have been fixed in SQL Server 2014.

Comments closed

Getting The Last Row Per Group

Daniel Hutmacher wants to get the last element in each group (for example, the current records in a type-two dimension):

The CROSS APPLY and the old-school solutions are by far the best choice for dense indexes, i.e. when the first column has a low degree of uniqueness. The old-school solution is only that fast because the optimizer short-circuits the query plan.

LEAD() and the old school strategy are best for selective indexes, i.e. when the first column is highly unique.

There’s a nice set of options available so if one doesn’t work well with your particular data set, try out some of the others and see if they work for you.

Comments closed

Keep Check Constraints Simple

Erik Darling shows performance implications around having scalar UDFs in check constraints:

Really. Every single time. It started off kind of funny. Scalar functions in queries: no parallelism. Scalar functions in computed columns: no parallelism, even if you’re not selecting the computed column. Every time I think of a place where someone could stick a scalar function into some SQL, it ends up killing parallelism. Now it’s just sad.

This is (hopefully. HOPEFULLY.) a less common scenario, since uh… I know most of you aren’t actually using any constraints. So there’s that! Developer laziness might be a saving grace here. But if you read the title, you know what’s coming. Here’s a quick example.

Yeah, UDFs in check constraints is a pretty bad idea most of the time.

Comments closed

Specify Schema

Kenneth Fisher warns that you should specify schemas in scripts:

But why? I mean the table gets created either way and since the default is dbothere is no real reason to name it.

Actually no. The default is not in fact dbo. It frequently is dbo but by no means always. The default schema is part of your USER information. Specifically theDEFAULT_SCHEMA option. Well, unless you are a sysadmin. Then it actually does always default to dbo.

Schemas are a very powerful grouping mechanism, and they’ve been around long enough that if you aren’t taking full advantage of them, you really should.

Comments closed

Views Aren’t Tables

Grant Fritchey looks at how the query optimizer treats views:

The important point to note is that the optimizer is absolutely not treating the view like a table. The optimizer is treating the view like a query, which is all it is. This has both positive and negative impacts when it comes to query performance tuning and this view. You could spend all sorts of time “tuning” the view, only to find all that tuning you’ve done tossed out the window when the query doesn’t reference a column in the view and that causes the optimizer to rearrange the plan. I don’t want to convey that this is an issue. It’s not. I’m just trying to emphasize the point that a view is just a query.

In a subsequent post, Grant promises to talk about the potential perils of nested views.  That’s where people start running into trouble, when a nested view gets to be so complex that the query optimizer gives up and takes it literally.

Comments closed

Conditional Ordering

Kenneth Fisher shows various methods for conditional ordering:

Simple sorts are pretty easy. In the ORDER BY clause you can list out the columns by name or by position in the field list and determine if you want them sorted ascending or descending. But how about going beyond that?

Every now and again you need a fairly complex sort order. Say for example you want to order your data differently depending on the Status (say DueDate if theStatus is 1, 2 or 3 and ShipDate otherwise). But you also want anyone with aTotalDue greater than $5000 sorted separately from those with a TotalDue less than $5000.

Conditional sorting isn’t something you should be doing in every query, but it’s important enough to know about.

Comments closed

Circular References

Daniel Hutmacher looks at circular references in recursive CTEs:

Suppose you have a tree hierarchy where (among other members) “3” is the parent of “8”, “8” is the parent of “B” and “B” is the parent of “E”. You could easily draw this as a branch structure where the members could be profit centers of a company, divisions of government, managers and employees, product lines, cell references in an Excel sheet or pretty much anything that can be described as a hierarchy.

3
--8
  --B
    --E

Now, if we say that “E” is the parent of “3”, we’ve created a circular reference, and we end up with an infinite recursion. What that means is that if you follow the tree from the root to the leaf level, you’ll end up going round and round in circles. In terms of a database query, that means that the query will go on forever until it either fills up your log file or tempdb, or until the maximum number of recursions (OPTION MAXRECURSION) is reached, whichever happens first.

I did something similar a while back with foreign key dependencies.  Because foreign key relationships are potentially cyclical graphs, they can easily fall into the same scenario.

Comments closed