Aggregates Using OVER

Kevin Feasel

2016-02-02

T-SQL

Slava Murygin shows aggregation and windowing using SUM:

As a conclusion: You CAN use “OVER” clause to do the aggregation in three following cases:
1. When data set is extremely small and fits in just one 8 Kb page;
2. When you want to hide your logic from any future developer or even yourself to make debugging and troubleshooting a nightmare;
3. When you really want to kill your SQL Server and its underlying disk system;

That conclusion’s rather pessimistic for my tastes, mostly because Slava’s trying to do the same thing with a window function that he’s doing with a GROUP BY clause and has multiple functions across different windows (including calculations).  Using SUM() OVER() is powerful when you still need the disaggregated values—for example, running totals.

Trusting Constraints

Dennes Torres talks about whether a constraint is trustworthy:

If the check constraint is trustable, it can be used by the query optimizer. For example, if the check constraint avoid values below 100 in a field and a query for 50 arrives, the query optimizer uses the check constraint to stop the query.

The query optimizer can only use the check constraint if it’s trustable, otherwise it could exist in the table records with values below 100, according to our example, and the query would loose these records.

Dennes then goes on to show how you can have non-trustworthy constraints and how to fix the issue.

Defaults Not Guaranteed Equal

Michael J. Swart shows that two DATETIME2 columns with default constraints will not necessarily show the same value upon insertion:

If I want to depend on these values being exactly the same, I can’t count on the default values.

Default constraints will fill in the correct value, but as Michael notes, “the correct value” is calculated each time.  Also, note that his results are about a millisecond off, so if you’re just using DATETIME, the frequency of observation of this occurrence will be lower, as DATETIME is only good to 3 milliseconds.  That’s not a good reason to use DATETIME, though.

Named Constraints

Louis Davidson on naming constraints:

It has long been a habit that I name my constraints, and even if it wasn’t useful for database comparisons, it just helps me to see the database structure all that much eaiser. The fact that I as I get more experience writing SQL and about SQL, I have grown to habitually format my code a certain way makes it all the more interesting to me that I had never come across this scenario to not name constraints.

Temp tables are special.  There’s another reason to have non-named constraints on temp tables inside stored procedures:  it allows for temp table reuse, as shown on slide 21 in this Eddie Wuerch slide deck from SQL Saturday 75 (incidentally, the first SQL Saturday I ever attended).

De-Duplicating Delimited Lists

Kevin Feasel

2016-01-25

T-SQL

Phil Factor looks at de-duplicating lists:

So there you have it. With XML tricks and window functions, we have more opportunity for kicking out any need for functions. To use this code, you’d just swap out the select statement that supplied my samples to the routine, for the lists that you want to deduplicate. Sure, this sort of job will never be quick because there are still correlated subqueries in there to upset the CPU! I am intrigued that there are such different ways of doing a solution for this task in SQL server. Are there yet other ways of doing it?

Cf. Aaron Bertrand’s tally table method.  Bonus points if you’re mentally screaming “CLR!”

VLFs And Temp Tables

Andy Galbraith was trying to update a script which counts VLFs and ran into problems defining his temp table:

Michelle’s code uses INSERT…EXEC to populate a temporary table with the VLF info, and the addition of this extra column breaks the original script.  Glenn’s versions of the scripts handle this issue easily since they are version-specific – in the SQL 2012/2014/2016 versions of the script, the temp table declaration is modified to include the extra RecoveryUnitID column, which allows the rest of the script to function as designed.

My problem is I wanted a version of the script that could be used across versions 2005+, and this presented a problem.  At first I tried to add an IF…ELSE block to the start of the script to handle the differing CREATE TABLE statements:

This is a good example of working around a problem rather than simply giving up.

Naming Standards

Kevin Feasel

2016-01-20

T-SQL

Aaron Bertrand talks about naming standards:

I’ve long been a proponent of not caring about which naming standards you use, but I do find it very important that your standards follow these three basic rules:

  1. The conventions make sense. You should be able to argue why the chosen convention is better than an alternative, and it can’t just be because you like it better. This doesn’t mean you have to win that argument, just that you should be arguing for something tangible.

  2. The entire team is on board. You should all agree on a standard before implementation, and any changes you make over time should be by committee.

  3. You don’t make exceptions. Even if you’re a one-person team, if you’re going to bother having a standard, it needs to be used consistently. It’s amazing how quickly exceptions can become the new rules.

If you want to talk subjectivity, I disagree with the idea that tables should be plural, as I tend to think terms of an entity (e.g., Person) which contains attributes, rather than the collection of entities which contain a specific combination of attributes.  Regardless, “set a standard and stick to it” is important advice.

Reducing Ad Hoc Query Risk

Kenneth Fisher has some tips to reduce the risk of running ad hoc queries:

  • Make sure that this is the ONLY code in your window or that you are protected by a RETURN or SET EXECUTION OFF at the top of your screen. I have this put in place by default on new query windows. This protects you from running too much code by accident.

  • Make a habit of checking what instance you are connected to before running any ad-hoc code. Running code meant for a model or test environment in production can be a very scary thing.

This is good advice.

COUNT(*) Versus COUNT(1)

Manoj Pandey asks what the difference between COUNT(*) and COUNT(1) is:

So, we can clearly and without any doubt say that both COUNT(*) & COUNT(1) are same and equivalent.

Both of these are different from COUNT(SomeColumnName), though.

Shredding XML

Tim Peters introduces us to shredding multi-level XML:

The below XML has data nested in different levels that requires the nodes method to join them together. The nodes method accepts a XML string and returns a rowset. That rowset can then be used with CROSS APPLY to effectively link your way down.

nodes (XQuery) as Table(Column)

The tabular format I need requires data from 3 different levels of this XML gob and I need to wade through 5 “tables” to get there.

Shredding XML is something you occasionally need to do.

Categories

June 2017
MTWTFSS
« May  
 1234
567891011
12131415161718
19202122232425
2627282930