Category: T-SQL

Time Slicing T-SQL

Published 2017-07-31 by Kevin Feasel

Bill Fellows has some fancy window function footwork to split data into time slices:

That looks like a lot, but it really isn’t. Starting from the first inner most query, we select the top 24 rows from sys.all_objects and use the ROW_NUMBER function to generate us a monotonically increasing set of values, thus 1…24. However, since the allowable range of hours is 0 to 23, I deduct one from this value (A). I repeat this pattern to generate minutes (B) except we get the top 60. Since I want 15 second intervals, for the seconds query, I only get the top 4 values. I deduct one so we have {0,1,2,3} and then multiply by 15 to get my increments (C). If you want different time slices, that’s how I would modify this pattern.

Only works on 2012 or later, but it’s a fancy way of slicing into 15-second (or however you define it) chunks.

Comments closed

Prioritizing Rows In A Union

Published 2017-07-26 by Kevin Feasel

Daniel Hutmacher shows several ways to combine data from multiple sources, prioritized by source:

For the sake of simplicity, I’ll assume that our example data has a clustered index on the “primary key” that we want to use to determine which rows have already been included in the set – in our example, the primary key is (FirstName, LastName). The data I’m using is a fictional example, but here’s the jist:

#Employees has about 33 000 rows.

#Customers has about 44 000 rows.

#Passengers has about 500 000 rows.

The data is constructed in a way that these queries should return 530 000 rows, so we’re looking at some overlap but far from totally overlapping rows.

Example #492,806 that T-SQL is not a true 4th Generation Language, that how you write the query can greatly matter for performance.

Comments closed

Reading Drive Files Recursively From T-SQL

Published 2017-07-25 by Kevin Feasel

Slava Murygin shows how to use a common table expression to read a recursive drive listing:

In this blog I’ll use undocumented in SQL Server extended stored procedure “xp_dirtree“.
That procedure is well described in Patrick Keisler’s blog.

Briefly, “xp_dirtree” extended procedure returns tree of sub-directories from given folder and has three parameters:
1. Starting or root folder
2. Depth level – determines how deep SQL Server will dig into a sub-directory structure. If it set to zero or omitted whole tree will be returned.
3. If omitted or set to zero will return only folders .If set to “1” will return files as well.

If you want to be really mean to somebody, create a shortcut to the parent directory within a directory.

Comments closed

Batch Execution Mode And Window Functions

Published 2017-07-24 by Kevin Feasel

Chris Adkin shows how taking advantage of batch execution mode on rowstore tables can lead to faster performance as degree of parallelism increases:

The SQL Server execution engine fundamentally acts like a cursor, control flow is exerted from the root node of the plan down to right most child node or iterators. The (logical) flow of data through the plan is in the opposite direction.

I say ‘Logical’ because in practise the run time uses buffers in order to minimise the data movement whilst executing the plan. However, up until SQL Server 2012, which first introduced batch mode, execution plans were executed by iterators processing data in a row by agonising row manner. Batch mode changes all of this, if we can ferry rows around in batches, this reduces the number of CPU cycles it takes to process an iterator in the plan (providing it supports batch mode). Also by sizing batches such that they fit inside the level 2 cache of the CPU, we gain even more performance by minimizing CPU last level cache misses or worse still main memory.

Adkin credits Niko Neugebauer for the insight and shows how you can use this on normal rowstore tables.

Comments closed

Dealing With Newlines In Reports

Published 2017-07-20 by Kevin Feasel

Shane O’Neill shows how to deal with newlines in data:

I have mentioned before that we can use CHAR(10) and CHAR(13) for new lines and carriage returns in SQL Server so I’ll leave it up to an exercise to the reader to create a table with these “troublesome” bits of information in them (plus if you came here from Google, I assume you already have a table with them in it).

For me, I’ve just created a single table dbo.NewLineNotes that has a single entry with a new line in it.

Read on for more.

Comments closed

Running Totals With Window Functions

Published 2017-07-19 by Kevin Feasel

Bert Wagner shows the best method to calculate a running total in SQL Server 2012 or later:

Before SQL Server 2012, the solution to generating a running total involved cursors, CTEs, nested subqueries, or cross applies. This StackOverflow thread has a variety of solutions if you need to solve this problem in an older version of SQL Server.

However, SQL Server 2012’s introduction of window functions makes creating a running total incredibly easy.

Enhanced window functions was one of 2012’s killer features on the T-SQL developer side. Bert’s post doesn’t cover window ranges and sizes, as the defaults work for him, but Steve Stedman has a good post on the topic if you want more details.

Comments closed

Pointers Can Be Sharp

Published 2017-07-11 by Kevin Feasel

Rob Farley describes a bad day he had:

There was a guy who needed to get his timesheets in. It wasn’t me – I just thought I could help …by making a copy of his timesheets in a separate table, so that he could prepare them there instead of having to use the clunky Access form. I’d gone into the shared Access file that people were using, made a copy of it, and then proceeded to clear out all the data that wasn’t about him, so that he could get his data ready. I figured once he was done, I’d just drop his data in amongst everyone else’s – and that would be okay.

Except that right after I’d cleared out everyone else’s data, everyone else started to complain that their data wasn’t there.

Heart-rate increased. I checked that I was using the copy, not the original… I closed it, opened the original, and saw that sure enough, only his data was there. Everyone else’s (including my own) data was gone.

As far as “oops” moments go, yeah, this is definitely on the list. Read on for Rob’s explanation of what happened.

Comments closed

Dynamic Unpivoting For Change Detection

Published 2017-07-07 by Kevin Feasel

Shane O’Neill has a script that dynamically unpivots a pair of rows and compares values column by column, storing the changes in XML:

Overall, the script is longer at nearly double the lines but where it shines is when adding new columns.
To include new columns, just add them to the table; to exclude them, just add in a filter clause.

So, potentially, if every column in this table is to be tracked and we add columns all the way up to 1,024 columns, this code will not increase.
Old way: at least 6,144.
New way: at least 2,048.
Dynamic: no change

Read on for that script. Even though his developer ended up not using his solution, Shane has made it available for the rest of the world so that some day, someone else can have the maintenance nightmare of trying to root out a bug in the process.

Comments closed

Mann-Whitney U Test in SQL

Published 2017-07-03 by Kevin Feasel

Phil Factor continues his Statistics in SQL series with the Mann-Whitney U test:

There are several ways that you can test this, but nobody is going to argue with you if you use a Mann–Whitney U test to test whether two samples come from the same distribution. It doesn’t require that the data has any particular type of distribution. It just requires that each observation is done by a different member of the population so that all the observations from both groups are independent of each other. It is really just a test of differences in mean-rank between two populations’ pooled ranking. To test this difference It has to be possible to compare any of the observations with any of the others and say which of the two are greater. Your objective is to disprove the assumption that The distributions of both populations are equal. Calculating a measure of the difference is simple, and was designed to be done easily by hand before computers. The probability that the observed difference occurred by chance is easily calculated for large samples because U then approximates to the normal distribution, but it is complex for small samples. Here, we have a small sample and are just interested in whether the two-tailed test is signifcant at the five percent level so we dodge the bullet by using a significance lookup table for the critical value of U.

Read on for Phil’s implementation of the test.

Comments closed

Batched Data Archival

Published 2017-07-03 by Kevin Feasel

Anders Pedersen has a script to archive data in smaller batches:

Seeing as the data had to be retrievable for any date, I could not simply delete the very old data. These tables also had constant inserts and updates into them, so making sure the tables remained available became important, i.e. needed to have acceptable time that the table was being locked, with time for waiting transactions to finish.

The solution I came up with does this with variable size batches. Now, with modern versions of SQL, there are other ways to do this, but the good thing about this method it works regardless of version of SQL, as well as edition. Azure SQL DB would need some modification to make it work to archive to a separate database.

Click through for the script.

Comments closed