Subset And Apply Problems

Tom Martens explains a class of generic data processing problems:

Subset and Apply means that I have a dataset of some rows where due to some conditions all the rows have to be put into a bucket and then a function has to be applied to each bucket.

The simple problem can be solved by a GROUP BY using T-SQL, the not so simple problem requires that all columns and rows of the dataset have to be retained for further processing, even if these columns are not used to subset or bucket the rows in your dataset.

One quick example of this is running totals of orders for each customer, which Tom answers using T-SQL, R, and Power BI.  Click through for those three solutions.

Event Handling Items

Dave Mason looks at two Connect items related to working with extended events:

One Event Behind

I another post, I wrote that the XEvents event_stream target is regularly “one event behind”. There is an existing Connect item seeking a fix to this problem: QueryableXEventData and “Watch Live Data” one event behind. If you use the “Watch Live Data” grid for XEvents in SSMS, this is an important issue and worthy of your upvote. It’s also important if you ever want to access XEvent data programmatically with C# or PowerShell because the QueryableXEventData class uses the event_stream target and is also subject to the issue.

Read on for more details.

Network Latency And Availability Groups

Anthony Nocentino discusses network latency and how that can affect Availability Groups:

Now let’s talk about Availability Group replication and network latency. Availability Groups replicate data over your network using Database Mirroring Endpoints which are TCP sockets used to move data between the primary and it’s replicas. When designing Availability Groups, we often think about things in terms of bandwidth…how much data do I need to move between my replicas. But there’s another design factor your need to consider, network latency. Why? Hint, it’s not going to have anything to do with synchronous availability mode and HADR_SYNC_COMMIT waits. Let’s talk about some fundamentals of TCP for a second.

Click through for some discussion of TCP fundamentals.

Last Log Backup Time

Kevin Hill comments on a recent Connect item by Ola Hallengren:

And his proposed solution:

Add a new column to sys.dm_db_log_space_usage or sys.database_recovery_status called LastLogBackupTime.

I LOVE this idea…back up the T-log more frequently during busy times, less often during off hours.  At my current client, there is almost nothing happening outside of a 12 hour workday window, so this would be perfect here.

Now, I am possibly misunderstanding Ola’s request or the intent…and that’s ok.  This query from the msdb..backupset table already contains this info via a relatively short amount of code:

Click through for more details as well as Ola’s Connect item.

Wanted: Named Parameters For Functions

Riley Major would like to be able to specify parameter names in function calls:

Now try this:

DECLARE	@OrderID int = NULL,	@OrderType int = 1,	@Qty int = 2,	@ServiceSpeed int = 3;
SET @OrderID = dbo.GetOrderID (@OrderType, @Qty, @ServiceSpeed);
SELECT @OrderID 'Using SET Syntax';

Now you get a NULL back from the final SELECT. What happened? If you are a careful code reviewer, you might have spotted that the function definition has the @Qty and @ServiceSpeed parameters flipped as compared to the table definition and how we’re calling the function.

But this isn’t an error. There’s no obvious indication that anything is wrong. Imagine if instead of NULL, which would probably break something, you got a different order ID back. Your program would silently continue, oblivious to what is essentially data corruption.

And if you build a function with a large number of parameters, it gets that much easier accidentally to swap just two of them.  Click through for the rest of the story, and check out Riley’s Connect item.

Wanted: Unsigned Ints

Ewald Cress would like to have unsigned integer types:

Let’s take a look at what is being asked here. Using the 32-bit integer as an example, we currently have a data type that can accept a range between negative two billion and two billion. But if negative numbers aren’t required, we can use those same 32 bits to store numbers between zero and four billion. Why, goes the question, throw away that perfectly useful upper range by always reserving space for a negative range we may not need?

I appreciate Ewald’s thoughtfulness here in working out the value of the request as well as some of the difficulties in building something which fulfills his desire.  Great read.

Indexed View Bugs

Shane O’Neill looks at a few Connect items relating to indexed views:

First contender: Inserting to an indexed view can fail

What would happen if I told you that, with regards to a view, sometimes inserting into the table could fail? Well that’s what this Connect item from Dave_Ballantyne found, along with the reason.

Click through for more bugs.

Wanted: Automatic Columns

Louis Davidson plucks an old Connect item out for a new look:

The concept is very similar to a DEFAULT constraint, with two differences:

1. Will work on an UPDATE operation, without specifying DEFAULT

2. Could be configured to disallow the user from entering a value. My proposed syntax was pretty simple:

AUTO [WITH OVERRIDE] (scalar expression)

Now I realize that 10 years ago, I didn’t take terribly long to consider that WITH was a terrible thing to add to the syntax, and AUTO is a keyword already, so I am going to rename it: AUTO_DEFAULT (scalar expression, [option]). Since I have thought a bit more about this in the years since writing it, I realized there were a few more options that would be nice. I was terrible in college doing syntax parsing, but the syntax itself is not important. Temporal in SQL Server 2016 has syntax that is similar to this for the new temporal columns which I got really excited about the first time I saw it: SysStartTime datetime2 GENERATED ALWAYS AS ROW START NOT NULL. Maybe in vNext?

Read the whole thing.  Then check out the related Connect item Adam Machanic submitted.  I’d love to see that functionality, given how frequently I create these metadata columns.

Most-Voted Connect Items

Adam Machanic peruses the top-rated Connect items:

Magic numbers! That sounds cool. Except it’s not. It’s a horrible drain on readability and a pox on the manageability of your code. But we need them, because every database out there has “lookup tables” with specific values that we need to predicate on throughout the code base. SQL Server could solve this problem once and for all by allowing users to promote these values to named enumerators. How cool would that be?!? 220 votes, nine years.

And finally, one more from Mr. Ben-Gan, a clever idea for improving TOP with the OVER clause. I love the OVER clause. I love TOP. Their offspring would be beautiful indeed… 180 voters over the past nine years have shared in this sentiment.

This is an interesting list.

Categories

January 2017
MTWTFSS
« Dec Feb »
 1
2345678
9101112131415
16171819202122
23242526272829
3031