Subset and Apply means that I have a dataset whose rows must, based on some conditions, each be put into a bucket, and then a function has to be applied to each bucket.
The simple problem can be solved by a GROUP BY in T-SQL. The not-so-simple problem requires that all columns and rows of the dataset be retained for further processing, even if those columns are not used to subset or bucket the rows in your dataset.
One quick example of this is running totals of orders for each customer, which Tom answers using T-SQL, R, and Power BI. Click through for those three solutions.
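As a rough illustration of the subset-and-apply idea (this is not one of Tom's three solutions), here is a minimal Python sketch of per-customer running totals that keeps every row and column. The column names and sample rows are made up for the example.

```python
from collections import defaultdict


def add_running_totals(orders):
    """Return a copy of every order row with a per-customer running total added.

    Rows are bucketed by the hypothetical "customer" column and processed in
    "order_date" order; all other columns are carried through untouched.
    """
    totals = defaultdict(int)  # one accumulator per customer bucket
    result = []
    for row in sorted(orders, key=lambda r: (r["customer"], r["order_date"])):
        totals[row["customer"]] += row["amount"]
        result.append({**row, "running_total": totals[row["customer"]]})
    return result


# Hypothetical sample data; "channel" is never used for bucketing but is retained.
orders = [
    {"customer": "A", "order_date": "2017-01-01", "amount": 10, "channel": "web"},
    {"customer": "B", "order_date": "2017-01-02", "amount": 5, "channel": "phone"},
    {"customer": "A", "order_date": "2017-01-03", "amount": 20, "channel": "web"},
    {"customer": "B", "order_date": "2017-01-05", "amount": 7, "channel": "web"},
]

for row in add_running_totals(orders):
    print(row)
```

Unlike a plain GROUP BY, which would collapse each customer to a single row, this keeps one output row per input row.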
One Event Behind
In another post, I wrote that the XEvents event_stream target is regularly “one event behind”. There is an existing Connect item seeking a fix to this problem: QueryableXEventData and “Watch Live Data” one event behind. If you use the “Watch Live Data” grid for XEvents in SSMS, this is an important issue and worthy of your upvote. It’s also important if you ever want to access XEvent data programmatically with C# or PowerShell because the QueryableXEventData class uses the event_stream target and is also subject to the issue.
Read on for more details.
Now let’s talk about Availability Group replication and network latency. Availability Groups replicate data over your network using Database Mirroring Endpoints, which are TCP sockets used to move data between the primary and its replicas. When designing Availability Groups, we often think about things in terms of bandwidth…how much data do I need to move between my replicas. But there’s another design factor you need to consider: network latency. Why? Hint, it’s not going to have anything to do with synchronous availability mode and HADR_SYNC_COMMIT waits. Let’s talk about some fundamentals of TCP for a second.
Click through for some discussion of TCP fundamentals.
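To see why latency matters independently of bandwidth, here is a small Python sketch of a general TCP rule of thumb: a single connection can have at most one window of unacknowledged data in flight per round trip, so its throughput is capped at roughly window size divided by round-trip time. The 64 KiB window and the latencies below are illustrative assumptions, not figures from the linked post.

```python
def max_tcp_throughput_bps(window_bytes, rtt_seconds):
    """Upper bound on single-connection throughput in bits per second.

    Assumes at most one full window of data is in flight per round trip.
    """
    return window_bytes * 8 / rtt_seconds


WINDOW_BYTES = 64 * 1024  # assumed receive window, for illustration only

for rtt_ms in (1, 10, 50):  # assumed round-trip times between replicas
    mbps = max_tcp_throughput_bps(WINDOW_BYTES, rtt_ms / 1000) / 1_000_000
    print(f"RTT {rtt_ms:>2} ms -> at most {mbps:.1f} Mbit/s per connection")
```

Under these assumptions, the ceiling falls in proportion to latency no matter how much raw bandwidth the link offers.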
And his proposed solution: Add a new column to sys.dm_db_log_space_usage or sys.database_recovery_status called LastLogBackupTime.
I LOVE this idea…back up the T-log more frequently during busy times, less often during off hours. At my current client, there is almost nothing happening outside of a 12 hour workday window, so this would be perfect here.
Now, I am possibly misunderstanding Ola’s request or the intent…and that’s ok. This query from the msdb..backupset table already contains this info via a relatively short amount of code:
Click through for more details as well as Ola’s Connect item.
Now try this:
DECLARE @OrderID int = NULL, @OrderType int = 1, @Qty int = 2, @ServiceSpeed int = 3;
SET @OrderID = dbo.GetOrderID (@OrderType, @Qty, @ServiceSpeed);
SELECT @OrderID 'Using SET Syntax';
Now you get a NULL back from the final SELECT. What happened? If you are a careful code reviewer, you might have spotted that the function definition has the @Qty and @ServiceSpeed parameters flipped as compared to the table definition and how we’re calling the function.
But this isn’t an error. There’s no obvious indication that anything is wrong. Imagine if instead of NULL, which would probably break something, you got a different order ID back. Your program would silently continue, oblivious to what is essentially data corruption.
And if you build a function with a large number of parameters, it gets that much easier to accidentally swap two of them. Click through for the rest of the story, and check out Riley’s Connect item.
Let’s take a look at what is being asked here. Using the 32-bit integer as an example, we currently have a data type that can accept a range between negative two billion and two billion. But if negative numbers aren’t required, we can use those same 32 bits to store numbers between zero and four billion. Why, goes the question, throw away that perfectly useful upper range by always reserving space for a negative range we may not need?
I appreciate Ewald’s thoughtfulness here in working out the value of the request as well as some of the difficulties in building something which fulfills his desire. Great read.
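The arithmetic behind the request is easy to check. This short Python sketch computes the ranges for a signed and an unsigned 32-bit integer; the variable names are just for illustration.

```python
BITS = 32

# Signed (two's complement): one bit is effectively spent on the sign.
signed_min = -(2 ** (BITS - 1))
signed_max = 2 ** (BITS - 1) - 1

# Unsigned: all 32 bits contribute to magnitude.
unsigned_min = 0
unsigned_max = 2 ** BITS - 1

print(f"signed:   {signed_min:,} to {signed_max:,}")
print(f"unsigned: {unsigned_min:,} to {unsigned_max:,}")
```

Both types hold the same count of distinct values; the unsigned version simply shifts the whole range to start at zero.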
First contender: Inserting to an indexed view can fail
What would happen if I told you that, with regard to a view, sometimes inserting into the table could fail? Well, that’s what this Connect item from Dave_Ballantyne found, along with the reason.
Click through for more bugs.
The concept is very similar to a DEFAULT constraint, with two differences:
1. Will work on an UPDATE operation, without specifying DEFAULT
2. Could be configured to disallow the user from entering a value.
My proposed syntax was pretty simple:
AUTO [WITH OVERRIDE] (scalar expression)
Now I realize that 10 years ago, I didn’t take terribly long to consider that WITH was a terrible thing to add to the syntax, and AUTO is a keyword already, so I am going to rename it: AUTO_DEFAULT (scalar expression, [option]). Since I have thought a bit more about this in the years since writing it, I realized there were a few more options that would be nice. I was terrible in college doing syntax parsing, but the syntax itself is not important. Temporal in SQL Server 2016 has syntax that is similar to this for the new temporal columns which I got really excited about the first time I saw it: SysStartTime datetime2 GENERATED ALWAYS AS ROW START NOT NULL. Maybe in vNext?
Read the whole thing. Then check out the related Connect item Adam Machanic submitted. I’d love to see that functionality, given how frequently I create these metadata columns.
Magic numbers! That sounds cool. Except it’s not. It’s a horrible drain on readability and a pox on the manageability of your code. But we need them, because every database out there has “lookup tables” with specific values that we need to predicate on throughout the code base. SQL Server could solve this problem once and for all by allowing users to promote these values to named enumerators. How cool would that be?!? 220 votes, nine years.
And finally, one more from Mr. Ben-Gan, a clever idea for improving TOP with the OVER clause. I love the OVER clause. I love TOP. Their offspring would be beautiful indeed… 180 voters over the past nine years have shared in this sentiment.
This is an interesting list.
I opened it up, and sure enough, no sign of that 7,276 value. It looks just the same as the estimated plan I just showed.
Getting plans out of the cache is where the estimated values come into their own. It’s not just that I’d prefer to not actually run potentially-expensive queries on customer databases. Querying the plan cache is one thing, but running queries to get the actuals – that’s a lot harder.
With SQL 2016 SP1 installed, thanks to that Connect item, I can now see the Estimated Number of Rows to be Read property in estimated plans, and in the plan cache. The operator tooltip shown here is taken from the cache, and I can easily see that Estimated property showing 7,276. This is shown from Management Studio because Plan Explorer doesn’t yet call out this property explicitly:
If you’re looking to use SQL Server 2016 SP1, read the whole thing; this will make query tuning without running those horribly expensive queries a bit easier.