Press "Enter" to skip to content

Category: T-SQL

Subset And Apply Problems

Tom Martens explains a class of generic data processing problems:

Subset and Apply means that I have a dataset of some rows where due to some conditions all the rows have to be put into a bucket and then a function has to be applied to each bucket.

The simple problem can be solved by a GROUP BY using T-SQL, the not so simple problem requires that all columns and rows of the dataset have to be retained for further processing, even if these columns are not used to subset or bucket the rows in your dataset.

One quick example of this is running totals of orders for each customer, which Tom answers using T-SQL, R, and Power BI.  Click through for those three solutions.

Comments closed

Parsing JSON In R

Tomaz Kastrun shows how to feed a JSON data set into R and turn that into a proper data frame:

JSON has very powerful statements for converting to and from JSON for storing into / from SQL Server engine (FOR JSON and JSON VALUE, etc).  And since it is gaining popularity for data exchange, I was curious to give it a try with R combination.

I will simply convert a system table into array using for json clause.

There’s an R library.  There’s always an R library.

Comments closed

Visual Studio Code

Rob Sewell shows how to run SQL Server queries using Visual Studio Code:

Reading this blog post by Shawn Melton Introduction of Visual Studio Code for DBAs reminded me that whilst I use Visual Studio Code (which I shall refer to as Code form here on) for writing PowerShell and Markdown and love how easily it interacts with Githuib I hadn’t tried T-SQL. If you are new to Code (or if you are not) go and read Shawns blog post but here are the steps I took to running T-SQL code using Code

I played around with an early version of this and my thought was that there were some nice improvements over Management Studio (like being able to filter and sort the result set grid without going back to the server), but that there are still too many nice things Management Studio does for me to take a serious look at it.  Still, I’m hopeful that Microsoft moves in the direction of having a fully-featured querying tool for Linux so I can finally join the perpetual Year of the Linux Desktop.

Comments closed

Batches And Stored Procedure Creation

Steve Jones has a warning for when you create a stored procedure:

Why is my select code in there? That was designed to be a piece of test code. Shouldn’t the BEGIN..END after the AS define my procedure?

Actually it doesn’t. the procedure doesn’t end until the CREATE PROCEDURE statement is terminated. That termination comes by ending the batch. The CREATE PROCEDURE documentation has this limitation:

The CREATE PROCEDURE statement cannot be combined with other Transact-SQL statements in a single batch.

This means that anything else you have in that batch will be considered as part of the procedure, regardless of BEGIN..END.

Judicious usage of the GO statement can help keep you out of trouble.

Comments closed

Nested Sets

Nate Johnson explains the nested sets model:

Put another way, the #3 rule is that you should always operate on the tree (CrUD ops) using stored-procedures and/or triggers that encapsulate all the nitty-gritty details of maintaining the correct position values during said insert/update/delete operations.  Of course, somebody is responsible for writing those stored-procs.  Any volunteers?  Easy now, don’t raise your hands all at once!  Generally, this responsibility falls to the DBA(s) or DBDev(s).

The problem at-hand, in my current situation, was that of “moving a sub-tree”, i.e. taking a node and all its descendants, and moving it to place it under another “parent” node.  In some models, and/or in some languages, this is a simple recursive operation.  However, SQL is not spectacular at recursion — after all, we’re working in a relational engine — so let’s try to play to its strengths:

This is a straightforward look at one of the major hierarchical models in relational design.  Well worth a look.

1 Comment

SQL Server Regex

Dev Nambi has a new open-source project:

Databases store text, and the best way to manipulate text is to use a regular expression (‘regex’). Using regular expressions in SQL queries has been possible in many database engines for decades.

Now you can use regular expressions in SQL Server queries, too. I’ve created an open-source project, sql-server-regex, that lets you run regular expressions in T-SQL queries using scalar and table-valued functions.

This is a set of CLR functions which use the built-in .NET regular expressions functionality.  That makes it pretty easy to see how the code works.

Comments closed

A T-SQL Date Dimension

Vladimir Oselsky builds a date dimension in T-SQL:

Before we get into discussing how to create it date dimension and how to use it, first let’s talk about what it is and why do we need it. Depending on who you talk to, people can refer to this concept as “Calendar table” or “Date Dimension,” which is usually found in Data Warehouse. No matter how it is called, at the end of the day, it is a table in SQL Server which is populated with different date/calendar related information to help speed up SQL queries which require specific parts of dates.

In my case, I have created it to be able to aggregate data by quarters, years and month. Depending on how large your requirements are it will add additional complexity to building it. Since I don’t care about holidays (for now at least), I will not be creating holiday schedule which can be complicated to populate.

I love date dimensions, even on non-warehouse databases, because it’s an easy way of providing additional context to time series data.  Think about graphing orders per day in an industry with weekday-versus-weekend trends; a date dimension lets you strip out weekends (maybe plotting them separately) or even lets you build day-of-week analysis for each day, or looking at week of the month, etc.  You might also be interested in computing holidays.

Comments closed

STRING_AGG() Performance

Aaron Bertrand wants to know how the STRING_AGG() function performs:

We can see that our FORCESCAN hint really did make things worse – while we shifted the cost away from the clustered index seek, the sort was actually much worse, even though the estimated costs deemed them relatively equivalent. More importantly, we can see that STRING_AGG() does offer a performance benefit, whether or not the concatenated strings need to be ordered in a specific way. As with STRING_SPLIT(), which I looked at back in March, I am quite impressed that this function scales well prior to “v1.”

Given that the early releases tend to be “get the thing working” and later CTPs are around “make the thing faster,” it’s nice to see that STRING_AGG() is already ready for prime-time, and makes me wonder if they’ll make it even faster by RTM.

Comments closed

When Is NOLOCK Okay?

Erik Darling wants to figure out the acceptable boundaries for NOLOCK:

Picture a couple developers who started their app in the cloud, where they can’t get fancy with tempdb, fast disks aren’t in the budget yet, along with that beefier server with some extra RAM. They may not be able to turn on RCSI or SI at the drop of a hat; tempdb would keel over with the row versioning as part of a workload that already uses it pretty heavily.

They still need to run reports, either for users, or for higher ups at the company, and they can ask for them at any time. Caching the data when user activity is low and reporting against it when someone asks may raise some questions, like “why doesn’t my sales data show anything from today?”, or worse. You could invalidate the cache every X minutes, but that doesn’t help because then you need to re-run that reporting query every X minutes. That’s only moderately better than letting users query it at will.

Figuring out when read uncommitted is acceptable is a business decision.  As much as I dislike using NOLOCK, as long as the people on the business side understand the risks, it’s their call.

Comments closed

Pivoting Data

Jana Sattainathan explains the PIVOT operator:

The results are so much easier to look at and comprehend, aren’t they? All object types for a schema are on a single line and it is easy for us to do impact analysis visually.

Sometimes doing it in T-SQL is the best approach, but pivoting is generally something which is cheaper in the application tier, whether you’re building a report, dashboard, or web app.

Comments closed