Press "Enter" to skip to content

Category: T-SQL

Splitting Arrays with OPENJSON

Dave Mason continues a journey into parsing JSON with T-SQL:

Starting with SQL Server 2016, Microsoft provided a STRING_SPLIT function. It is a table-valued function that splits a string into rows of substrings, based on a specified separator character. It’s been a welcome addition that we waited a long time for. It has one shortcoming, though: the order of the output rows is not guaranteed to match the order of the substrings in the input string.

Microsoft also provided support for parsing JSON data starting with SQL Server 2016. I discovered the OPENJSON function can be used to split strings, and it can also return the ordinal position of each substring from the original input string.

There are some limitations you’d expect, mostly around the requirement that the input be valid JSON.
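Here’s a minimal sketch of the technique (my own illustration, not Dave’s exact script): wrap the delimited string in JSON array syntax, then let OPENJSON’s [key] column report each element’s ordinal position. It assumes the substrings contain no double quotes or backslashes, which is exactly where the valid-JSON limitation bites.

    -- Requires database compatibility level 130 or higher.
    DECLARE @list nvarchar(max) = N'alpha,bravo,charlie';

    -- Wrap the string in JSON array syntax: ["alpha","bravo","charlie"].
    -- For an array, OPENJSON's [key] column is the zero-based element
    -- index, so the original substring order is preserved explicitly.
    SELECT [key] AS Ordinal, [value] AS Item
    FROM OPENJSON(N'["' + REPLACE(@list, N',', N'","') + N'"]');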

Parsing JSON with T-SQL

Dave Mason has a primer on JSON parsing using T-SQL:

Microsoft added support for JSON data beginning with SQL Server 2016. JSON is an open-standard file format consisting of attribute–value pairs and array data types. It is commonly used to transmit data objects for asynchronous browser–server communication. But it is also used for storing unstructured data in files or NoSQL databases such as Microsoft Azure Cosmos DB. For most of us, SQL Server’s support for JSON probably means two things: we can convert relational data to JSON and vice versa. In this post, I’ll focus on converting JSON to relational data and share what I’ve learned from a recent experience.

I’ve been pleasantly surprised with the way JSON support works in SQL Server. It’s supported every complicated scenario I’ve had to deal with so far, including nesting, deciding with or without arrays for the outer element, quotes or no quotes around numbers, etc.
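As a rough sketch of what the JSON-to-relational direction looks like (table and property names are mine), OPENJSON with a WITH clause shreds a JSON array into typed columns, and it handles numbers whether or not they are quoted:

    DECLARE @json nvarchar(max) = N'[
        {"Id": 1, "Name": "Anna", "Score": "98"},
        {"Id": 2, "Name": "Ben",  "Score": 97}
    ]';

    -- The WITH clause maps JSON paths to typed columns. Note that Score
    -- converts to int whether the source value is quoted or not.
    SELECT Id, Name, Score
    FROM OPENJSON(@json)
    WITH (
        Id    int           '$.Id',
        Name  nvarchar(50)  '$.Name',
        Score int           '$.Score'
    );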

Minimal Logging When Inserting into Heaps

Paul White gives us the lowdown on minimal logging when performing INSERT..SELECT operations into heap tables:

When inserting rows using INSERT...SELECT into a heap with no nonclustered indexes, the documentation universally states that such inserts will be minimally logged as long as a TABLOCK hint is present. This is reflected in the summary tables included in the Data Loading Performance Guide and the Tiger Team post. The summary rows for heap tables without indexes are the same in both documents (no changes for SQL Server 2016).

But it’s not quite that straightforward, as Paul shows. Read the whole thing.
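For reference, the documented pattern itself is simple; here is a minimal sketch (hypothetical table names) of an INSERT...SELECT that is a candidate for minimal logging, subject to the caveats Paul walks through:

    -- Target is a heap with no nonclustered indexes. The TABLOCK hint is
    -- required, and the database must use the SIMPLE or BULK_LOGGED
    -- recovery model for the insert to be minimally logged.
    INSERT INTO dbo.HeapTarget WITH (TABLOCK)
        (SomeKey, SomeValue)
    SELECT SomeKey, SomeValue
    FROM dbo.SourceTable;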

Replacing Text Across SQL Agent Jobs

Max Vernon shares a script to perform a find-and-replace across SQL Agent jobs:

Once in a while you might need to make common changes to a lot of SQL Server Agent Jobs. For example, if you change the path where you store SQL Server backup files, you might need to update many jobs to point at \\SERVERB\Backups instead of \\SERVERA\Backups. The script below provides a simple instance-wide find-and-replace for SQL Server Agent job-step commands. It modifies the command text for all jobs that contain the matching @Find parameter, replacing it with the provided @Replace value. You can exclude jobs by adding them to the list of values in the #excludeJobs table.

Click through for the script.
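Max’s script is at the link; for flavor, a stripped-down sketch of the same idea (minus his #excludeJobs handling) loops over matching steps in msdb and pushes the edited command back with sp_update_jobstep:

    DECLARE @Find    nvarchar(max) = N'\\SERVERA\Backups';
    DECLARE @Replace nvarchar(max) = N'\\SERVERB\Backups';

    DECLARE @job_id uniqueidentifier, @step_id int, @command nvarchar(max);

    -- Find every job step whose command text contains the search string.
    DECLARE step_cur CURSOR LOCAL FAST_FORWARD FOR
        SELECT js.job_id, js.step_id, js.command
        FROM msdb.dbo.sysjobsteps AS js
        WHERE CHARINDEX(@Find, js.command) > 0;

    OPEN step_cur;
    FETCH NEXT FROM step_cur INTO @job_id, @step_id, @command;
    WHILE @@FETCH_STATUS = 0
    BEGIN
        SET @command = REPLACE(@command, @Find, @Replace);
        EXEC msdb.dbo.sp_update_jobstep
            @job_id  = @job_id,
            @step_id = @step_id,
            @command = @command;
        FETCH NEXT FROM step_cur INTO @job_id, @step_id, @command;
    END;

    CLOSE step_cur;
    DEALLOCATE step_cur;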

Breaking Up Queries with UNION ALL

Bert Wagner takes us through a scenario where it can be faster to combine queries with UNION ALL rather than using IN:

Even though this query reads the whole clustered index to get the Benefactor rows, the total number of logical reads is still smaller than the seek/key lookup pattern seen in the combined query with IN(). This UNION ALL version gives SQL Server the ability to build a hybrid execution plan, combining two different techniques to generate a plan with fewer overall reads.

Click through for the example.
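The shape of the rewrite looks something like this (hypothetical table and column names, echoing the Benefactor example in the quote):

    -- One combined predicate: SQL Server picks a single access pattern
    -- for both values, e.g. a nonclustered seek plus key lookups.
    SELECT Id, DisplayName
    FROM dbo.Users
    WHERE UserType IN (N'Member', N'Benefactor');

    -- Split into branches: each can get its own plan shape, e.g. a seek
    -- for the selective value and a clustered scan for the common one.
    SELECT Id, DisplayName
    FROM dbo.Users
    WHERE UserType = N'Member'
    UNION ALL
    SELECT Id, DisplayName
    FROM dbo.Users
    WHERE UserType = N'Benefactor';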

The Importance of Aliasing in Subqueries

Gail Shaw explains an unexpected result when writing a statement with a subquery:

The column name in the temp table is missing an I, probably just a typo, but it has some rather pronounced effects.

The obvious next question is why the select with the subquery in it didn’t fail. After all, the query asks for ClientID from #TempClients, and there’s no such column. However, there is a ClientID column available in that query: it’s in the Orders table. And that’s a valid column for the subquery, because column binding order, when we have subqueries, is first to tables within the subquery and then, if no match is found, to tables in the outer query.

I know that the first time I experienced this, I thought it was a bug as well. But no, it’s by design and Gail explains why.
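A minimal repro of the scenario (my own table names, with the same kind of typo) shows both the silent success and the fix:

    CREATE TABLE #TempClients (ClientD int);  -- typo: missing the I in ClientID
    INSERT INTO #TempClients (ClientD) VALUES (1), (2);

    -- This runs without error. The subquery has no ClientID column, so
    -- ClientID binds outward to the (hypothetical) dbo.Orders table: the
    -- subquery returns each outer row's own ClientID, and the IN matches
    -- every order.
    SELECT o.OrderID, o.ClientID
    FROM dbo.Orders AS o
    WHERE o.ClientID IN (SELECT ClientID FROM #TempClients);

    -- Qualifying the column with an alias surfaces the typo immediately:
    -- this version fails with an invalid column name error.
    SELECT o.OrderID, o.ClientID
    FROM dbo.Orders AS o
    WHERE o.ClientID IN (SELECT tc.ClientID FROM #TempClients AS tc);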

Multi-Pattern Replacement with SQL Server

Hugo Kornelis has a pattern matching problem to solve:

The actual use case and the list of patterns that I had to remove are considered a confidential competitive advantage by my client, so I will just make up a list here to illustrate the problem. In this fake requirement, the following patterns must all be removed if anywhere in the string: “-@@%”, “@@%”, “-@%”, “@%”, “No.X#X”, and “^^^”. Also, “@X” needs to be removed when at the end of the string. In these patterns, @ is a placeholder for a numeric symbol, X for an alphabetic symbol, and # for an alphabetic or numeric symbol. All other symbols are to be taken literally. (So “%” is the percent mark, not the LIKE pattern for any character!).

This is a natural fit for regular expressions, but T-SQL has no built-in regex support (and I’d guess there was no desire to use the CLR), so Hugo gives us a workable solution.
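Hugo’s actual approach is at the link; as one workable sketch of the general idea (not his code), each fixed-length pattern can be translated into LIKE character-class syntax, then removed in a PATINDEX/STUFF loop until nothing matches:

    DECLARE @s nvarchar(100) = N'Widget-12% style No.A3B ^^^ lot';

    -- The made-up patterns, translated to LIKE syntax:
    -- @ -> [0-9], X -> [a-z], # -> [0-9a-z], literal % -> [%].
    -- Assumes a case-insensitive collation, so [a-z] also matches A-Z.
    DECLARE @patterns TABLE (pat nvarchar(50) NOT NULL, patlen int NOT NULL);
    INSERT INTO @patterns (pat, patlen) VALUES
        (N'-[0-9][0-9][%]', 4),          -- "-@@%"
        (N'[0-9][0-9][%]', 3),           -- "@@%"
        (N'-[0-9][%]', 3),               -- "-@%"
        (N'[0-9][%]', 2),                -- "@%"
        (N'No.[a-z][0-9a-z][a-z]', 6),   -- "No.X#X"
        (N'^^^', 3);                     -- "^^^"

    DECLARE @pos int = 1, @len int;
    WHILE @pos > 0
    BEGIN
        -- Try the longest pattern first so "-@@%" wins over "@@%".
        SELECT TOP (1)
            @pos = PATINDEX(N'%' + pat + N'%', @s),
            @len = patlen
        FROM @patterns
        WHERE PATINDEX(N'%' + pat + N'%', @s) > 0
        ORDER BY patlen DESC;

        IF @@ROWCOUNT = 0
            SET @pos = 0;
        ELSE
            SET @s = STUFF(@s, @pos, @len, N'');
    END;

    -- The "@X at end of string" rule is an anchored check, applied last.
    IF @s LIKE N'%[0-9][a-z]'
        SET @s = LEFT(@s, LEN(@s) - 2);

    SELECT @s AS CleanedString;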

Forensic Accounting: Cohort Analysis

I continue my series on forensic accounting techniques with cohort analysis:

In the last post, we focused on high-level aggregates to gain a basic understanding of our data. We saw some suspicious results but couldn’t say much more than “This looks weird” due to our level of aggregation. In this post, I want to dig into data at a lower level of detail. My working conception is the cohort, a broad-based comparison of data sliced by some business-relevant or analysis-relevant component.

Those familiar with Kimball-style data warehousing already understand where I’m going with this. In the basic analysis, we essentially look at fact data with a little bit of disaggregation, such as looking at data by year. In this analysis, we introduce dimensions (sort of) and slice our data by dimensions.
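In T-SQL terms, most of that slicing is a GROUP BY against some business-relevant attribute. A schematic example (invented invoice schema, not the series’ actual data set): treat each vendor as a cohort and see whether any of them stands out.

    -- Compare vendors as cohorts: does any vendor's average invoice
    -- amount look out of line with its peers?
    SELECT
        v.VendorName,
        COUNT(*)      AS NumberOfInvoices,
        SUM(i.Amount) AS TotalAmount,
        AVG(i.Amount) AS AverageAmount
    FROM dbo.Invoices AS i
        INNER JOIN dbo.Vendors AS v
            ON v.VendorID = i.VendorID
    GROUP BY v.VendorName
    ORDER BY AverageAmount DESC;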

Click through for some fraud-finding fun.

Basic Forensic Accounting Techniques

I continue my series on forensic accounting techniques:

Growth analysis focuses on changes in ratios over time. For example, you may plot annual revenue, cost, and net margin by year. Doing this gives you an idea of how the company is doing: if costs are flat but revenue increases, you can assume economies of scale or economies of scope are in play and that’s a great thing. If revenue is going up but costs are increasing faster, that’s not good for the company’s long-term outlook.

For our data set, I’m going to use the following SQL query to retrieve bus counts on the first day of each year. To make the problem easier, I add and remove buses on that day, so we don’t need to look at every day or perform complicated analyses.
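The query itself is in the post; as a rough idea of its shape (hypothetical Calendar and Buses tables), a first-of-year bus count might look like:

    -- Count buses in service on January 1 of each year.
    SELECT
        c.CalendarYear,
        COUNT(*) AS NumberOfBuses
    FROM dbo.Calendar AS c
        INNER JOIN dbo.Buses AS b
            ON b.DateFirstInService <= c.Date
            AND (b.DateRetired IS NULL OR b.DateRetired > c.Date)
    WHERE c.CalendarMonth = 1
        AND c.CalendarDayOfMonth = 1
    GROUP BY c.CalendarYear
    ORDER BY c.CalendarYear;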

I get into quite a bit in this post, including a quick tour of multicollinearity, which is only my second-favorite of the three linear regression amigos (heteroskedasticity being my favorite and autocorrelation the hanger-on).
