Press "Enter" to skip to content

Category: T-SQL

T-SQL Tips Regarding Subqueries

Itzik Ben-Gan provides quality information on working with subqueries in SQL Server:

In this plan you see a Nested Loops (Left Semi Join) operator, with a scan of the clustered index on Customers as the outer input and a seek in the index on the customerid column in the Orders as the inner input. You also see an outer reference (correlated parameter) based on the custid column in Customers, and the seek predicate Orders.customerid = Customers.custid.

So why are you getting the plan in Figure 1 and not the one in Figure 2? If you haven’t figured it out yet, look closely at the definitions of both tables—specifically the column names—and at the column names used in the query. You will notice that the Customers table holds customer IDs in a column called custid, and that the Orders table holds customer IDs in a column called customerid. However, the code uses custid in both the outer and inner queries. 

Itzik covers three specific scenarios, all of which can cause trouble to database developers who haven’t been burned yet. And sometimes even those who have.

Comments closed

Splitting Arrays with OPENJSON

Dave Mason continues a journey into parsing JSON with T-SQL:

Starting with SQL Server 2016, Microsoft provided a STRING_SPLIT function. It is a table-valued function that splits a string into rows of substrings, based on a specified separator character. It’s been a welcome addition that we waited a long time for. It has one shortcoming, though: the order of the output rows is not guaranteed to match the order of the substrings in the input string.

Microsoft also provided support for parsing JSON data starting with SQL Server 2016. I discovered the OPENJSON function can be used to split strings, and it can also return the ordinal position of each substring from the original input string.

There are some limitations which you’d expect, namely around requirements for valid JSON.

Comments closed

Parsing JSON with T-SQL

Dave Mason has a primer on JSON parsing using T-SQL:

Microsoft added support for JSON data beginning with SQL Server 2016. JSON is an open-standard file format consisting of attribute–value pairs and array data types. It is commonly used to transmit data objects for asynchronous browser–server communication. But it is also used for storing unstructured data in files or NoSQL databases such as Microsoft Azure Cosmos DB. For most of us, SQL Server’s support for JSON probably means two things: we can convert relational data to JSON and vice versa. In this post, I’ll focus on converting JSON to relational data and share what I’ve learned from a recent experience.

I’ve been pleasantly surprised with the way JSON support works in SQL Server. It’s supported every complicated scenario I’ve had to deal with so far, including nesting, deciding with or without arrays for the outer element, quotes or no quotes around numbers, etc.

Comments closed

Minimal Logging When Inserting into Heaps

Paul White gives us the lowdown on minimal logging when performing INSERT..SELECT operations into heap tables:

When inserting rows using INSERT...SELECT into a heap with no nonclustered indexes, the documentation universally states that such inserts will be minimally logged as long as a TABLOCK hint is present. This is reflected in the summary tables included in the Data Loading Performance Guide and the Tiger Team post. The summary rows for heap tables without indexes are the same in both documents (no changes for SQL Server 2016):

But it’s not quite that straightforward, as Paul shows. Read the whole thing.

Comments closed

Replacing Text Across SQL Agent Jobs

Max Vernon shares a script to perform a find-and-replace across SQL Agent jobs:

Once in a while you might need to make common changes to a lot of SQL Server Agent Jobs. For example, if you change the path where you store SQL Server backup files, you might need to update many jobs to point at \\SERVERB\Backups instead of \\SERVERA\Backups. The script below provides a simple instance-wide find-and-replace for SQL Server Agent job-step commands. It modifies the command text for all jobs that contain the matching @Find parameter, replacing it with the provided @Replace value. You can exclude jobs by adding them to the list of values in the #excludeJobs table.

Click through for the script.

Comments closed

Breaking Up Queries with UNION ALL

Bert Wagner takes us through a scenario where it can be faster to combine queries with UNION ALL rather than using IN:

Even though this query reads the whole clustered index to get the Benefactor rows, the total number of logical reads is still smaller than the seek/key lookup pattern seen in the combined query with IN(). This UNION ALL version gives SQL Server the ability to build a hybrid execution plan, combining two different techniques to generate a plan with fewer overall reads.

Click through for the example.

Comments closed

The Importance of Aliasing in Subqueries

Gail Shaw explains an unexpected result when writing a statement with a subquery:

The column name in the temp table is missing an I, probably just a typo, but it has some rather pronounced effects.

The obvious next question is why the select with the subquery in it didn’t fail, after all, the query asks for ClientID from #TempClients, and there’s no such column. However there is a ClientID column available in that query, and it’s in the Orders table. And that’s a valid column for the subquery, because column binding order, when we have subqueries, is first to tables within the subquery, and then, if no match is found, to tables in the outer query.

I know that the first time I experienced this, I thought it was a bug as well. But no, it’s by design and Gail explains why.

Comments closed

Multi-Pattern Replacement with SQL Server

Hugo Kornelis has a pattern matching problem to solve:

The actual use case and the list of patterns that I had to remove are considered a confidential competitive advantage by my client, so I will just make up a list here to illustrate the problem. In this fake requirement, the following patterns must all be removed if anywhere in the string: “-@@%”, “@@%”, “-@%”, “@%”, “No.X#X”, and “^^^”. Also, “@X” needs to be removed when at the end of the string. In these patterns, @ is a placeholder for a numeric symbol, X for an alphabetic symbol, and # for an alphabetic or numeric symbol. All other symbols are to be taken literally. (So “%” is the percent mark, not the LIKE pattern for any character!).

This is a problem for regular expressions, but without built-in regular expressions (and I’d guess no desire to use the CLR), Hugo gives us a workable solution.

Comments closed

Forensic Accounting: Cohort Analysis

I continue my series on forensic accounting techniques with cohort analysis:

In the last post, we focused on high-level aggregates to gain a basic understanding of our data. We saw some suspicious results but couldn’t say much more than “This looks weird” due to our level of aggregation. In this post, I want to dig into data at a lower level of detail. My working conception is the cohort, a broad-based comparison of data sliced by some business-relevant or analysis-relevant component.

Those familiar with Kimball-style data warehousing already understand where I’m going with this. In the basic analysis, we essentially look at fact data with a little bit of disaggregation, such as looking at data by year. In this analysis, we introduce dimensions (sort of) and slice our data by dimensions.

Click through for some fraud-finding fun.

Comments closed