T-SQL – Page 67 – Curated SQL

We see right away that this method failed horribly as all of the data was placed into the same dataset. This holds true no matter how many times we execute the code, and it happens because the RAND() function is only evaluated once for the whole query, and not individually for each row. To correct this we’ll instead use a method that Jeff Moden taught me at a SQL Saturday in Detroit several years ago – generating a NEWID() for each row, using the CHECKSUM() function to turn it into a random number, and then the % (modulus) function to turn it into a number between 0 and 99 inclusive.
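A minimal sketch of the difference described above, using a made-up table variable (`@Rows` and its column are illustrative, not from the original post). An unseeded `RAND()` shows the same value on every row of the result, while `NEWID()` is generated per row. The sketch applies `%` before `ABS`, which avoids an overflow error if `CHECKSUM` happens to return the minimum int value; that ordering is a choice made here, not necessarily the one used in the post.

```sql
-- Hypothetical sample rows; any table would do.
DECLARE @Rows TABLE (Id int PRIMARY KEY);
INSERT INTO @Rows (Id) VALUES (1), (2), (3), (4), (5);

SELECT
    r.Id,
    RAND() AS SameOnEveryRow,                  -- evaluated once for the query
    ABS(CHECKSUM(NEWID()) % 100) AS Bucket     -- per row, 0 through 99
FROM @Rows AS r;
```

Rows could then be routed by comparing `Bucket` against a threshold, for example treating values below 80 as one dataset and the rest as another.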

I’d have to test it out, but I’d think you could modify method 3 to include a CROSS APPLY to perform one ABS(CHECKSUM(NEWID())) and get exact counts that way without a temp table.


The Limits of LEN (or REPLICATE)

Published 2020-07-15 by Kevin Feasel

Pamela Mooney takes us through a quandary:

I was using LEN() to troubleshoot an issue I was having with a dynamically constructed string truncating while inserting into an NVARCHAR(MAX) column. Since I know that NVARCHAR(MAX) has a 2 GB limit (goodness only knows how many characters that is!), I couldn’t explain the truncation. A colleague suggested doing a test with another dynamically constructed string. Maybe then, I could find where the cutoff was occurring.
Great idea!
So, I came up with a plan.

Click through for the plan, but be sure to read Pamela’s comment at the bottom as there’s a bit more to the story.
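One documented behavior that fits the title, shown here as a small sketch rather than as a reproduction of Pamela's plan: `REPLICATE` truncates its result at 8,000 bytes unless its input is already a MAX type, so a test string built with it can be shorter than expected before it ever reaches the NVARCHAR(MAX) column.

```sql
-- The input is a plain nvarchar literal, so the result is cut off at
-- 8,000 bytes, which is 4,000 nvarchar characters.
SELECT LEN(REPLICATE(N'a', 10000)) AS TruncatedLength;  -- 4000

-- Casting the input to nvarchar(max) first lifts that limit.
SELECT LEN(REPLICATE(CAST(N'a' AS nvarchar(max)), 10000)) AS FullLength;  -- 10000
```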


Creating Evenly-Sized Batches from Groups

Published 2020-07-13 by Kevin Feasel

Daniel Hutmacher has a variant on the islands problem as well as the bin-packing problem:

My aim with this post is to split the dataset into batches of roughly 100 rows each.
DECLARE @target_rowcount bigint=100;
I say “roughly”, because we’re not allowed to split a transaction so that a group (grouping_column_1, grouping_column_2) appears in more than one batch, although a batch can obviously contain more than one group. This means that by necessity, some of the batches are going to be slightly under 100 rows and some are going to be slightly over.

Read on for a good solution to the problem. Daniel mentions places where performance could be better, though this feels like the kind of task you don’t necessarily run all that frequently.
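For a feel of the problem, here is a naive baseline, not Daniel's solution: count rows per group, keep a running total, and derive a batch number from it. The table variable and sample rows are invented for illustration, and the target is set to 3 instead of 100 to keep the demo small.

```sql
DECLARE @target_rowcount bigint = 3;  -- the post uses 100; 3 keeps the demo small

-- Hypothetical sample data using the column names from the excerpt.
DECLARE @t TABLE (grouping_column_1 int, grouping_column_2 int);
INSERT INTO @t (grouping_column_1, grouping_column_2)
VALUES (1, 1), (1, 1), (1, 2), (2, 1), (2, 1), (2, 1), (2, 1), (3, 1);

WITH grp AS (
    -- One row per group with its size.
    SELECT grouping_column_1, grouping_column_2, COUNT_BIG(*) AS row_count
    FROM @t
    GROUP BY grouping_column_1, grouping_column_2
),
running AS (
    -- Running total of rows across groups, in group order.
    SELECT grouping_column_1, grouping_column_2, row_count,
           SUM(row_count) OVER (ORDER BY grouping_column_1, grouping_column_2
                                ROWS UNBOUNDED PRECEDING) AS running_rows
    FROM grp
)
SELECT grouping_column_1, grouping_column_2, row_count,
       (running_rows - 1) / @target_rowcount + 1 AS batch_number
FROM running
ORDER BY grouping_column_1, grouping_column_2;
```

Because the batch number is assigned per group, no group is ever split. The trade-off is that batch sizes drift around the target and batch numbers can have gaps when a large group straddles a boundary, which is part of why the real problem deserves a more careful solution.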


Areas of Improvement for DROP TABLE

Published 2020-07-13 by Kevin Feasel

Michael J. Swart points out a few foibles about the DROP TABLE syntax:

I was looking at the docs for DROP TABLE and I noticed this in the syntax: [ ,...n ]. I never realized that you can drop more than one table in a statement.
I think that’s great. When dropping tables one at a time, you always had to be careful about order when foreign keys were involved. Alas, you still have to care about order.

That is a shame. Michael also includes a few other places where DROP TABLE could be made better, so check it out.
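A short sketch of the multi-table form, with two throwaway tables (`dbo.DemoParent` and `dbo.DemoChild` are hypothetical names). When both sides of a foreign key are dropped in one statement, the referencing table has to be listed first.

```sql
-- Hypothetical demo tables linked by a foreign key.
CREATE TABLE dbo.DemoParent (Id int PRIMARY KEY);
CREATE TABLE dbo.DemoChild
(
    Id       int PRIMARY KEY,
    ParentId int NOT NULL REFERENCES dbo.DemoParent (Id)
);

-- The referencing (child) table comes first, then the referenced (parent) table.
DROP TABLE dbo.DemoChild, dbo.DemoParent;
```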


Modifying Data in JSON and XML with SQL Server

Published 2020-07-10 by Kevin Feasel

Steve Stedman continues a series on working with JSON and XML data:

As part of the XML and JSON month of blog posts, this post is going to cover how to make changes to XML and JSON in TSQL. We will be using the same sample data from the original post with a data structure holding employee data.

Click through for the examples.
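As a rough illustration of the kind of change involved, the sketch below updates one value in a JSON document with `JSON_MODIFY` and one in an XML document with the `modify()` method. The employee documents are invented here and are not Steve's sample data; `JSON_MODIFY` requires SQL Server 2016 or later.

```sql
-- JSON: JSON_MODIFY returns a new string with the value at the path replaced.
DECLARE @json nvarchar(max) = N'{"employee": {"name": "Ann", "title": "Analyst"}}';
SET @json = JSON_MODIFY(@json, '$.employee.title', N'Manager');
SELECT JSON_VALUE(@json, '$.employee.title') AS JsonTitle;

-- XML: modify() changes the variable in place using XML DML.
DECLARE @xml xml = N'<employee><name>Ann</name><title>Analyst</title></employee>';
SET @xml.modify('replace value of (/employee/title/text())[1] with "Manager"');
SELECT @xml.value('(/employee/title/text())[1]', 'nvarchar(50)') AS XmlTitle;
```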


Iterating over JSON and XML Data in SQL Server

Published 2020-07-09 by Kevin Feasel

Steve Stedman explains how you can iterate through XML and JSON data using the APPLY operator:

The results are what we are looking for in this specific example, but where they break down is when there are more employees represented in the XML, for each employee we need to add another UNION to bring the results together. That is not very iterative and since the title of this post includes the word iterating, we need to focus on how to do that.
Now we introduce the CROSS APPLY functionality that can be used like a JOIN to take a value from one result set (table) and apply it to a function that gets called once for each row. You can reference my JOIN TYPES poster for using CROSS APPLY.

Click through for the full set of examples.
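The idea in the excerpt can be sketched as follows, with an invented table variable holding one XML and one JSON document per row (none of these names come from Steve's post). `CROSS APPLY` calls `nodes()` or `OPENJSON` once per outer row and returns one output row per employee, so no `UNION` per employee is needed. `OPENJSON` needs database compatibility level 130 or higher.

```sql
-- Hypothetical documents; each holds a list of employees.
DECLARE @docs TABLE (DocId int PRIMARY KEY, XmlDoc xml, JsonDoc nvarchar(max));
INSERT INTO @docs (DocId, XmlDoc, JsonDoc)
VALUES (1,
        N'<employees><employee><name>Ann</name></employee><employee><name>Bob</name></employee></employees>',
        N'{"employees":[{"name":"Ann"},{"name":"Bob"}]}');

-- XML: nodes() shreds each document into one row per <employee> element.
SELECT d.DocId,
       e.n.value('(name/text())[1]', 'nvarchar(50)') AS EmployeeName
FROM @docs AS d
CROSS APPLY d.XmlDoc.nodes('/employees/employee') AS e(n);

-- JSON: OPENJSON with a WITH clause returns one typed row per array element.
SELECT d.DocId, j.EmployeeName
FROM @docs AS d
CROSS APPLY OPENJSON(d.JsonDoc, '$.employees')
     WITH (EmployeeName nvarchar(50) '$.name') AS j;
```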


Derived Table Nesting and Performance

Published 2020-07-08 by Kevin Feasel

Itzik Ben-Gan digs into some of the performance considerations around nested derived tables:

Unnesting/substitution of table expressions is a process of taking a query that involves nesting of table expressions, and as if substituting it with a query where the nested logic is eliminated. I should stress that in practice, there’s no actual process in which SQL Server converts the original query string with the nested logic to a new query string without the nesting. What actually happens is that the query parsing process produces an initial tree of logical operators closely reflecting the original query. Then, SQL Server applies transformations to this query tree, eliminating some of the unnecessary steps, collapsing multiple steps into fewer steps, and moving operators around. In its transformations, as long as certain conditions are met, SQL Server can shift things around across what were originally table expression boundaries—sometimes effectively as if eliminating the nested units. All of this in an attempt to find an optimal plan.
In this article I cover both cases where such unnesting takes place, as well as unnesting inhibitors. That is, when you use certain query elements, they prevent SQL Server from being able to move logical operators in the query tree, forcing it to process the operators based on the boundaries of the table expressions used in the original query.

That’s on my list for a second reading.
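To make "nesting" concrete, here is a small sketch with an invented `@Orders` table variable (not an example from Itzik's article): a query written with two levels of derived tables, followed by the logically equivalent flat form. Whether the two compile to the same plan in a given case is something to check in the execution plans rather than assume.

```sql
-- Hypothetical sample table.
DECLARE @Orders TABLE
(
    OrderId    int  PRIMARY KEY,
    OrderDate  date NOT NULL,
    CustomerId int  NOT NULL
);
INSERT INTO @Orders (OrderId, OrderDate, CustomerId)
VALUES (1, '20191215', 1), (2, '20200105', 1), (3, '20200210', 2), (4, '20200301', 1);

-- Nested form: each derived table applies one filter.
SELECT d2.OrderId, d2.OrderDate
FROM (
    SELECT d1.OrderId, d1.OrderDate, d1.CustomerId
    FROM (
        SELECT o.OrderId, o.OrderDate, o.CustomerId
        FROM @Orders AS o
        WHERE o.OrderDate >= '20200101'
    ) AS d1
    WHERE d1.CustomerId = 1
) AS d2
WHERE d2.OrderId > 1;

-- Flat form: the same filters combined in a single WHERE clause.
SELECT o.OrderId, o.OrderDate
FROM @Orders AS o
WHERE o.OrderDate >= '20200101'
  AND o.CustomerId = 1
  AND o.OrderId > 1;
```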


Ambiguous Columns in Queries when Using One Table

Published 2020-07-06 by Kevin Feasel

Dave Bland shows how easy it is to get the “Ambiguous column name” error message when querying from a single table:

When I added the “*”, this is where I received unexpected results. All I did was add the “*”. Looking at the code below, you can see SQL Server is having issues with the Name column in the ORDER BY.

I do wish SQL had a symbol representing “everything else,” where the engine of choice would include all columns except those explicitly named. I know there’d be trickiness around things like “LTRIM(ColumnA) AS TrimmedColumnA” but that’d be for the language designers to figure out…
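A small reproduction of the situation, using an invented `@People` table variable in place of whatever table Dave queried. Once `*` is added next to an explicitly listed column, the select list exposes `Name` twice, and an unqualified `ORDER BY Name` can no longer tell which one is meant.

```sql
-- Hypothetical single table.
DECLARE @People TABLE (Id int PRIMARY KEY, Name nvarchar(50) NOT NULL);
INSERT INTO @People (Id, Name) VALUES (1, N'Bea'), (2, N'Al');

-- This version raises "Ambiguous column name 'Name'" because Name now
-- appears twice in the select list (left commented out so the batch runs):
-- SELECT Name, * FROM @People ORDER BY Name;

-- Qualifying the column with a table alias resolves the ambiguity.
SELECT p.Name, p.*
FROM @People AS p
ORDER BY p.Name;
```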


Using VALUES for Multi-Record Operations

Published 2020-07-06 by Kevin Feasel

Daniel Hutmacher explains some of what you can do with the VALUES clause:

Note the commas at the end of each line, denoting that a new row begins here. Because this runs as a single statement, the INSERT runs as an atomic operation, meaning that all rows are inserted, or none at all (like if there’s a syntax issue or a constraint violation).
I use this construct all the time to generate scripts to import data from various external sources, like Excel, or even a result set in Management Studio or Azure Data Studio.

Daniel also has a new app for us to try out.
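For readers who have not used the construct, a brief sketch with made-up data (the `@Colors` table is illustrative only): a multi-row `INSERT ... VALUES`, followed by `VALUES` used as an inline derived table.

```sql
-- Hypothetical target table.
DECLARE @Colors TABLE (Id int PRIMARY KEY, Name nvarchar(20) NOT NULL);

-- One statement, three rows: either all rows go in or none do.
INSERT INTO @Colors (Id, Name)
VALUES (1, N'Red'),
       (2, N'Green'),
       (3, N'Blue');

-- VALUES can also act as a row source without any table behind it.
SELECT v.Id, v.Name
FROM (VALUES (4, N'Cyan'),
             (5, N'Magenta')) AS v (Id, Name);
```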


PRINTing More Than 8000 Bytes

Published 2020-07-01 by Kevin Feasel

Richard Swinbank hits on a bugbear of mine:

A feature of T-SQL is that strings longer than 8000 bytes are truncated by PRINT. If you haven’t already discovered this, you might wonder why it’s a problem – the answer (for me at least) is dynamic SQL.

Read on for Richard’s answer. There is an easier way to do it with the paid version of SQL# by using the Util_Print function.
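One common workaround, shown here as a rough sketch and not as Richard's approach, is to print a long string in pieces of at most 4,000 characters (8,000 bytes for nvarchar). The variable names and the generated test string are made up for the demo.

```sql
-- Hypothetical long string standing in for generated dynamic SQL.
DECLARE @sql nvarchar(max) = REPLICATE(CAST(N'x' AS nvarchar(max)), 10000);

DECLARE @pos   bigint = 1;
DECLARE @chunk int    = 4000;                  -- 4,000 nvarchar chars = 8,000 bytes
DECLARE @len   bigint = DATALENGTH(@sql) / 2;  -- character count, trailing spaces included

WHILE @pos <= @len
BEGIN
    -- Each PRINT stays within the 8,000-byte limit.
    PRINT SUBSTRING(@sql, @pos, @chunk);
    SET @pos += @chunk;
END;
```

The caveat with fixed-size chunks is that each `PRINT` starts a new line, so the output gains a line break every 4,000 characters, possibly in the middle of a token. Splitting at existing line breaks instead avoids that at the cost of a little more code.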

