Category: T-SQL

Scalar Function Blocking

Erik Darling notes that scalar functions can cause multi-table blocking:

Someone had tried to be clever. Looking at the code running, if you’ve been practicing SQL Server for a while, usually means one thing.

A Scalar Valued Function was running!

In this case, here’s what it looked like:

Someone had added that function as a computed column to the Users table:

Spoilers:  this was a bad idea.

Recently Added String Functions

Lori Brown covers a few string functions added to SQL Server in the past two versions:

STRING_ESCAPE (https://docs.microsoft.com/en-us/sql/t-sql/functions/string-escape-transact-sql)

This function is available starting with SQL 2016 and is currently only able to escape JSON characters. To me it’s not super useful just yet but hopefully they will add more types soon.

I haven’t had the need to use STRING_ESCAPE yet, but one additional function I’d add is CONCAT_WS.
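To make the "JSON only" point concrete, here is a minimal Python model of what STRING_ESCAPE(text, 'json') is documented to do. It escapes the quotation mark, reverse solidus, solidus, and the control characters CHAR(0) through CHAR(31), with the common ones getting short forms like \n and the rest getting \u00XX. The function name string_escape_json is hypothetical, and the NULL-in, NULL-out handling is an assumption. It is an illustration, not SQL Server's implementation.

```python
# Hypothetical model of T-SQL STRING_ESCAPE(text, 'json'); not SQL Server code.
# Characters that get a short JSON escape sequence.
SHORT_ESCAPES = {
    '"': '\\"',
    '\\': '\\\\',
    '/': '\\/',
    '\b': '\\b',
    '\f': '\\f',
    '\n': '\\n',
    '\r': '\\r',
    '\t': '\\t',
}


def string_escape_json(text):
    """Escape text the way STRING_ESCAPE(text, 'json') is documented to."""
    if text is None:  # assumed: NULL in, NULL out
        return None
    out = []
    for ch in text:
        if ch in SHORT_ESCAPES:
            out.append(SHORT_ESCAPES[ch])
        elif ord(ch) < 32:
            # Remaining control characters become \u00XX (lowercase hex digits)
            out.append('\\u%04x' % ord(ch))
        else:
            out.append(ch)
    return ''.join(out)


print(string_escape_json('He said "a/b"\n'))  # prints: He said \"a\/b\"\n
```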

Finding Overlapping Data Ranges

Louis Davidson shows how to find groups of data which overlap:

This week, I had a problem where I needed to find and eliminate from the results of my query, data with overlapping ranges. I have written about this topic before, in my database design book, in regards to building a trigger to avoid overlapping ranges. But even though I have written on the topic there, I still use Google just like you to get quick help (I use books when I want to learn, or expand my knowledge on a topic in depth, blogs when I need a simple answer to a simple or complex question.)

The problem at hand is most often associated with date-based data, such as effective dates for a row in a temporal/type 2 dimension table, or other cases like appointment times, etc. But the algorithm is the same with numbers and is a bit easier to read since we don’t have the same issues with roundoff and decimal places (the query is complex enough on its own to show in a blog post). From a progression of start and end values in each row, we are going to look at how to check to make sure that there are no two rows that are in conflict (no range should contain another range’s value at all).

This feels like the type of thing which could be rewritten with window functions to be a little smoother, but I’d have to think about it more.  Louis does provide a good solution and explanation to a fairly common but tricky problem in T-SQL.
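One way the window-function idea might look, sketched in Python rather than T-SQL: sort by start, and flag any row whose start is less than or equal to the largest end among all earlier rows. In T-SQL, that running maximum could come from something like MAX(EndValue) OVER (ORDER BY StartValue ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), where the column names are hypothetical. This is not Louis's solution. It assumes closed ranges (inclusive ends), so a range ending at 5 and one starting at 5 count as overlapping, and it flags the later-starting row of each conflict, which is enough to prove there are no conflicts or to find rows to eliminate.

```python
from typing import NamedTuple


class Range(NamedTuple):
    id: int
    start: int
    end: int  # inclusive; assumes start <= end


def find_overlaps(rows):
    """Return ids of rows that start at or before the max end of any earlier row."""
    flagged = []
    max_end = None  # running max of end over previously seen rows
    for r in sorted(rows, key=lambda r: (r.start, r.end, r.id)):
        if max_end is not None and r.start <= max_end:
            flagged.append(r.id)  # conflicts with at least one earlier range
        max_end = r.end if max_end is None else max(max_end, r.end)
    return flagged


rows = [Range(1, 1, 5), Range(2, 6, 10), Range(3, 8, 12), Range(4, 20, 25)]
print(find_overlaps(rows))  # prints: [3]
```

Because only the running maximum is compared, a range fully contained inside an earlier, longer range is still caught.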

Reverse Engineering The Stream Aggregate Algorithm

Itzik Ben-Gan has started a series of articles on optimizing queries which use grouping and aggregating with a reverse-engineering of the stream aggregate algorithm:

As you may already know, when SQL Server optimizes a query, it evaluates multiple candidate plans, and eventually picks the one with the lowest estimated cost. The estimated plan cost is the sum of all the operators’ estimated costs. In turn, each operator’s estimated cost is the sum of the estimated I/O cost and estimated CPU cost. The cost unit is meaningless in its own right. Its relevance is in the comparison that the optimizer makes between candidate plans. That is, the costing formulas were designed with the goal that, between candidate plans, the one with the lowest cost will (hopefully) represent the one that will finish more quickly. A terribly complex task to do accurately!
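The additive structure described here can be written down directly. This Python sketch uses invented, unitless numbers (not real SQL Server estimates) and hypothetical plan names. Its only point is the shape of the model: operator cost = I/O cost + CPU cost, plan cost = the sum over its operators, and the candidate with the lowest total wins.

```python
from dataclasses import dataclass


@dataclass
class Operator:
    name: str
    io_cost: float   # estimated I/O cost (unitless)
    cpu_cost: float  # estimated CPU cost (unitless)

    @property
    def cost(self):
        # An operator's estimated cost is its I/O cost plus its CPU cost
        return self.io_cost + self.cpu_cost


def plan_cost(operators):
    # A plan's estimated cost is the sum of its operators' estimated costs
    return sum(op.cost for op in operators)


def cheapest_plan(candidates):
    # The optimizer keeps the candidate plan with the lowest estimated cost
    return min(candidates, key=lambda name: plan_cost(candidates[name]))


# Invented numbers for illustration only; not real SQL Server estimates
candidates = {
    "ordered_scan_stream_agg": [Operator("Index Scan", 2.0, 1.1),
                                Operator("Stream Aggregate", 0.0, 0.6)],
    "scan_hash_agg": [Operator("Table Scan", 1.5, 1.1),
                      Operator("Hash Match (Aggregate)", 0.0, 2.4)],
}
print(cheapest_plan(candidates))  # prints: ordered_scan_stream_agg
```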

The more the costing formulas adequately take into account the factors that truly affect the algorithm’s performance and scaling, the more accurate they are, and the more likely that given accurate cardinality estimates, the optimizer will choose the optimal plan. At any rate, if you want to understand why the optimizer chooses one algorithm versus another you need to understand two main things: one is how the algorithms work and scale, and another is SQL Server’s costing model.
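For the "how the algorithms work" half, here is a simplified Python model of the Stream Aggregate idea. It consumes rows already ordered by the grouping key, keeps state for only the current group, and emits a result each time the key changes, so the work grows linearly with the input and memory does not grow with the number of groups. The ordering requirement is why a Stream Aggregate needs an ordered index scan or a Sort beneath it. The function name is hypothetical, it computes only COUNT and SUM, and it is a sketch of the general algorithm, not a reconstruction of Itzik's costing formulas.

```python
_NO_GROUP = object()  # sentinel: no group seen yet


def stream_aggregate(rows):
    """Yield (key, count, total) per group from (key, value) rows sorted by key."""
    current_key = _NO_GROUP
    count = total = 0
    for key, value in rows:
        if current_key is not _NO_GROUP and key != current_key:
            if key < current_key:
                raise ValueError("stream aggregate requires input sorted by the grouping key")
            yield current_key, count, total  # key changed: emit the finished group
            count = total = 0
        current_key = key
        count += 1
        total += value
    if current_key is not _NO_GROUP:
        yield current_key, count, total  # flush the last group


print(list(stream_aggregate([("a", 1), ("a", 2), ("b", 5)])))
# prints: [('a', 2, 3), ('b', 1, 5)]
```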

So back to the plan in Figure 1; let’s try and understand how the costs are computed. As a policy, Microsoft will not reveal the internal costing formulas that they use. When I was a kid I was fascinated with taking things apart. Watches, radios, cassette tapes (yes, I’m that old), you name it. I wanted to know how things were made. Similarly, I see value in reverse engineering the formulas since if I manage to predict the cost reasonably accurately, it probably means that I understand the algorithm well. During the process you get to learn a lot.

Our query ingests 1,000,000 rows. Even with this number of rows, the I/O cost seems to be negligible compared to the CPU cost, so it is probably safe to ignore it.

As for the CPU cost, you want to try and figure out which factors affect it and in what way.

I give this my highest recommendation.

Query Tuning With The APPLY Operator

Daniel Janik walks through using the APPLY operator to tune a couple of queries:

Recently we were doing a project that heavily focused on query tuning and many tables had various outer joins. My co-worker pointed out that many of these could be converted to an apply rather than a join.

Apply gives you both CROSS and OUTER. Think of CROSS APPLY like an INNER JOIN and OUTER APPLY like an OUTER JOIN.
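To make the CROSS/OUTER analogy concrete, this Python sketch models APPLY as "evaluate a correlated expression once per left row." The example is a top-N-per-group lookup, a common place where APPLY helps (in T-SQL, typically a CROSS APPLY over a SELECT TOP (n) ... ORDER BY subquery correlated to the outer row). The data and function names are hypothetical.

```python
def cross_apply(left_rows, fn):
    """Like CROSS APPLY: pair each left row with each row fn returns; drop it if none."""
    for left in left_rows:
        for right in fn(left):
            yield left, right


def outer_apply(left_rows, fn):
    """Like OUTER APPLY: keep every left row, pairing it with None when fn returns nothing."""
    for left in left_rows:
        matched = False
        for right in fn(left):
            matched = True
            yield left, right
        if not matched:
            yield left, None  # NULLs on the right side, as with an OUTER JOIN


# Hypothetical data: order amounts per customer
orders = {"alice": [30, 10, 20], "bob": [5]}
customers = ["alice", "bob", "carol"]


def top_two_orders(customer):
    # Correlated "subquery": the two largest orders for this customer
    return sorted(orders.get(customer, []), reverse=True)[:2]


print(list(cross_apply(customers, top_two_orders)))
print(list(outer_apply(customers, top_two_orders)))
```

Note that carol, who has no orders, disappears from the CROSS APPLY result but survives the OUTER APPLY result, mirroring INNER versus OUTER JOIN.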

Let’s compare some code to see how APPLY stacks up.

I like the APPLY operator so much that I created an entire presentation on it.  It’s not a cure-all by any means, but if you understand the intent, you can find places where it improves your code significantly.

Bucketing Tables By Size

Bill Fellows has an interesting approach to bucketing tables into groups of similar size:

You need to do something to all of the tables in SQL Server. That something can be anything: reindex/reorg, export the data, perform some other maintenance—it really doesn’t matter. What does matter is that you’d like to get it done sooner rather than later. If time is no consideration, then you’d likely just do one table at a time until you’ve done them all. Sometimes, a maximum degree of parallelization of one is less than ideal. You’re paying for more than one processor core, you might as well use it. The devil in splitting a workload out can be ensuring the tasks are well balanced. When I’m staging data in SSIS, I often use a row count as an approximation for a time cost. It’s not perfect – a million row table 430 columns wide might actually take longer than the 250 million row key-value table.

Click through for the script.  For an R version, a Stack Overflow post shows how to do it with cumulative sums and the cut function.
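Bill's script is the place to see his approach. As a different, swapped-in technique, here is a Python sketch of the classic greedy heuristic: take tables largest first and drop each into whichever bucket currently has the fewest total rows. It uses row count as the cost proxy, which, as the quote notes, is only an approximation of time. The function name, table names, and counts are made up.

```python
import heapq


def bucket_tables(row_counts, n_buckets):
    """Greedy balancing: largest table first, always into the least-loaded bucket."""
    heap = [(0, b) for b in range(n_buckets)]  # (total rows so far, bucket id)
    heapq.heapify(heap)
    buckets = {b: [] for b in range(n_buckets)}
    # Sort by descending row count; table name breaks ties deterministically
    for table, rows in sorted(row_counts.items(), key=lambda kv: (-kv[1], kv[0])):
        total, b = heapq.heappop(heap)  # bucket with the smallest running total
        buckets[b].append(table)
        heapq.heappush(heap, (total + rows, b))
    return buckets


# Hypothetical tables and row counts
counts = {"a": 100, "b": 90, "c": 50, "d": 40, "e": 10}
print(bucket_tables(counts, 2))  # prints: {0: ['a', 'd', 'e'], 1: ['b', 'c']}
```

The buckets above end up with 150 and 140 rows. Unlike cutting a cumulative sum into contiguous ranges, the greedy approach can place a small table next to a large one to even out the totals.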

Searching For Valid Characters For T-SQL Regular Identifiers

Solomon Rutzky is on a mission to find a national treasure:

Quite often these types of things are not that easy. Yes, it is very tempting to assume that the limited test is good enough and that we did indeed find the exact list of characters (plus we would need to add in the four extra characters: at sign (@), dollar sign ($), number sign (#), and underscore (_)). However, based on my experiences, it seems that more often than not, doing an exhaustive test results in a slightly different answer that invalidates the previous conclusion (which was based on the limited test). So, while it does take more time to do more extensive testing, it seems like we have little choice if we truly want to know how these things actually work.

What that means is, at the very least, we need to get the complete list of characters accepted by SQL Server for non-delimited identifiers to make sure that the totals match the number of code points returned by the searches done in Step 1.
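As a starting point for that kind of exhaustive comparison, here is a Python sketch of the documented rule for regular identifiers. The first character must be a Unicode 3.2 letter, _, @, or #. Subsequent characters may be letters, decimal numbers, @, $, #, or _. The sketch checks every Basic Multilingual Plane code point using the Unicode 3.2 database that CPython ships as unicodedata.ucd_3_2_0. Treating "decimal numbers" as general category Nd and limiting the search to the BMP are assumptions, and the helper names are hypothetical. Per the post's own warning, the resulting counts are a hypothesis to test against SQL Server, not the answer.

```python
import unicodedata

# CPython bundles the Unicode 3.2 database, the version the documentation cites
UCD_32 = unicodedata.ucd_3_2_0


def is_letter(ch):
    # Categories Lu, Ll, Lt, Lm, Lo are "letters" in Unicode terms
    return UCD_32.category(ch).startswith("L")


def can_start(ch):
    """Documented rule for the first character of a regular identifier."""
    return is_letter(ch) or ch in "_@#"


def can_continue(ch):
    """Documented rule for subsequent characters (Nd assumed to mean decimal numbers)."""
    return is_letter(ch) or UCD_32.category(ch) == "Nd" or ch in "_@#$"


def candidate_counts(max_code_point=0xFFFF):
    """Count BMP code points the documented rule would accept in each position."""
    start = cont = 0
    for cp in range(max_code_point + 1):
        ch = chr(cp)
        start += can_start(ch)
        cont += can_continue(ch)
    return start, cont


start_count, continue_count = candidate_counts()
print(f"first-character candidates: {start_count}, subsequent: {continue_count}")
```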

This post is an interesting dive into the oddities of Unicode, but leaves us on a cliffhanger.  Also, full-crazy Nicolas Cage beats mullet-wearing Tom Hanks any day.

Will It Bit?

Louis Davidson wants to see what he can cast to a bit type:

There are no other textual/alpha string values that will cast to a bit value, but the numeric values that will cast to a bit are voluminous (even some that are in string format). Consider the following eight statements:

SELECT CAST(100 AS bit);
SELECT CAST(-100 AS bit);
SELECT CAST(99999999999999999999999999999999999999 AS bit);
SELECT CAST(-99999999999999999999999999999999999999 AS bit);
SELECT CAST(88.999999 AS bit);
SELECT CAST('1' AS bit);
SELECT CAST('2' AS bit);
SELECT CAST('999999' AS bit);

Danged if they didn’t all work, and all return 1.

Check out what else Louis tries to cast to a bit type.
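Here is a rough Python model of the rule these results suggest. Any nonzero number becomes 1, zero becomes 0, NULL stays NULL, the strings 'TRUE' and 'FALSE' become 1 and 0, and integer-looking strings convert by value. It is a simplification built from the quoted statements and the documented TRUE/FALSE conversion. The case-insensitive, whitespace-trimmed string matching and the error on every other string are assumptions, not verified SQL Server behavior, and cast_to_bit is a hypothetical name.

```python
import re
from decimal import Decimal

_INTEGER_TEXT = re.compile(r"[+-]?[0-9]+")


def cast_to_bit(value):
    """Rough model of CAST(value AS bit); an approximation, not SQL Server's rules."""
    if value is None:
        return None  # CAST(NULL AS bit) is NULL
    if isinstance(value, str):
        text = value.strip()  # assumed: surrounding whitespace ignored
        if text.upper() == "TRUE":  # assumed: case-insensitive match
            return 1
        if text.upper() == "FALSE":
            return 0
        if _INTEGER_TEXT.fullmatch(text):
            return 0 if int(text) == 0 else 1
        # Assumed: any other string is a conversion error
        raise ValueError(f"Conversion failed for {value!r}")
    # Any nonzero number becomes 1; zero becomes 0
    return 0 if value == 0 else 1


# The eight statements from the quote, with the numeric literals as Python values
statements = [100, -100, 10**38 - 1, -(10**38 - 1), Decimal("88.999999"), "1", "2", "999999"]
print([cast_to_bit(v) for v in statements])  # prints: [1, 1, 1, 1, 1, 1, 1, 1]
```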

Using CONCAT_WS

Dave Mason points out another nice addition to the T-SQL toolbelt in SQL Server 2017:

In the last post, I looked at a new T-SQL function for SQL Server 2017. Let’s continue down that path and look at CONCAT_WS(), which is also new for SQL Server 2017. Here’s the definition of the function from Microsoft Docs:

“Concatenates a variable number of arguments with a delimiter specified in the 1st argument. (CONCAT_WS indicates concatenate with separator.)”

Read on for an example using CONCAT_WS.  It’s one of those functions that I haven’t quite committed to memory, but every time I get reminded of it, I remember that I really need to remember it.
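The behavior that makes CONCAT_WS worth remembering is its NULL handling. NULL arguments are skipped along with their separator, an all-NULL call yields an empty string, and empty strings are not skipped. This Python sketch models that documented behavior. The function name is hypothetical, the separator is assumed to be non-NULL, and Python's str() stands in for T-SQL's implicit conversion, whose formatting can differ.

```python
def concat_ws(separator, *args):
    """Model of CONCAT_WS: join non-NULL arguments with a separator."""
    if len(args) < 2:
        # T-SQL requires a separator plus at least two values
        raise TypeError("CONCAT_WS needs a separator and at least two values")
    # None (NULL) arguments are skipped entirely, so they add no separator;
    # empty strings are kept and therefore still produce adjacent separators
    return separator.join(str(a) for a in args if a is not None)


print(concat_ws(", ", "Seattle", None, "WA"))  # prints: Seattle, WA
```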
