Press "Enter" to skip to content

Category: T-SQL

SparkSQL CONCAT vs T-SQL CONCAT

Bill Fellows has a public service announcement:

The concat function is super handy in the database world but be aware that the SQL Server one is way better because it solves two problems. It combines everything into a string and it does not require NULL checking. In the before times, one had to down cast to a n/var/char type as well as check for NULL before appending strings via the plus sign.

The point of difference is so important that Bill busted out the marquee HTML tag. Which now leads me to wonder, was marquee or blink the bigger evil in the mid-to-late ’90s web?

Comments closed

Unit Testing Dynamic SQL

Jay Robinson lays out a pattern:

Dynamic SQL (aka Ad Hoc SQL) is SQL code that is generated at runtime. It’s quite common. Nearly every system I’ve supported in the past 30 years uses it to some degree, some more than others.

It can also be a particularly nasty pain point in a lot of systems. It can be a security vulnerability. It can be difficult to troubleshoot. It can be difficult to document. And it can produce some wickedly bad results.

Click through for Jay’s process as well as recommendations and an example. It’s certainly worth thinking about.

Comments closed

WITHIN GROUP in STRING_AGG()

Chad Callihan messes with groups:

When was the last time you wrote a SQL query and knew something was possible but just couldn’t remember how? I had one of those moments this week with STRING_AGG and ordering data, and although it was frustrating, I knew it would make a worthwhile blog post. Let’s look at some examples using STRING_AGG and WITHIN GROUP (aka the clause that slipped my mind).

There’s a perfectly good reason why WITHIN GROUP might slip your mind: STRING_AGG() is known as an ordered set function (versus a window function which uses an OVER() clause). It’s also the only ordered set function SQL Server supports, so you don’t get too many opportunities to use the key phrase.

Comments closed

Dueling Sequences for Positive and Negative Numbers

Jose Manuel Jurado Diaz hears banjo music:

SQL Server’s INT data type, by design, provides a range from -2,147,483,648 to 2,147,483,647. But often, developers only utilize the positive range for primary keys and other identifier fields, effectively wasting half of its potential. What if we could harness this full range to temporarily extend the capacity of an INT column? In this article, we explore this idea in-depth.

I’m of two minds here. On the one hand, surrogate keys don’t have a meaning (by definition!), so it doesn’t really matter if that number is positive or negative. Also, including negative numbers makes sense when you expect the steady state table size to be above 2 billion but below 4 billion rows (assuming that you’re using an INT datatype), or you’ve found out that the steady state size is that big after the fact.

On the other hand, I don’t like having a caller define whether they want positive or negative values, as that now imbues meaning to the surrogate key, where positive keys mean one thing and negative keys mean another.

Comments closed

Alternatives to GREATEST and LEAST in SQL Server 2022

Drupal Grupal gives us alternatives:

If you haven’t already heard, SQL 2022 has introduced a new built-in system function called GREATEST. Simply put, it is to a set of columns, variables, expressions etc. what the MAX function is to a set of values (i.e., rows) of a single column or expression. The opposite of GREATEST function is LEAST function which returns the smallest value from the supplied list of columns, variables, constants etc. GREATEST & LEAST can be considered a pair, just as MIN/MAX functions are.

In other situations, GREATEST() and LEAST() are known as ARGMAX() and ARGMIN(), respectively.

Drupal shows us two alternatives to the built-in function and includes a performance comparison.

Comments closed

TINYINT Casts in Spark SQL vs T-SQL

Bill Fellows runs into an interesting oddity:

Yet another thing that has bitten me working in SparkSQL in Databricks—this time it’s data types.

In SQL Server, a tinyint ranges from 0 to 255 but both of them allow for 256 total values. If you attempt to cast a value that doesn’t fit in that range, you’re going to raise an error.

SQL Server’s TINYINT data type is an unsigned one-byte number, whereas TINYINT in Spark SQL is a signed one-byte number. But that’s not the biggest difference Bill finds, so check out the post to learn more.

Comments closed

String Aggregation with STRING_AGG

Jeff Mlakar uses the STRING_AGG() function:

Often when generating dynamic sql we must add some separator values in the string(s) we build. A function delivered way back in 2017 called STRING_AGG is a simple way to do this. Let’s look at some examples.

There are a lot of people who have never heard of the function and don’t know that there’s a better alternative to the combination of STUFF() and FOR XML PATH to concatenate together strings.

Comments closed

Executing Transactions in PostgreSQL

Salman Ahmed rolls it back:

Transactions, like any other database, are a key component of PostgreSQL. A transaction is a sequence of one or more database operations that are executed as a single unit of work. These operations can be queries (e.g. SELECT, INSERT, UPDATE and DELETE) that modify the database’s state.

A transaction’s main purpose is to combine multiple statements into an atomic, all-or-nothing process. It ensures that either all operations within a transaction are fully completed, or none of them are executed at all. Concurrent transactions cannot see each other’s unfinished changes. Updates from ongoing transactions remain hidden until completion, at which point all changes become visible simultaneously.

This is very similar to SQL Server, except their savepoints actually work they way they’re supposed to.

Comments closed

Using IS DISTINCT FROM in SQL Server 2022

Chad Callihan is distinguished:

One feature introduced with SQL Server 2022 that I’ve recently been playing around with is IS [NOT] DISTINCT FROM. This new feature can help when it comes to dealing with NULL value comparisons.

Read on for examples. Do note that x IS NOT DISTINCT FROM y does not provide a performance benefit over its equivalent of x=y OR (x IS NULL AND y IS NULL).

Comments closed

DATEDIFF() and Month Boundaries

Deb Melkin fed the mogwai after midnight:

I was working on a query this week that reminded me of a fun quirk when working with dates and the DATEDIFF function in particular.

I have a process that takes a while to run. Because of all of the moving parts to keep track of, I have an audit table to track what I’m doing to collect basic info like when did it start, when did it end, etc. I created a simple report for myself to break things down so I can report back to the team. I threw together a simple SQL statement, using DATEDIFF to figure out the how long things took. Looking at the results, I saw some interesting results.

Read on for two queries, one which has a bit of a problem and one which strives to correct that problem.

Comments closed