Category: Syntax

A Thought on Query Granularity

Chris Johnson shares some thoughts:

This query is pretty simple: we want to return a set of data about the orders placed, including the total cost of each order. However, the orders in the database are split into an Orders table that holds the order-level information and an OrderLines table that holds information on each product ordered in each order, and we need that OrderLines table to get the cost.

So we write our query, and we join to the OrderLines table, and we sum the cost. But then, because we have an aggregate, we need to group by every other column we’re returning. And to me that just doesn’t look right, and it doesn’t convey the intention of the query properly.

In Chris’s simple example, I’m not sure I’d push it very much, but Chris does have a good point in terms of explaining query intent. Also, depending on how many order lines there are relative to orders (the next step in the chain for that query), aggregation in a common table expression could be faster than waiting until after the join to aggregate on all of the columns. In reality, that’s the most likely reason I’d make this change, assuming that it made a big enough performance difference. But if you take a much more complicated query of this sort, then I’d be more amenable to the argument.
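To make the two shapes concrete, here is a minimal sketch using Python's built-in sqlite3 module rather than SQL Server. The table layout, column names, and sample rows are assumptions for illustration and are not taken from Chris's post; the point is only that both queries return the same rows while the second states the aggregation granularity explicitly.

```python
import sqlite3

# Hypothetical schema and data, loosely following the Orders / OrderLines split.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, CustomerName TEXT, OrderDate TEXT);
    CREATE TABLE OrderLines (OrderID INTEGER, Product TEXT, Cost REAL);
    INSERT INTO Orders VALUES (1, 'Ann', '2023-01-05'), (2, 'Bob', '2023-01-06');
    INSERT INTO OrderLines VALUES (1, 'Widget', 10.0), (1, 'Gadget', 5.5), (2, 'Widget', 10.0);
""")

# Version 1: join first, then group by every non-aggregated column.
join_then_group = conn.execute("""
    SELECT o.OrderID, o.CustomerName, o.OrderDate, SUM(ol.Cost) AS TotalCost
    FROM Orders o
    JOIN OrderLines ol ON ol.OrderID = o.OrderID
    GROUP BY o.OrderID, o.CustomerName, o.OrderDate
    ORDER BY o.OrderID
""").fetchall()

# Version 2: aggregate the order lines per order in a CTE, then join the totals.
aggregate_then_join = conn.execute("""
    WITH OrderTotals AS (
        SELECT OrderID, SUM(Cost) AS TotalCost
        FROM OrderLines
        GROUP BY OrderID
    )
    SELECT o.OrderID, o.CustomerName, o.OrderDate, t.TotalCost
    FROM Orders o
    JOIN OrderTotals t ON t.OrderID = o.OrderID
    ORDER BY o.OrderID
""").fetchall()

print(join_then_group)
print(aggregate_then_join)
```

Whether the second form is actually faster depends on the engine, the indexes, and the ratio of order lines to orders, so it is worth measuring rather than assuming.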

Comments closed

Finding the Max (or Min) Value across Columns

Greg Dodd tries out some new syntax:

Have you ever had 2 columns in a table that you need to select the maximum value from? Not the maximum value from each column, but the biggest value after comparing the columns?

Usually where I have this problem, I’d use a CASE statement, but there’s another way. 

I do like GREATEST() and LEAST() (a row-wise max and min, if you prefer those terms), though Greg also shows how to get the same functionality in versions prior to SQL Server 2022.
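As a rough illustration of the two approaches, the sketch below uses Python's sqlite3 module, where the multi-argument scalar max() and min() functions stand in for GREATEST() and LEAST(). The Readings table and its columns are hypothetical, and this is not Greg's code. The sample data deliberately avoids NULLs because NULL handling in these functions differs between engines.

```python
import sqlite3

# Hypothetical table with two columns to compare row by row.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Readings (ID INTEGER PRIMARY KEY, A INTEGER, B INTEGER);
    INSERT INTO Readings VALUES (1, 3, 7), (2, 9, 4), (3, 5, 5);
""")

rows = conn.execute("""
    SELECT ID,
           CASE WHEN A >= B THEN A ELSE B END AS BiggestViaCase,  -- the CASE approach
           max(A, B) AS BiggestViaFunction,   -- SQLite's scalar max, standing in for GREATEST
           min(A, B) AS SmallestViaFunction   -- SQLite's scalar min, standing in for LEAST
    FROM Readings
    ORDER BY ID
""").fetchall()

for row in rows:
    print(row)
```

The CASE form gets unwieldy quickly as the number of columns grows, which is the main appeal of a dedicated function.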

Stuffing Characters with STUFF

John McCormack tries out some stuff:

I don’t often go down code-based rabbit holes; it’s just not what I do day to day. But a while back, someone asked on Twitter’s #sqlhelp hashtag whether there was a character length limit to the STUFF function. The documentation didn’t have an answer. For that reason only, I decided to take a look.

Click through to see what John learned along the way.

ANSI SQL and Trailing Spaces

Chris Johnson finds a language quirk:

Recently I found a quirk of T-SQL, where a group by statement was treating strings as the same if the only difference was one or more trailing spaces. So, ‘aa’ would be grouped with ‘aa ‘. I did some digging, and this is what I found.

Yeah, this isn’t just Microsoft’s T-SQL variant—it’s a standard part of SQL, as Chris notes later in the post.

My “just-so” story is that this might have been implemented to deal with CHAR(x) comparisons, such as CHAR(2) to CHAR(3). There’s no way to make that comparison unless you treat trailing spaces as irrelevant. Because we almost always use VARCHAR(x) or NVARCHAR(x), it isn’t something top of mind to most database practitioners, but there is a method to the madness.
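The difference between padded and exact comparison can be sketched with Python's sqlite3 module. SQLite is a stand-in here and behaves differently from SQL Server by default: it compares strings exactly, and its built-in RTRIM collation ignores trailing spaces, which approximates the padded behavior described above. The Samples table is hypothetical.

```python
import sqlite3

# Three values that differ only in trailing spaces.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Samples (Val TEXT);
    INSERT INTO Samples VALUES ('aa'), ('aa '), ('aa  ');
""")

# SQLite's default (binary) comparison treats each value as distinct.
exact_groups = conn.execute(
    "SELECT COUNT(*) FROM Samples GROUP BY Val"
).fetchall()

# The RTRIM collation ignores trailing spaces, so all three rows share a group.
padded_groups = conn.execute(
    "SELECT COUNT(*) FROM Samples GROUP BY Val COLLATE RTRIM"
).fetchall()

print(len(exact_groups), "groups with exact comparison")
print(len(padded_groups), "group(s) when trailing spaces are ignored")
```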

Join Operations in BigQuery

Rathish Kumar joins a few tables together:

SQL joins are used to combine columns from multiple tables to get the desired result set. In a typical relational model we use normalized tables, where each table represents an entity (for example, employee or department) and its relationships. When we need data from more than one table, for example employee name and employee department, we use joins to combine the employee name column from the employee table with the department name column from the department table, based on the employee number key column, which is available in both tables.

Similarly, a typical data warehouse setup follows a Star or Snowflake schema consisting of a primary fact table and satellite dimension tables. Fact tables represent events (example: the orders table in an ecommerce business) and dimension tables represent attributes and slowly changing information (example: customer and product tables).

The syntax is rather similar to most database engines, though there are a few physical join operators which differ from typical relational database management systems. Also, I’ll take this moment to say thank you to Rathish for not using Venn diagrams to show joins and instead using a proper technique.
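For readers who want to see the employee and department example as a query, here is a small sketch run through Python's sqlite3 module instead of BigQuery. The table names, the DepartmentID join key, and the sample rows are all assumptions for illustration; the logical INNER JOIN and LEFT JOIN syntax shown is the part that carries over between engines.

```python
import sqlite3

# Hypothetical normalized tables: one row per employee, one row per department.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Department (DepartmentID INTEGER PRIMARY KEY, DepartmentName TEXT);
    CREATE TABLE Employee (EmployeeID INTEGER PRIMARY KEY, EmployeeName TEXT, DepartmentID INTEGER);
    INSERT INTO Department VALUES (10, 'Finance'), (20, 'Engineering');
    INSERT INTO Employee VALUES (1, 'Asha', 20), (2, 'Ben', 10), (3, 'Chen', NULL);
""")

# INNER JOIN keeps only employees with a matching department row.
inner_rows = conn.execute("""
    SELECT e.EmployeeName, d.DepartmentName
    FROM Employee e
    INNER JOIN Department d ON d.DepartmentID = e.DepartmentID
    ORDER BY e.EmployeeID
""").fetchall()

# LEFT JOIN keeps every employee and fills in NULL where no department matches.
left_rows = conn.execute("""
    SELECT e.EmployeeName, d.DepartmentName
    FROM Employee e
    LEFT JOIN Department d ON d.DepartmentID = e.DepartmentID
    ORDER BY e.EmployeeID
""").fetchall()

print(inner_rows)
print(left_rows)
```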

Rewriting Scalar UDFs and NULL Results

Erik Darling jumps out of the time machine to warn us, but we have no idea what he’s talking about so we ignore the warning and end up doing the thing he warned us not to do, causing us to need to send him into a time machine to warn us not to do it:

I think I have probably spent 500 hours of my life rewriting T-SQL Scalar UDFs to avoid all the performance problems associated with them.

The obvious choice is the Inline Table Valued Function, which has fewer performance issues baked in. For the kids out there: they don’t spill trauma.

But getting the rewrite right can be tricky, especially when it’s possible for the function to return NULL values.

Click through for the example.

Concatenation per Group in MySQL

Rahul Mehta asks the big questions:

In this article, we are going to discuss how to aggregate the article at a row level. MySQL provides a function called “group_concat” to perform row-level concatenation. Before we go ahead and learn how to do so, let us first understand:

  • Why do we need it?
  • Where do we need it?
  • When to use it?

Fortunately, Rahul then answers these questions; otherwise, it’d be a pretty short article. The T-SQL analog to this is STRING_AGG(), though the syntax and behavior are not precisely the same.
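As a quick sketch of per-group concatenation, the example below uses SQLite's group_concat() through Python's sqlite3 module. This is a stand-in for MySQL, not Rahul's code: SQLite takes the separator as a second argument, whereas MySQL uses a SEPARATOR keyword inside the function call. The ArticleTags table and its rows are hypothetical.

```python
import sqlite3

# Hypothetical data: several tag rows per article.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ArticleTags (Article TEXT, Tag TEXT);
    INSERT INTO ArticleTags VALUES ('Joins', 'sql'), ('Joins', 'bigquery'), ('XML', 'sql');
""")

# Collapse the tag rows into one delimited string per article.
rows = conn.execute("""
    SELECT Article, group_concat(Tag, ', ') AS Tags
    FROM ArticleTags
    GROUP BY Article
    ORDER BY Article
""").fetchall()

# The order of values inside each concatenated string is not guaranteed here,
# so sort the pieces before relying on them.
tags_by_article = {article: sorted(tags.split(", ")) for article, tags in rows}
print(tags_by_article)
```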

Checking XML Validity

Kevin Wilkie doesn’t like misshapen XML data:

Sometimes you’ll find that you will have XML in your database. This could be for various reasons – from storing the XML after receiving an API response to keeping it in a table because a web developer couldn’t figure out another way to store their data. Sometimes – no matter how much you trust your source – you should question if the XML is well-formed. Let’s work out a few ways you can do that in a database.

Read on for a few tests. The more concerned you are about XML data quality, the more you’d want to push in the direction of having an XSD (XML Schema Definition) defined as well.
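Outside the database, a well-formedness check can be sketched in a few lines of Python with the standard library's ElementTree parser. This is not Kevin's approach, and the helper name and sample documents are assumptions for illustration. Note that it only checks that the XML parses; it does not validate the document against a schema.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if the text parses as XML, False if the parser rejects it."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# Hypothetical sample documents.
good = '<order id="1"><line sku="A1"/></order>'
bad = "<order><line></order>"  # <line> is never closed

print(is_well_formed(good))
print(is_well_formed(bad))
```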
