Press "Enter" to skip to content

Category: Syntax

GROUP BY Clause Order and Performance in PostgreSQL

Andrei Lepikhov runs some tests:

PostgreSQL users often employ analytical queries that sort and group data by different rules. Optimising these operators can significantly reduce the time and cost of query execution. In this post, I will discuss one such optimisation: choosing the order of columns in the GROUP BY expression.

Postgres can already reshuffle the list of grouped expressions according to the ORDER BY condition to eliminate additional sorting and save computing resources. We went further and implemented an additional strategy of group-by-clause list permutation in a series of patches (the first attempt and the second one) for discussion with the Postgres community, expecting it to be included in the next version of PostgreSQL core. You can also try it in action in the commercial Postgres Pro Enterprise fork.

From a “this is a 4th generation language” perspective, there should never be a performance difference in the ordering of grouped columns in a GROUP BY clause. The database optimizer should always determine the optimal ordering. Read Andrei’s article to see how much of a challenge that can be in practice.

Comments closed

New Syntax in SQL Server 2022

Steve Jones talks syntax:

At the recent 2024 PASS Data Community Summit, I saw a lightning talk from Mikey Bronowski on the New T-SQL Functions in SQL Server 2022.  Before the talk, I made a joke with him that none of these were new because it’s 2024. They’d been out for 2 years.

Mikey did a nice job, given that he only had 10 minutes, but it was enough to give attendees an idea of some new things they might investigate to use in their own code. If you missed the talk, or you don’t have access to the recordings, we have a series on SQL Server Central that covers these (Part 1Part 2Part 3) and helps you understand the new options. The list of language changes is also in the MS Docs under What’s New in the Language.

Steve asks about how much utilization there is of these. I think the problem is that so few organizations have adopted SQL Server 2022 that knowledge that these things even exist is quite low.

By the way, I do have a script covering many of these new bits of syntax as part of a talk on getting up to speed with core T-SQL changes since SQL Server 2012.

Comments closed

Creating Totals and Subtotals in Postgres with ROLLUP and CUBE

Elizabeth Christensen uses a pair of analytical operators:

Postgres is being used more and more for analytical workloads. There’s a few hidden gems I recently ran across that are really handy for doing SQL for data analysis, ROLLUP and CUBE. Rollup and cube don’t get a lot of attention, but follow along with me in this post to see how they can save you a few steps and enhance your date binning and summary reporting.

I’ve used ROLLUP on occasion, but never found a great case when CUBE made sense in a report. I am, however, quite partial to GROUPING SETS, the third of these analytical operators and the one that gives you the most control.

Comments closed

The IS Operator in T-SQL

Eric Blinn wants to know of those NULLs:

Many data professionals search for help writing T-SQL queries containing columns with NULL values in a Microsoft SQL Server table. Some of the most common NULL functions include IS NULL, IS NOT NULL, NOT NULL, NULL, what is NULL, NULL vs NOT NULL, etc. In this SQL tutorial, the goal is to help you better understand how to work with NULL values in a SQL database.

Even the best of us have accidentally used = NULL in a script when we meant IS NULL. Which is totally different from ISNULL(), of course. And NULLIF(), naturally.

Comments closed

Materialized Views in PostgreSQL

Brent Ozar builds a view but a special one:

That query gets the top 100 users who have the most accepted answers. On my server, that takes about a minute to run – unacceptably long for a public-facing web site, for example.

Does that data need to be up to the second? Of course not. The leaderboard of the top 100 answerers isn’t going to change all that quickly. People who have successfully answered hundreds of questions aren’t going to suddenly disappear, nor is someone else suddenly going to rocket to the top.

For report queries like this, Postgres offers materialized views: a view that’s written to disk, and then updated only when you want to update it. 

Read on to see how you can create one in PostgreSQL. Brent does touch on one of the differences between indexed views in SQL Server and materialized views in PostgreSQL while covering the process of creating, querying, and updating materialized views. In discussing how to update them, Brent covers en passant a second difference between indexed views in SQL Server and materialized views in PostgreSQL. Whether the “keep it up to date at all times” approach beats the “update it when you want but let data go stale in the meantime” approach is better, that’s something worth debating.

Comments closed

Moving Averages in T-SQL

Jared Westover does the math:

Even though I enjoy using SQL Server, there are some things other tools do better. For example, calculating moving averages or rolling totals is often simpler in tools like Power BI or Excel. That’s because Microsoft built those programs with that functionality in mind. Recently, we had to optimize a complex moving average query written for SQL Server 2008R2. Surprise! There’s no built-in function for moving averages in SQL Server. But don’t worry; I’ll show you how to make it work.

Read on for the solution, as well as the mess we had to work with prior to SQL Server 2012.

Comments closed

The Concepts of Data Control Language in SQL

Joe Celko talks about the lesser-known language in SQL (compared to DML and DDL):

But the truth is that the most important sub-language is the one that needs fixing. You wonder why a three-legged stool works? All three legs have to be coordinated together; the same principle holds an SQL schema.

The third sub language in SQL is the DCL (data control language). This is where you get those database privileges I just mentioned. SQL classes don’t spend a lot of time on DCL for several reasons. The first of all is that you’re a mere USER and you’re probably not allowed to pass out privileges. The original ANSI/ISO security model was pretty simple. The universe was divided into USER and USER. An important concept in that you do not create a privilege, but the ADMIN grants it to a user and it is separate from the DDL.

Click through to learn more.

Comments closed

Function Volatility in PL/pgSQL

Deepak Mahto shares some advice:

The PL/pgSQL language, available as a default extension in PostgreSQL, provides powerful tools and flexibility for application developers to build complex, enterprise-scale functionality within the database. Through PL/pgSQL’s functions and procedures, developers can choose different volatility categories—IMMUTABLE, STABLE, or VOLATILE—that offer varying performance benefits, especially in terms of result caching and data state awareness. For a deeper dive, refer to the official documentation. Choosing the appropriate volatility for a function is crucial; as they say, “With great power comes great responsibility.

Read on to learn more. It seems like it would be quite easy to mess this up.

Comments closed

The NOT IN Operator in R

Steven Sanderson does not want these things:

In R programming, data filtering and manipulation are needed skills for any developer. One of the most useful operations you’ll frequently encounter is checking whether elements are NOT present in a given set. While R doesn’t have a built-in “NOT IN” operator like SQL, we can easily create and use this functionality. This comprehensive guide will show you how to implement and use the “NOT IN” operator effectively in R.

Read on for examples of how to use %in% and its corollary ! (...) %in%.

Comments closed

How Postgres Parses Conditional Expressions in PL/pgSQL

Deepak Mahto parses a command:

At first glance, this code block seems incomplete. Notice the IF condition: it appears to be missing an additional condition after the AND operator. Logically, this should cause an exception due to the incomplete condition following AND.

  IF i = 0 AND THEN 

However, during PL/pgSQL execution, the condition is evaluated without any syntax errors or warnings. This raises a critical question:

How does PostgreSQL internally process this condition?
What allows this seemingly incomplete expression to work?

Read on for those answers.

Comments closed