
Category: Syntax

Concatenating in SQL Server

Lee Markup takes us through a pair of very useful functions in SQL Server:

SQL Server concatenation methods have been enhanced in modern versions of SQL Server. SQL Server 2012 introduced the CONCAT() function. In SQL Server 2017 we get CONCAT_WS().

A common usage of concatenation, or joining column values together in a string, is combining a FirstName and LastName column into a FullName column. Another common usage might be for creating an address column that pulls together building number, street, city and zip code.

Read on to learn more. CONCAT() and CONCAT_WS() are also extremely helpful for change detection in ETL processes. For example, you might have a queue table to process and only want to update records in which relevant source fields changed, ignoring the ones which don’t exist in your destination. A combination of HASHBYTES() and CONCAT_WS() will do the trick quite nicely.
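As a rough sketch of both ideas (the table and column names here are made up for illustration):

-- CONCAT() treats NULL inputs as empty strings; CONCAT_WS() takes a separator
-- as its first argument and skips NULL values entirely.
SELECT
    CONCAT(FirstName, ' ', LastName) AS FullName,
    CONCAT_WS(', ', BuildingNumber, Street, City, ZipCode) AS FullAddress
FROM dbo.Customers;

-- Change detection: hash the relevant columns on both sides and compare.
SELECT src.CustomerID
FROM dbo.CustomerStaging AS src
    INNER JOIN dbo.Customers AS dest
        ON dest.CustomerID = src.CustomerID
WHERE HASHBYTES('SHA2_256', CONCAT_WS('|', src.FirstName, src.LastName, src.City))
   <> HASHBYTES('SHA2_256', CONCAT_WS('|', dest.FirstName, dest.LastName, dest.City));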


Order, Sort, Cluster, and Distribute in Hive

The Hadoop in Real World team gives us three methods (and one synonym) to organize results in Hive:

Hive provides 3 options to order or sort the result of records – order by, sort by, cluster by and distribute by. Which option you choose has performance implications. So it is important to understand the difference between the options and choose the right one for the use case at hand.

Click through for a high-level overview of the techniques.
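For reference, a minimal sketch of each clause against a hypothetical sales table:

-- ORDER BY: total ordering across the full result set; forces a single reducer.
SELECT product, amount FROM sales ORDER BY amount DESC;

-- SORT BY: sorts within each reducer, so the overall output is only partially ordered.
SELECT product, amount FROM sales SORT BY amount DESC;

-- DISTRIBUTE BY: sends rows with the same key to the same reducer; no sorting implied.
SELECT product, amount FROM sales DISTRIBUTE BY product;

-- CLUSTER BY: shorthand for DISTRIBUTE BY plus SORT BY on the same column(s).
SELECT product, amount FROM sales CLUSTER BY product;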


Avoiding WHILE 1=1 Loops

Aaron Bertrand does not believe in the power of the infinite loop:

A short time ago a colleague had an issue with a Microsoft SQL Server stored procedure. They were using our recommended approach for batching updates, but there was a small problem with their code that led to the procedure “running forever.” I think we’ve all made a mistake like this at one point or another; here’s how I try to avoid the situation altogether.

The argument isn’t “don’t use WHILE loops” or “don’t use batching logic”; it’s to ensure that you have a break condition somewhere. It’s reasonable to ask for an end state before you begin processing something, after all.
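A stripped-down version of the batching pattern with an explicit exit looks something like this (not Aaron’s code; the table and batch size are hypothetical):

DECLARE @BatchSize    int = 5000,
        @RowsAffected int = 1;

WHILE @RowsAffected > 0   -- break condition: stop once a batch touches no rows
BEGIN
    UPDATE TOP (@BatchSize) dbo.Orders
    SET IsProcessed = 1
    WHERE IsProcessed = 0;

    SET @RowsAffected = @@ROWCOUNT;
END;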


LATERAL VIEW in Hive

The Hadoop in Real World team provides a quick example of a powerful feature in Apache Hive:

Lateral view is used in conjunction with user-defined table generating functions such as explode(). A UDTF generates zero or more output rows for each input row. 

Click here if you’d like to know the difference between UDF, UDAF, and UDTF.

A lateral view first applies the UDTF to each row of the base table and then joins resulting output rows to the input rows to form a virtual table having the supplied table alias.

In other words, LATERAL joins are the SQL standard’s equivalent of Microsoft’s CROSS APPLY operator. I normally dislike having different names for the same thing due to the risk of confusion, but in fairness to Microsoft on this one, my recollection is that the standard’s name came after SQL Server 2005, which already had CROSS APPLY and OUTER APPLY.
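A minimal HiveQL sketch of the LATERAL VIEW side, assuming a hypothetical orders table with an ARRAY<STRING> column of items:

SELECT o.order_id, item
FROM orders o
LATERAL VIEW explode(o.items) items_exploded AS item;

Each array element becomes its own row, joined back to the order it came from, which is conceptually what CROSS APPLY does against a table-valued function in T-SQL.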


Getting Distinct Values before STRING_AGG

Greg Dodd shows how to remove duplicate values from a list before passing them to the STRING_AGG() function:

SQL Server introduced the new STRING_AGG feature in SQL Server 2017, and it works just like it suggests it would: it’s an aggregate function that takes all of the string values and joins them together with a separator. To see how it works, I’m using the StackOverflow Users table; let’s say we want to create a list of Display Names, grouped by Location:

Click through for two methods, one of which is considerably better than the other.
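One approach that tends to come up for this problem (not necessarily either of Greg’s) is to de-duplicate in a derived table before aggregating:

SELECT u.Location,
       STRING_AGG(u.DisplayName, ', ') AS DisplayNames
FROM
(
    -- remove duplicate names per location before the aggregate sees them
    SELECT DISTINCT Location, DisplayName
    FROM dbo.Users
) AS u
GROUP BY u.Location;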


Ways to Filter Data in PostgreSQL

Gauri Mahajan shows off several techniques for filtering data in PostgreSQL:

Data is hosted in a variety of data repositories, one of which is relational databases. Out of tens of commercial and open-source relational databases, one of the most popular open-source relational databases is PostgreSQL. This database is offered on the Azure cloud platform through a service named Azure Database for PostgreSQL.

One of the most fundamental operations performed on the database is reading and writing data to consume and host data. It goes without saying that when the data is consumed, it must be scoped based on the requirements or criteria specified by the consumer. This translates to filtering the data while querying it.

Like every other relational database, Postgres offers different operators and options to filter data while querying. Let’s go ahead and learn some of the most fundamental ways to filter data hosted in PostgreSQL.

Most of them are the same as what you have in T-SQL, but not all of them.
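As a quick taste, using a hypothetical customers table (ILIKE being one of the bits T-SQL doesn’t have):

SELECT *
FROM customers
WHERE country = 'US'                          -- equality
  AND signup_date BETWEEN '2021-01-01'
                      AND '2021-12-31'        -- range
  AND city IN ('Boston', 'Denver')            -- membership
  AND email ILIKE '%@example.com'             -- case-insensitive pattern match
  AND phone IS NOT NULL                       -- NULL handling
LIMIT 50;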


Joining to STRING_SPLIT

Kevin Wilkie explains that the STRING_SPLIT() function isn’t something one simply joins to:

My friends! Last time together, we discussed using the STRING_SPLIT function and how it’s used in combination with CROSS APPLY.

First off, most of us are used to working with an INNER JOIN instead of CROSS APPLY. Well, you’re not going to be able to use an INNER JOIN when you’re using the STRING_SPLIT function.

Read on for a demonstration.
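The shape of the working version looks something like this (hypothetical table and column names):

-- APPLY evaluates STRING_SPLIT once per outer row, which is exactly what a
-- per-row argument like o.TagList requires.
SELECT o.OrderID, LTRIM(s.value) AS Tag
FROM dbo.Orders AS o
    CROSS APPLY STRING_SPLIT(o.TagList, ',') AS s;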


Row Pattern Recognition in Snowflake

Koen Verbeeck knows how to make my blood boil:

I’m doing a little series on some of the nice features/capabilities in Snowflake (the cloud data warehouse). In each part, I’ll highlight something that I think is interesting enough to share. It might be some SQL function that I’d really like to see in SQL Server, it might be something else.

In the book T-SQL Window Functions – For data analysis and beyond, Itzik Ben-Gan explains the concept of row-pattern recognition (RPR) in a dedicated chapter (you can find a full book review here). It’s a concept that doesn’t exist in T-SQL, but is described in the SQL standard and is available in some other database systems. Snowflake has recently introduced support for RPR. 

Jokes about being angry aside, I’d really like to see row pattern recognition in SQL Server. It’s definitely not trivial to learn, but once you do, there’s a lot of power available to you. Koen also links to the Feedback item about this, so vote on that as well.
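To give a flavor of the syntax, here is a rough sketch of Snowflake’s MATCH_RECOGNIZE against a made-up stock_prices table, finding runs of consecutive price increases:

SELECT *
FROM stock_prices
    MATCH_RECOGNIZE (
        PARTITION BY symbol
        ORDER BY trade_date
        MEASURES
            FIRST(trade_date) AS run_start,
            LAST(trade_date)  AS run_end,
            COUNT(*)          AS days_rising
        ONE ROW PER MATCH
        PATTERN (rise+)
        DEFINE
            rise AS price > LAG(price)   -- a row qualifies when its price beats the prior row's
    );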


DATE_BUCKET() and an Alternative

Itzik Ben-Gan takes us through the DATE_BUCKET() function:

Bucketizing date and time data involves organizing data in groups representing fixed intervals of time for analytical purposes. Often the input is time series data stored in a table where the rows represent measurements taken at regular time intervals. For example, the measurements could be temperature and humidity readings taken every 5 minutes, and you want to group the data using hourly buckets and compute aggregates like average per hour.

Even though time series data is a common source for bucket-based analysis, the concept is just as relevant to any data that involves date and time attributes and associated measures. For example, you might want to organize sales data in fiscal year buckets and compute aggregates like the total sales value per fiscal year.

In this article, I cover two methods for bucketizing date and time data. One is using a function called DATE_BUCKET, which at the time of writing is only available in Azure SQL Edge. Another is using a custom calculation that emulates the DATE_BUCKET function, which you can use in any version, edition, and flavor of SQL Server and Azure SQL Database.

DATE_BUCKET() is something I’d like to see in the next version of SQL Server on-premises. There are some peculiarities to how it works and behavior isn’t always exactly what you’d expect, but it does accomplish what it sets out to do.
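A short sketch of both flavors, using made-up readings data and 3-hour buckets (Itzik’s article covers the edge cases, such as dates earlier than the origin, that this glosses over):

DECLARE @origin datetime2 = '2021-01-01';

-- DATE_BUCKET() version (Azure SQL Edge at the time the article was written):
SELECT DATE_BUCKET(hour, 3, event_time, @origin) AS bucket_start,
       AVG(temperature) AS avg_temperature
FROM dbo.Readings
GROUP BY DATE_BUCKET(hour, 3, event_time, @origin);

-- Portable emulation: count whole hours from the origin, truncate to a multiple
-- of the bucket width with integer division, then add that offset back.
SELECT DATEADD(hour, DATEDIFF(hour, @origin, event_time) / 3 * 3, @origin) AS bucket_start,
       AVG(temperature) AS avg_temperature
FROM dbo.Readings
GROUP BY DATEADD(hour, DATEDIFF(hour, @origin, event_time) / 3 * 3, @origin);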
