Press "Enter" to skip to content

Category: Syntax

External Tables and the Serverless SQL Pool

Ryan Adams continues a series on querying the serverless SQL pool in Azure Synapse Analytics:

There are two ways to read data inside Data Lake using the Synapse Serverless engine.  In this article, we’ll look at the second method which uses an external table to query a path within the lake.

Synapse is a collection of tools with four different analytical engines (Dedicated PoolSpark PoolServerless PoolData Explorer Pool).  This gives you a lot of options for ingesting, transforming, storing, and querying your data.  Here you will use the Synapse Serverless Pool to query the data in your ADLS account.   

Read on for a demonstration.

Comments closed

Emulating Window Functions in MySQL 5.7

Lukas Eder says, we have window functions at home:

One of MySQL 8’s biggest improvements is the support of window functions. As I always said in conferences, there’s SQL before window functions and SQL after window functions. Once you start using them, you’ll use them everywhere.

Some of you poor souls are unfortunate enough to be stuck on MySQL 5.7, either of your own choosing, or because you’re using a clone / fork that is still 5.7 compatible. While for most people, this blog post is just for your amusement, or nostalgia, for some of you this post will be quite useful.

If you are in a windowless world, read on to see how you can make life a little more manageable.

Comments closed

Performance Comparing DISTINCT to GROUP BY

Reitse Eskens does a performance comparison:

A few days ago, I heard someone stating that Group By was much quicker than Distinct. Less disk impact, less memory etc.
So, I thought I’d find out if it’s true or not because I found it interesting. I always thought there was no difference. I tested a single small table and found no difference in speed, reads or execution plan. But that’s no real world example. Usually the tables contain a lot of data and are joined to other tables.

Click through for the results of Reitse’s analysis.

Comments closed

Time Series Features in SQL Server 2022

Kendal Van Dyke walks us through a few new bits of T-SQL in SQL Server 2022:

Time series data is often used for historical comparisons, anomaly detection and alerting, predictive analysis, and reporting, where time is a meaningful axis for viewing or analyzing data.

Time series capabilities in SQL Server were introduced in Azure SQL Edge, Microsoft’s version of SQL Server for the Internet of Things (IoT) which combines capabilities such as data streaming and time series with built-in machine learning and graph features.

I am happy to see that these operators and functions made the leap from Azure SQL Edge and am hopeful that we’ll see a bit more of what makes databases like influxdb so useful for time series make their way in as well.

Comments closed

Using the Native Pipe in R 4.1+

Michael Mayer shows off the native R pipe:

What does the pipe do? It puts the object on its left as the first argument into the function on its right: iris %>% head() is a funny way of writing head(iris). It helps to avoid long function chains like f(g(h(x))), or repeated assignments.

In 2021 and version 4.1, R has received its native forward pipe operator |> so that we can write nice code like this:

Tying pipe syntax all back together, the magrittr pipe %>% was (as I recall) built with the F# pipe |> in mind. In R 4.1 and later, the built-in pipe is |>, as is right and natural in this world. Regardless, do check the comment before trying out this code, as it appear to work for R 4.2 and later, though not 4.1.

Comments closed

Finding Skipped Identity Values in a Table

Brent Ozar minds the gap:

When someone says, “Find all the rows that have been deleted,” it’s a lot easier when the table has an Id/Identity column. Let’s take the Stack Overflow Users table:

It has Ids -1, 1, 2, 3, 4, 5 … but no 6 or 7. (Or 0.) If someone asks you to find all the Ids that got deleted or skipped, how do we do it?

Click through for two methods, one specific to SQL Server 2022 and one which works for all versions of SQL Server.

Comments closed

Common Table Expressions in MySQL

Robert Sheldon looks at the syntax for common table expressions in MySQL:

As with many relational database management systems, MySQL provides a variety of methods for combining data in a data manipulation language (DML) statement. You can join multiple tables in a single query or add subqueries that pull data in from other tables. You can also access views and temporary tables from within a statement, often along with permanent tables.

MySQL also offers another valuable tool for working with data—the common table expression (CTE). A CTE is a named result set that you define in a WITH clause. The WITH clause is associated with a single DML statement but is created outside the statement. However, only that statement can access the result set.

The syntax is very similar to that of SQL Server save for an explicit RECURSIVE clause rather implicit recursion as in T-SQL.

Comments closed

Join Types in Spark SQL

Rituraj Khare makes some connections:

In Apache Spark, we can use the following types of joins in SQL:

Inner join: An inner join in Apache Spark is a type of join that returns only the rows that match a given predicate in both tables. To perform an inner join in Spark using Scala, we can use the join method on a DataFrame.

The set of options is the same as you’d see in a relational database: inner, left outer, right outer, full outer, and cross. The examples here are in Scala, though would apply just as easily to PySpark and, of course, writing classic SQL statements.

Comments closed