Press "Enter" to skip to content

Category: Syntax

SOUNDEX in Snowflake

Kevin Wilkie makes a noise:

In Snowflake, there is another function that is SOUNDEX-like that can give slightly different results – SOUNDEX_P123.

For those cases where the first and second letters of the string have the same SOUNDEX number, the SOUNDEX_P123 function will keep the number for the second letter. This variant is used in a few other database systems, for example, Teradata.

Click through for a demonstration of the two SOUNDEX() variants and how results can differ.

Comments closed

Sessions and Execution of Dynamic SQL

Deborah Melkin riddles us this on dynamic SQL:

I admit it – I do waaayyyy too much with dynamic SQL. But I just keep running into situations that require it. That being said, I ran into an interesting problem that had me puzzled. I found a bunch of different blog posts that pointed to me to the right direction but required a little extra work to find the solution.

There are several concepts that are at play here, so I’ll try to break this out so we can put the pieces together. The first once is centered around dynamic SQL. There are two parts of this I want to make sure we understand first – how it fits into sessions and how it gets executed.

Read the whole thing.

Comments closed

Max and Min Functions in KQL

Robert Cain goes extreme:

The max and min aggregation functions are common to almost every language, and the Kusto Query Language is no exception. As you would think, when you pipe in a dataset max returns the maximum value for the column name you pass in. Likewise min returns the lowest value.

In addition, there are variants for each, maxif and minif. We’ll see examples for all of these in this post.

Click through for a few functions you can call via the summarize operator.

Comments closed

External Tables and the Serverless SQL Pool

Ryan Adams continues a series on querying the serverless SQL pool in Azure Synapse Analytics:

There are two ways to read data inside Data Lake using the Synapse Serverless engine.  In this article, we’ll look at the second method which uses an external table to query a path within the lake.

Synapse is a collection of tools with four different analytical engines (Dedicated PoolSpark PoolServerless PoolData Explorer Pool).  This gives you a lot of options for ingesting, transforming, storing, and querying your data.  Here you will use the Synapse Serverless Pool to query the data in your ADLS account.   

Read on for a demonstration.

Comments closed

Emulating Window Functions in MySQL 5.7

Lukas Eder says, we have window functions at home:

One of MySQL 8’s biggest improvements is the support of window functions. As I always said in conferences, there’s SQL before window functions and SQL after window functions. Once you start using them, you’ll use them everywhere.

Some of you poor souls are unfortunate enough to be stuck on MySQL 5.7, either of your own choosing, or because you’re using a clone / fork that is still 5.7 compatible. While for most people, this blog post is just for your amusement, or nostalgia, for some of you this post will be quite useful.

If you are in a windowless world, read on to see how you can make life a little more manageable.

Comments closed

Performance Comparing DISTINCT to GROUP BY

Reitse Eskens does a performance comparison:

A few days ago, I heard someone stating that Group By was much quicker than Distinct. Less disk impact, less memory etc.
So, I thought I’d find out if it’s true or not because I found it interesting. I always thought there was no difference. I tested a single small table and found no difference in speed, reads or execution plan. But that’s no real world example. Usually the tables contain a lot of data and are joined to other tables.

Click through for the results of Reitse’s analysis.

Comments closed

Time Series Features in SQL Server 2022

Kendal Van Dyke walks us through a few new bits of T-SQL in SQL Server 2022:

Time series data is often used for historical comparisons, anomaly detection and alerting, predictive analysis, and reporting, where time is a meaningful axis for viewing or analyzing data.

Time series capabilities in SQL Server were introduced in Azure SQL Edge, Microsoft’s version of SQL Server for the Internet of Things (IoT) which combines capabilities such as data streaming and time series with built-in machine learning and graph features.

I am happy to see that these operators and functions made the leap from Azure SQL Edge and am hopeful that we’ll see a bit more of what makes databases like influxdb so useful for time series make their way in as well.

Comments closed

Using the Native Pipe in R 4.1+

Michael Mayer shows off the native R pipe:

What does the pipe do? It puts the object on its left as the first argument into the function on its right: iris %>% head() is a funny way of writing head(iris). It helps to avoid long function chains like f(g(h(x))), or repeated assignments.

In 2021 and version 4.1, R has received its native forward pipe operator |> so that we can write nice code like this:

Tying pipe syntax all back together, the magrittr pipe %>% was (as I recall) built with the F# pipe |> in mind. In R 4.1 and later, the built-in pipe is |>, as is right and natural in this world. Regardless, do check the comment before trying out this code, as it appear to work for R 4.2 and later, though not 4.1.

Comments closed