
Category: Syntax

Concatenating in SQL Server

Lee Markup takes us through a pair of very useful functions in SQL Server:

SQL Server concatenation methods have been enhanced in modern versions of SQL Server. SQL Server 2012 introduced the CONCAT() function. In SQL Server 2017 we get CONCAT_WS().

A common usage of concatenation, or joining column values together in a string, is combining a FirstName and LastName column into a FullName column. Another common usage might be for creating an address column that pulls together building number, street, city and zip code.

Read on to learn more. CONCAT() and CONCAT_WS() are also extremely helpful for change detection in ETL processes. For example, you might have a queue table to process and only want to update records in which relevant source fields changed, ignoring the ones which don’t exist in your destination. A combination of HASHBYTES() and CONCAT_WS() will do the trick quite nicely.
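As a rough sketch of both ideas (the table and column names here are made up for illustration):

-- CONCAT() treats NULL inputs as empty strings; CONCAT_WS() takes a separator
-- as its first argument and skips NULL values entirely.
SELECT
    CONCAT(FirstName, ' ', LastName) AS FullName,
    CONCAT_WS(', ', BuildingNumber, Street, City, ZipCode) AS FullAddress
FROM dbo.Customers;

-- Change detection: hash the relevant columns on both sides and compare.
SELECT src.CustomerID
FROM dbo.CustomerStaging AS src
    INNER JOIN dbo.Customers AS dest
        ON dest.CustomerID = src.CustomerID
WHERE HASHBYTES('SHA2_256', CONCAT_WS('|', src.FirstName, src.LastName, src.City))
   <> HASHBYTES('SHA2_256', CONCAT_WS('|', dest.FirstName, dest.LastName, dest.City));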


Order, Sort, Cluster, and Distribute in Hive

The Hadoop in Real World team gives us three methods (and one synonym) to organize results in Hive:

Hive provides 3 options to order or sort the result of records – order by, sort by, cluster by and distribute by. Which option you choose has performance implications. So it is important to understand the difference between the options and choose the right one for the use case at hand.

Click through for a high-level overview of the techniques.
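For reference, a minimal sketch of each clause against a hypothetical sales table:

-- ORDER BY: total ordering across the full result set; forces a single reducer.
SELECT product, amount FROM sales ORDER BY amount DESC;

-- SORT BY: sorts within each reducer, so the overall output is only partially ordered.
SELECT product, amount FROM sales SORT BY amount DESC;

-- DISTRIBUTE BY: sends rows with the same key to the same reducer; no sorting implied.
SELECT product, amount FROM sales DISTRIBUTE BY product;

-- CLUSTER BY: shorthand for DISTRIBUTE BY plus SORT BY on the same column(s).
SELECT product, amount FROM sales CLUSTER BY product;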


Avoiding WHILE 1=1 Loops

Aaron Bertrand does not believe in the power of the infinite loop:

A short time ago a colleague had an issue with a Microsoft SQL Server stored procedure. They were using our recommended approach for batching updates, but there was a small problem with their code that led to the procedure “running forever.” I think we’ve all made a mistake like this at one point or another; here’s how I try to avoid the situation altogether.

The argument isn’t “don’t use WHILE loops” or “don’t use batching logic”; it’s to ensure that you have a break condition somewhere. It’s reasonable to ask for an end state before you begin processing something, after all.
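A stripped-down version of the batching pattern with an explicit exit looks something like this (not Aaron’s code; the table and batch size are hypothetical):

DECLARE @BatchSize    int = 5000,
        @RowsAffected int = 1;

WHILE @RowsAffected > 0   -- break condition: stop once a batch touches no rows
BEGIN
    UPDATE TOP (@BatchSize) dbo.Orders
    SET IsProcessed = 1
    WHERE IsProcessed = 0;

    SET @RowsAffected = @@ROWCOUNT;
END;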


LATERAL VIEW in Hive

The Hadoop in Real World team provides a quick example of a powerful feature in Apache Hive:

Lateral view is used in conjunction with user-defined table generating functions such as explode(). A UDTF generates zero or more output rows for each input row. 

Click here if you’d like to know the difference between UDF, UDAF, and UDTF.

A lateral view first applies the UDTF to each row of the base table and then joins resulting output rows to the input rows to form a virtual table having the supplied table alias.

In other words, LATERAL joins are the SQL standard’s equivalent of Microsoft’s CROSS APPLY operator. I normally dislike having different names for the same thing due to the risk of confusion, but in fairness to Microsoft on this one, my recollection is that the standard’s name came after SQL Server 2005, which already had CROSS APPLY and OUTER APPLY.
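A minimal HiveQL sketch of the LATERAL VIEW side, assuming a hypothetical orders table with an ARRAY<STRING> column of items:

SELECT o.order_id, item
FROM orders o
LATERAL VIEW explode(o.items) items_exploded AS item;

Each array element becomes its own row, joined back to the order it came from, which is conceptually what CROSS APPLY does against a table-valued function in T-SQL.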


Getting Distinct Values before STRING_AGG

Greg Dodd shows how to remove duplicate values from a list before passing them to the STRING_AGG() function:

SQL Server introduced the new STRING_AGG feature in SQL Server 2017, and it works just like it suggests it would: it’s an aggregate function that takes all of the string values and joins them together with a separator. To see how it works, I’m using the StackOverflow Users table; let’s say we want to create a list of Display Names, grouped by Location:

Click through for two methods, one of which is considerably better than the other.
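One approach that tends to come up for this problem (not necessarily either of Greg’s) is to de-duplicate in a derived table before aggregating:

SELECT u.Location,
       STRING_AGG(u.DisplayName, ', ') AS DisplayNames
FROM
(
    -- remove duplicate names per location before the aggregate sees them
    SELECT DISTINCT Location, DisplayName
    FROM dbo.Users
) AS u
GROUP BY u.Location;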


Ways to Filter Data in PostgreSQL

Gauri Mahajan shows off several techniques for filtering data in PostgreSQL:

Data is hosted in a variety of data repositories, one of which is relational databases. Out of tens of commercial and open-source relational databases, one of the most popular open-source relational databases is PostgreSQL. This database is offered on the Azure cloud platform through a service named Azure Database for PostgreSQL.

One of the most fundamental operations performed on the database is reading and writing data to consume and host data. It goes without saying that when the data is consumed, it must be scoped based on the requirements or criteria specified by the consumer. This translates to filtering the data while querying it.

Like every other relational database, Postgres offers different operators and options to filter data while querying. Let’s go ahead and learn some of the most fundamental ways to filter data hosted in PostgreSQL.

Most of them are the same as what you have in T-SQL, but not all of them.
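As a quick taste, using a hypothetical customers table (ILIKE being one of the bits T-SQL doesn’t have):

SELECT *
FROM customers
WHERE country = 'US'                          -- equality
  AND signup_date BETWEEN '2021-01-01'
                      AND '2021-12-31'        -- range
  AND city IN ('Boston', 'Denver')            -- membership
  AND email ILIKE '%@example.com'             -- case-insensitive pattern match
  AND phone IS NOT NULL                       -- NULL handling
LIMIT 50;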


Joining to STRING_SPLIT

Kevin Wilkie explains that the STRING_SPLIT() function isn’t something one simply joins to:

My friends! Last time together, we discussed using the STRING_SPLIT function and how it’s used in combination with CROSS APPLY.

First off, most of us are used to working with an INNER JOIN instead of CROSS APPLY. Well, you’re not going to be able to use an INNER JOIN when you’re using the STRING_SPLIT function.

Read on for a demonstration.
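The shape of the working version looks something like this (hypothetical table and column names):

-- APPLY evaluates STRING_SPLIT once per outer row, which is exactly what a
-- per-row argument like o.TagList requires.
SELECT o.OrderID, LTRIM(s.value) AS Tag
FROM dbo.Orders AS o
    CROSS APPLY STRING_SPLIT(o.TagList, ',') AS s;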


Row Pattern Recognition in Snowflake

Koen Verbeeck knows how to make my blood boil:

I’m doing a little series on some of the nice features/capabilities in Snowflake (the cloud data warehouse). In each part, I’ll highlight something that I think is interesting enough to share. It might be some SQL function that I’d really like to see in SQL Server, it might be something else.

In the book T-SQL Window Functions – For data analysis and beyond, Itzik Ben-Gan explains the concept of row-pattern recognition (RPR) in a dedicated chapter (you can find a full book review here). It’s a concept that doesn’t exist in T-SQL, but is described in the SQL standard and is available in some other database systems. Snowflake has recently introduced support for RPR. 

Jokes about being angry aside, I’d really like to see row pattern recognition in SQL Server. It’s definitely not trivial to learn, but once you do, there’s a lot of power available to you. Koen also links to the Feedback item about this, so vote on that as well.
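To give a flavor of the syntax, here is a rough sketch of Snowflake’s MATCH_RECOGNIZE against a made-up stock_prices table, finding runs of consecutive price increases:

SELECT *
FROM stock_prices
    MATCH_RECOGNIZE (
        PARTITION BY symbol
        ORDER BY trade_date
        MEASURES
            FIRST(trade_date) AS run_start,
            LAST(trade_date)  AS run_end,
            COUNT(*)          AS days_rising
        ONE ROW PER MATCH
        PATTERN (rise+)
        DEFINE
            rise AS price > LAG(price)   -- a row qualifies when its price beats the prior row's
    );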


DATE_BUCKET() and an Alternative

Itzik Ben-Gan takes us through the DATE_BUCKET() function:

Bucketizing date and time data involves organizing data in groups representing fixed intervals of time for analytical purposes. Often the input is time series data stored in a table where the rows represent measurements taken at regular time intervals. For example, the measurements could be temperature and humidity readings taken every 5 minutes, and you want to group the data using hourly buckets and compute aggregates like average per hour.

Even though time series data is a common source for bucket-based analysis, the concept is just as relevant to any data that involves date and time attributes and associated measures. For example, you might want to organize sales data in fiscal year buckets and compute aggregates like the total sales value per fiscal year.

In this article, I cover two methods for bucketizing date and time data. One is using a function called DATE_BUCKET, which at the time of writing is only available in Azure SQL Edge. Another is using a custom calculation that emulates the DATE_BUCKET function, which you can use in any version, edition, and flavor of SQL Server and Azure SQL Database.

DATE_BUCKET() is something I’d like to see in the next version of SQL Server on-premises. There are some peculiarities to how it works and behavior isn’t always exactly what you’d expect, but it does accomplish what it sets out to do.
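A short sketch of both flavors, using made-up readings data and 3-hour buckets (Itzik’s article covers the edge cases, such as dates earlier than the origin, that this glosses over):

DECLARE @origin datetime2 = '2021-01-01';

-- DATE_BUCKET() version (Azure SQL Edge at the time the article was written):
SELECT DATE_BUCKET(hour, 3, event_time, @origin) AS bucket_start,
       AVG(temperature) AS avg_temperature
FROM dbo.Readings
GROUP BY DATE_BUCKET(hour, 3, event_time, @origin);

-- Portable emulation: count whole hours from the origin, truncate to a multiple
-- of the bucket width with integer division, then add that offset back.
SELECT DATEADD(hour, DATEDIFF(hour, @origin, event_time) / 3 * 3, @origin) AS bucket_start,
       AVG(temperature) AS avg_temperature
FROM dbo.Readings
GROUP BY DATEADD(hour, DATEDIFF(hour, @origin, event_time) / 3 * 3, @origin);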
