Syntax – Page 36 – Curated SQL

Now let’s say you also filter the sales record by sku (stock-keeping unit aka. barcode) in addition to sale_date and country. Creating a partition on sku will result in many partitions which is not ideal as it might result in uneven and smaller partitions.
Hadoop is not efficient in processing small volumes of data. There is a better way.

Read on to understand when each technique makes sense.

Comments closed

Concatenating in SQL Server

Published 2021-09-20 by Kevin Feasel

Lee Markup takes us through a pair of very useful functions in SQL Server:

SQL Server concatenation methods have been enhanced in modern versions of SQL Server. SQL Server 2012 introduced the CONCAT() function. In SQL Server 2017 we get CONCAT_WS().
A common usage of concatenation, or joining column values together in a string, is combining a FirstName and LastName column into a FullName column. Another common usage might be for creating an address column that pulls together building number, street, city and zip code.

Read on to learn more. CONCAT() and CONCAT_WS() are also extremely helpful for change detection in ETL processes. For example, you might have a queue table to process and only want to update records in which relevant source fields changed, ignoring the ones which don’t exist in your destination. A combination of HASHBYTES() and CONCAT_WS() will do the trick quite nicely.

Comments closed

Order, Sort, Cluster, and Distribute in Hive

Published 2021-09-13 by Kevin Feasel

The Hadoop in Real World team give us three methods (and one synonym) to organize results in Hive:

Hive provides 3 options to order or sort the result of records – order by, sort by, cluster by and distribute by. Which option you choose has performance implications. So it is important to understand the difference between the options and choose the right one for the use case at hand.

Click through for a high-level overview of the techniques.

Comments closed

Avoiding WHILE 1=1 Loops

Published 2021-09-09 by Kevin Feasel

Aaron Bertrand does not believe in the power of the infinite loop:

A short time ago a colleague had an issue with a Microsoft SQL Server stored procedure. They were using our recommended approach for batching updates, but there was a small problem with their code that led to the procedure “running forever.” I think we’ve all made a mistake like this at one point or another; here’s how I try to avoid the situation altogether.

The argument isn’t “don’t use WHILE loops” or “don’t use batching logic,” but instead to ensure that you have a break condition somewhere. It’s reasonable to ask for an end state before you begin processing something, after all.

Comments closed

LATERAL VIEW in Hive

Published 2021-09-08 by Kevin Feasel

The Hadoop in Real World team provides a quick example of a powerful feature in Apache Hive:

Lateral view is used in conjunction with user-defined table generating functions such as explode(). A UDTF generates zero or more output rows for each input row.
Click here if you like to know the difference between UDF, UDAF and UDTF
A lateral view first applies the UDTF to each row of the base table and then joins resulting output rows to the input rows to form a virtual table having the supplied table alias.

In other words, LATERAL joins are the SQL standard for Microsoft’s CROSS APPLY operator. I normally dislike having different names for the same thing due to the risk of confusion, but in fairness to Microsoft on this one, my recollection is that the common name came after SQL Server 2005, which already had CROSS APPLY and OUTER APPLY.

Comments closed

Break vs Throw in Powershell

Published 2021-09-07 by Kevin Feasel

Patrick Gruenauer explains the differences between two control statements:

If you are a PowerShell scripter, you may have come across two code statements that are not unique to PowerShell: Break and Throw. In this blog post I will carry out some examples to shed some light on this very important statements.

If you’re confused about when to use which, check out Patrick’s clear explanation.

Comments closed

Getting Distinct Values before STRING_AGG

Published 2021-08-24 by Kevin Feasel

Greg Dodd shows how to remove duplicate values from a list before passing them to the STRING_AGG() function:

SQL introduced the new STRING_AGG feature in SQL 2017, and it works just like it suggests it would: it’s an aggregate function that takes all of the string values and joins them together with a separator. To see how it works, I’m using the StackOverflow users table, and let’s say we want to create a list of Display Names and we’re going to group it based on Location:

Click through for two methods, one of which is considerably better than the other.

Comments closed

Ways to Filter Data in PostgreSQL

Published 2021-08-19 by Kevin Feasel

Gauri Mahajan shows off several techniques for filtering data in PostgreSQL:

Data is hosted in a variety of data repositories, one of which is relational databases. Out of tens of commercial and open-source relational databases, one of the most popular open-source relational databases is PostgreSQL. This database is offered on the Azure cloud platform through a service named Azure Database for PostgreSQL. One of the most fundamental operations performed on the database is reading and writing data to consume and host data. It goes without saying that when the data is consumed, it must be scoped based on the requirements or criteria specified by the consumer. This translates to filtering the data while querying it. Like every other relational database, Postgres offers different operators and options to filter data while querying. Let’s go ahead and learn some of the most fundamental ways to filter data hosted in PostgreSQL.

Most of them are the same as what you have in T-SQL, but not everything.

Comments closed

Joining to STRING_SPLIT

Published 2021-08-18 by Kevin Feasel

Kevin Wilkie explains that the STRING_SPLIT() function isn’t something one simply joins to:

My friends! Last time together, we discussed using the STRING_SPLIT function and how it’s used in combination with the CROSS APPLY.
First off, most of us are used to working with an INNER JOIN instead of CROSS APPLY. Well, you’re not going to be able to use an INNER JOIN when you’re using the STRING_SPLIT function.

Read on for a demonstration.

Comments closed

Row Pattern Recognition in Snowflake

Published 2021-08-17 by Kevin Feasel

Koen Verbeeck knows how to make my blood boil:

I’m doing a little series on some of the nice features/capabilities in Snowflake (the cloud data warehouse). In each part, I’ll highlight something that I think it’s interesting enough to share. It might be some SQL function that I’d really like to be in SQL Server, it might be something else.
In the book T-SQL Window Functions – For data analysis and beyond, Itzik Ben-Gan explains the concept of row-pattern recognition (RPR) in a dedicated chapter (you can find a full book review here). It’s a concept that doesn’t exist in T-SQL, but is described in the SQL standard and is available in some other database systems. Snowflake has recently introduced support for RPR.

Jokes about being angry aside, I’d really like to see row pattern recognition in SQL Server. It’s definitely not trivial to learn, but once you do, there’s a lot of power available to you. Koen also links to the Feedback item about this, so vote on that as well.

Comments closed

M	T	W	T	F	S	S
				1	2	3
4	5	6	7	8	9	10
11	12	13	14	15	16	17
18	19	20	21	22	23	24
25	26	27	28	29	30	31

Category: Syntax

Partitioning vs Bucketing in Hive

Concatenating in SQL Server

Order, Sort, Cluster, and Distribute in Hive

Avoiding WHILE 1=1 Loops

LATERAL VIEW in Hive

Break vs Throw in Powershell

Getting Distinct Values before STRING_AGG

Ways to Filter Data in PostgreSQL

Joining to STRING_SPLIT

Row Pattern Recognition in Snowflake