Press "Enter" to skip to content

Category: Syntax

The DIFFERENCE() and SOUNDEX() Functions

Hadi Fadlallah looks at two methods of string distance:

Soundex is a phonetic algorithm developed by Robert C. Russell and Margaret King Odell in the early 1900s. This algorithm is used to index names as they are pronounced in English. The main goal of such an algorithm is to encode homophones to the same representation to be matched even if there are some slight spelling differences. As an example, consider the names “Smith” and “Smyth”, or “Mohamad” and “Mouhammad”. Soundex mainly encodes consonants and only encodes a vowel if it is the first letter of the name.

Being one of the most popular phonetic algorithms, Soundex was implemented in multiple database engines such as OracleSQL ServerMySQLSQLite, and PostgreSQL.

These two methods are not perfect and they do really limit you to one word (or small word grouping), but they are useful.

Comments closed

CORRESPONDING and ANSI SQL

Lukas Eder looks at a rarely-implemented keyword in SQL:

I recently stumbled upon a standard SQL feature that was implemented, to my surprise, in HSQLDB. The keyword is CORRESPONDING, and it can be used with all set operations, including UNIONINTERSECT, and EXCEPT.

Click through to see what it does. Be sure to check out the comments, where Joe Celko pops in to provide some additional historical context to explain why you won’t find this keyword in many implementations of the standard..

Comments closed

TRY_CAST and TRY_PARSE

Joe Obbish shows the difference between two functions:

There’s a lot of guidance out there that states that TRY_CAST is a faster, more modern version of TRY_PARSE and that TRY_PARSE should only be used if you need to set the optional culture parameter. However, the two functions can return different results in some cases, even without the culture parameter.

That guidance is blatantly wrong. TRY_CAST() and TRY_PARSE() both came out in SQL Server 2012. TRY_PARSE() uses .NET to perform parsing, which is going to have some edge case differences, especially around cultures and localization. TRY_CAST() is CAST() in an error-safe wrapper. If anything, TRY_CAST() is the “old” version and TRY_PARSE() the “new” version, with scare quotes in place because they both came out at the same time.

Both of them are useful, though I do agree with Joe’s advice of avoiding TRY_PARSE(), at least for larger datasets. If you’re parsing a single date or a small table of dates, TRY_PARSE() does an excellent job because TRY_PARSE('13/01/2019' AS DATE USING 'fr-fr') is not something you can easily do with TRY_CAST() in a US locale.

Comments closed

Finding Substrings in a String with T-SQL

Kevin Wilkie avoids a regex:

Continuing on with our series from last time – see here if you somehow missed it – let’s have some more fun with the different functions we can use with strings.

This time, let’s focus on looking for different items we can use to find a string within a string.

With T-SQL not natively supporting regular expressions—though you can use a CLR module to do this—click through to see what Kevin uses.

Comments closed

Implementing GREATEST in SQL Server 2019

Ronen Ariely is on a mission to be the greatest:

The function GREATEST returns the maximum value from a list of one or more expressions. It returns the data type with the highest precedence from the set of types passed to the function.

This function was added to Azure SQL. At this time, it is supported in Azure SQL Database, Azure SQL Managed Instance and Azure Synapse Analytics serverless. 

Unfortunately, it not yet supported on SQL Server on premises and synapse dedicated sql pool.

Click through for a pair of alternative constructs while we wait for GREATEST on-premises.

Comments closed

Use TOP instead of SET ROWCOUNT

Jared Poche explains why the TOP clause is superior to using SET ROWCOUNT:

I was presenting on how to use the TOP clause to break down large operations into short, fast, bite-sized operations. The mechanics are things I learned from writing processes that do garbage collection, backfill new columns, and anonymizing PII data on existing tables. I’ve just posted the slides and example scripts here if you are interested.

ARE THEY THE SAME?

The question was whether the SET ROWCOUNT command would work just the same, and the answer is sometimes yes but largely no.

Read on to see what Jared means.

Comments closed

Using GREATEST and LEAST in Azure SQL DB

Aaron Bertrand preps us for SQL Server 2022:

In an earlier tip, “Find MAX value from multiple columns in a SQL Server table,” Sergey Gigoyan showed us how to simulate GREATEST() and LEAST() functions, which are available in multiple database platforms but were – at least at the time – missing from Transact-SQL. These functions are now available in Azure SQL Database and Azure SQL Managed Instance, and will be coming in SQL Server 2022, so I thought it was a good time to revisit Sergey’s methods and compare.

Read on to see how the workaround compares.

Comments closed

CONVERT and Binary Styles

Abayomi Obawomiye has style:

I recently had a requirement to load some data from a source table to another destination table. The destination columns were exactly the same as the source columns with the same data types and length. The only difference was that some columns on the destination table must be encrypted. The task was to use the SHA2_512 encryption algorithm to encrypt the “sensitive” data. I will talk more about the encryption algorithm in another post.

To achieve this, I needed to use the HASHBYTES function in SQL Server. The challenge was that this function used with the SHA2_512 encryption algorithm will return a fixed character length of 64 characters which will be longer than the character length on my destination table. As a result, SQL Server will throw a truncation error. I will demonstrate this below.

One really important point: SHA is not encryption; it’s a hash (which is why the function is HASHBYTES() instead of something like EncryptByKey() as column-level security uses). Hashes are intended to be a one-way trip, whereas encryption implies an ability to decrypt if you have the relevant key details. Here, the use looks to be obfuscating the text of sensitive data fields, perhaps for loading in a dev/test environment, and so the actions themselves are quite reasonable.

By the way, the styles Abayomi talks about are all listed in this Docs page. Turns out that if you’re using a money datatype, you can use CONVERT() to display the end result with commas.

Comments closed

Powershell Functions and Return Type Oddities

Dave Mason goes down the rabbit hole:

As a result of some struggles trying to automate a process, I’ve learned some things about PowerShell. After getting to the bottom of a time-consuming problem, I thought it worth a blog post that might save someone else some time and heartache.

Let’s begin with this simple function named Get-RandomDate. It generates and returns a random date that is between today and X days ago. It has an input parameter $DaysAgo, which is of type [System.Int32]–it is a mandatory parameter.

That’s all straightforward, but then things get weird.

Comments closed