Press "Enter" to skip to content

Category: Syntax

Adding Row Numbers to a SQL Query

Steve Jones enumerates a result set:

I’m going to use some fun data for me. I’ve been tracking my travels, since I’m on the road a lot. I’m a data person and part of tracking is trying to ensure I’m not doing too much. Just looking at the data helps me keep perspective and sometimes cancel (or decline) a trip.

In any case, you don’t care, but I essentially have this data in a table. As you can see, I have the date of travel, the city, area, etc. I also have a few flags as to whether I was traveling that day, if I spent a night away from home, and how far I was.

Read on for a few trials with ROW_NUMBER().

Comments closed

Memoizing Functions with Snowflake

Kevin Wilkie is speaking my language (that is, the language of functional programming):

If you’ve been working with data for several years like I have – mostly using the SQL language – then I have a term for you that other languages, like JavaScript or Python, have had for a few years. The term is “memoizable” and it means, in a nutshell, to remember. A memoizable function caches the results so that it can return the resultset in record time, given the same parameters.

Yeah, it’s a fancy term that basically states, “Instead of calculating the result each time, I’ll just create a lookup table of all possible inputs and what the output is.” It’s really helpful when you have a small number of possible inputs and generating a result takes a while.

Read on to learn more about how this works in Snowflake, including several limitations.

Comments closed

Reviewing the DATE_BUCKET() and DATETRUNC() Commands

Chad Callihan checks out a pair of functions new to SQL Server 2022:

If you haven’t experimented much with SQL Server 2022, you may not be familiar with the new DATE_BUCKET and DATETRUNC functions. Both of these functions are useful when it comes to aggregating data. Let’s take a look at each function and walk through a few examples.

The DATE_BUCKET() function takes a bit of time getting used to, especially when you don’t use clean intervals like 1 hour or 1 minute. DATETRUNC(), meanwhile, is something I’ve wanted for quite some time.

Comments closed

Techniques for Unpivoting Data in SQL Server

Jared Westover performs a technique showdown:

A few weeks ago, I helped someone combine multiple integer columns into a single column. The common term for this process is unpivoting or transposing. The table they worked with had millions of rows, so they needed the most efficient method to accomplish this task. If you search online, you’ll find several suggestions for doing this. But which performs best? Is one easier to maintain than the others?

Click through for a review of three separate techniques: using the UNPIVOT operator, using UNION ALL, and using CROSS APPLY. The dataset was relatively small, but even at that size, CROSS APPLY did a good job. But I won’t spoil too much here.

Comments closed

Not all Window Function Calls are Deterministic

Chen Hirsh reminds us of an important fact:

One of the scariest things that can happen when you develop SQL code is getting indeterminate results. When you run your query multiple times and each time gets a different value.

How can that happen?

SQL functions or queries can be categorized into two groups – deterministic vs. non-deterministic.

I even have a demo where I show off the non-deterministic nature of ROW_NUMBER() versus, say, RANK() or DENSE_RANK(). But there’s nothing too scary about it: just ensure that your PARTITION BY criteria are guaranteed to be unique and you’re good to go. H/T Madeira Data Solutions blog.

Comments closed

Vector Search in Oracle

Brendan Tierney shows a few queries:

It can be very straightforward to use Vectors using SQL. It’s just a simple SQL query, with some additional Vector related requirements. The examples given below are a collection of some simple examples. These aren’t my examples, but they come from either documentation or from other examples people have come up with. I’ve tried to include references back to the original sources for these, and if I’ve missed any or referred to the wrong people, just let me know and I’ll correct the links.

In my next post on Vector Databases, I’ll explore a slightly more complex data set. I’ll use the Wine dataset used in a previous post and Vector Search to see if I can find a suitable wine. Some years ago, I had posts and presentations on machine learning to recommend wine. Using Vector Search should give us better recommendations (hopefully)!

Click through for those queries.

Comments closed

Using SUBSTRING() in PostgreSQL

Rajendra Gupta builds a substring:

The PostgreSQL substring function returns a subset of the input string based on the supplied string starting position and length. You can use the SUBSTRING function for extracting data as per fixed string length and regular expressions.

Syntax:

SUBSTRING (String, Start_Position, length)

Read on for some common uses, as well as cases that are a bit more esoteric.

Comments closed

Generating Embeddings in Oracle from a Function or Trigger

Brendan Tierney continues a series on generative AI in Oracle:

In my previous post, I gave examples of using Cohere to create vector embeddings using SQL and of using a Trigger to populate a Vector column. This post extends those concepts, and in this post, we will use OpenAI.

Warning: At the time of writing this post there is a bug in Oracle 23.5 and 23.6 that limits the OpenAI key to a maximum of 130 characters. The newer project-based API keys can generate keys which are greater than 130 characters. You might get lucky with getting a key of appropriate length or you might have to generate several. An alternative to to create a Legacy (or User Key). But there is no guarantee how long these will be available.

Assuming you have an OpenAI API key of 130 characters or less you can follow the remaining steps. This is now a know bug for the Oracle Database (23.5, 23.6) and it should be fixed in the not-too-distant future. Hopefully!

Read on to learn more.

Comments closed

Hierarchical Data Types in Postgres

Florent Jardin builds a hierarchy:

The SQL standard defines a set of rules so that database systems can be interchangeable, but there are small singularities in the wild. In this regard, the hierarchyid data type provided by SQL Server is a striking example. If you are switching to PostgreSQL, two solutions are available to you.

A first and simpler solution consists in linking each node to its parent using a new parentid column and applying a foreign key constraint. Another, more complete approach consists in using the ltree extension. This article deals with the latter case.

Read on to learn more.

Comments closed