Press "Enter" to skip to content

Category: Search

Vector Search with Oracle against Iceberg Tables

Brendan Tierney performs a search:

In my previous blog posts I’ve explored how to use Iceberg Tables and how to integrate these in with your Data Lake. Additionally, I showed how to setup your Oracle Data Lake (Database) to access the data stored in Iceberg Tables stored in OCI Object Storage. To access this Iceberg Table data from the Oracle Database we created an External Table. This allows us to query the Iceberg Table data as if it was internal to the database. With all new releases there is continuous improvement in the features and to make them easier to use. One such new feature (as of 23.26.1) is the ability to read vector data types from an External Table. This new feature is called or referred to as ‘Vectors on Ice’.

With Oracle Database External Tables now supporting vector embedding stored in Iceberg Tables, means you can generate vector embeddings with your preferred embedding model (external to Oracle using your faviourite tool/library), store them in Iceberg Tables in cloud object storage (OCI Object Storage, AWS S3, etc.), and run semantic search from Oracle AI Database, accessing vector data stored within the database and externally with the minimum of data movement and with similar SQL queries.

Click through to see how.

Leave a Comment

Finding a Burrito in Ireland

Andrew Pruski has my attention and my interest:

A while back I posted about a couple of side projects that I’ve been working on when I get the chance. One of those was the Burrito Bot…a bot to make burrito recommendations in Ireland 🙂

Over the last 18 months or so I’ve reworked this project to utilise the new vector search functionality in SQL Server 2025…so now it looks like this: –

Andrew also owns the burrito-bot.com domain, as he showed it off live during his presentation at Data Saturdays Chicago. Unfortunately, it seems the service is not up at the moment, so your best bet might be to take the code from the Burrito Bot GitHub repo and build your own.

Comments closed

Code to Perform Binary Search in SQL Server

Andy Brownsword has a procedure:

Let’s recap what we’re doing here:

Large append-heavy tables – like logs or audits – often don’t have a useful index on the timestamp. These types of tables do however have a strong correlation between their clustering key and the timestamp due to chronological inserts.

A binary search approach splits the table in half to narrow down the search space with each iteration. By abusing the incremental relationship between the clustering key and timestamps, we can quickly zero in on the point in time we’re after. If you want to see the mechanics, check out last week’s post.

I love the approach for log tables, assuming that a timestamp is part of the filter. This is a clever application of a very common computer science algorithm to database operations.

Comments closed

Vector Search: Negation and Cross-Encoding

Joe Sack digs into a common problem in vector search. First up is a description of the problem:

I embedded two queries: “home with pool” and “home without pool.” The cosine similarity was 0.82. The embedding model treats negated queries as nearly identical to their positive counterparts.

For comparison, completely unrelated queries (“home with pool” vs “quarterly earnings report”) scored 0.13.

And one imperfect solution:

In my last post, I showed that vector search treats “home with pool” and “home without pool” as nearly identical (0.82 similarity). Bi-encoders struggle with negation.

Cross-encoders can help with this. But there’s a trade-off.

Read on to learn how cross-encoders can help, but they come at a significant cost. Joe also describes a pattern that can minimize the total pain level when using cross-encoders.

Comments closed

Text Search in PostgreSQL

Jay Miller is looking for strings in all the wrong places:

I like to think of this like seeing a doctor. You can go to a family doctor and they can help you with most things. For specific results, it’s better to see a specialist who has a better understanding of the particular issue.

Search is the same way. At the end of the day, you get results but what you put into your search will affect what you get. That said there are some search methods that work better than others depending on your data.

Read on for several techniques that are available. I do think the headers denigrate LIKE/iLIKE a bit too much, as it works pretty well in many circumstances. But there are definitely good times to bring out the more powerful mechanisms, as this article shows.

Comments closed

Vector Search from Scratch

Kanwai Mehreen does a bit of searching:

In this article, I’ll walk you through every step from generating vector representations to searching using cosine similarity, and we’ll even visualize what’s happening behind the scenes. By the end, you’ll not only understand how vector search works but also have a working implementation you can build on. So, let’s get started.

It’s kind of funny how simple this is, but it is. A lot of the complexity is around data quality operations, as well as optimizing the search process.

Comments closed

Semantic Search in PostgreSQL

Hans-Jürgen Schönig performs a search:

PostgreSQL offers advanced capabilities through extensions like pgvector, which enable semantic search at a level and quality never achieved before. Unlike traditional text search, which mostly relies on trivial string comparison, semantic search in PostgreSQL goes beyond keywords by understanding context and meaning, enhancing relevance.

The quick idea here is that converting words (or parts of words) into vectors can maintain most of the semantic meaning behind the words. Then, when we perform certain types of vector comparisons, we can take advantage of this semantic meaning and find results whose language is different from our query but the concept is a match for what we want. Click through for the full article.

Comments closed

Knowledge Management via Azure OpenAI

Paul Hernandez builds a system:

In this post, I would like to show you how I implemented a simple use case to exemplify how you can query your data by implementing a chat application using Azure Open AI. Of course, we cannot only answer questions, LLMs are also capable of summarizing texts, or extracting entities. I decided to call it “Knowledge Management Assistant”, since I would like to use the application to assist me with some tedious tasks, which consumes some of my limited time.

Click through for the process. I would have recommended checking the box for vector search, though I imagine that would have blown past the limitations of the Basic tier of Azure AI Search (nee Azure Cognitive Search).

Comments closed

Combining Cosmos DB and Azure Search

Hasan Savran does some looking:

In my previous post, I discussed the process of establishing a Free-text search for Azure Cosmos DB. Towards the end, I demonstrated how to carry out a free-text search using the Azure Portal. Now, I will guide you on how to perform this search using code. To perform this search by code, I created a basic console application and added Azure.Search.Documents and Microsoft.Azure.Cosmos.

Click through for that demonstration.

Comments closed