Press "Enter" to skip to content

Author: Kevin Feasel

Vector Search in Azure Databases

Paul Hernandez describes the current state of production-ready vector search options in Azure:

Vector databases and vector search are becoming increasingly important in modern applications due to their ability to handle complex and high-dimensional data efficiently. In today’s data-driven world, applications such as recommendation systems, image and video retrieval, natural language processing, and anomaly detection rely heavily on the ability to search and analyze large volumes of data quickly and accurately. Vector databases store data in the form of vectors, which allows for more sophisticated and nuanced searches compared to traditional databases. Vector search techniques enable these applications to find similar items, detect patterns, and make predictions by comparing the distances between vectors. This capability is crucial for delivering personalized user experiences, improving search accuracy, and enhancing overall application performance. As a result, vector databases and vector search are essential components in the toolkit of modern data scientists and engineers.

In this article, we will discuss several Azure services that support vector search, including Azure Database for PostgreSQL Flexible Server, Azure Cosmos DB, and Azure Cognitive Search. Each of these services offers unique features and capabilities that make them suitable for implementing vector search in various applications.

Click through for details, as well as links to more resources. Paul didn’t include Azure SQL Database’s vector capabilities, though that’s in preview right now and I’m not actually sure how well it will perform compared to these other options.

2 Comments

An Overview of Azure OpenAI and the Azure AI Hub

Tomaz Kastrun has a pair of posts. First up, an overview of Azure OpenAI:

Let’s first address the elephant in the room. We have explored the Azure AI Foundry and the we have also Azure OpenAI. So what is the core difference? Let’s take a look:

The services in the back:

  • Azure AI Services has much broader AI capabilities and simpler integration into applications and usage of the real world. With mostly pre-build API for all services (face recognition, document recognition, speech recognition, computer vision, image recognition, and more) that will allow better interoperabilty and and connection to machine learning services (Azure Machine Learning Service).
  • Azure OpenAI is focusing primarly on OpenAI LLM models (Azure AI services supports many others) and provides great agents for conversations, content tools, RAG and natural language services.

After that comes an overview of the Azure AI Hub and AI projects:

In AI Foundry portal, hubs provide the environment for a team to collaborate and organize work, and help you as a team lead or IT admin centrally set up security settings and govern usage and spend. You can create and manage a hub from the Azure portal or from the AI Foundry portal, and then your developers can create projects from the hub.

In essence, Hubs are the primary top-level Azure resource for AI Foundry. Their purpose is to to govern security, connectivity, and computing resources across playgrounds and projects.

Leave a Comment

Useful PostgreSQL Administrative Queries

Shane Borden shares some queries:

In the spirit of the holiday season, I thought I would write a quick post regarding some of my favorite queries that I use on a day to day basis working on Postgres. Some of these queries I have developed and others were found on the internet (hat tip to those who have previously posted) and further refined.

Click through for several useful queries, as well as a link to a GitHub repo that Shane maintains, containing plenty more.

Leave a Comment

Cosine Similarity in Power Query

John Kerski searches for similar sets:

I’ll admit upfront—I am not a data scientist by trade. Instead, I’ve picked up my data science skills over time, learning through a combination of osmosis from talented colleagues and tackling real-world data challenges. It’s been a journey of trial, error, and refinement, as I’ve worked to bridge gaps between complex data science techniques and tools available to me.

Recently, my skills were put to the test when I needed to compare hundreds of Active Directory and SharePoint Groups to find similarities in their memberships. With only Power Query available in the production environment, no Python or R to ease the process, I faced the task of finding a method to finding similarities from scratch in Power Query. In this guide, I’ll walk you through the solution I developed, highlighting the steps that made it possible.

John came up with a very clever solution. By the way, the way I like to explain cosine similarity (as a concept, not the algorithm itself) is as follows.

Back in high school physics, you probably drew vectors and learned that vectors have a direction and a magnitude (length). We drew vectors in two-dimensional space because that’s easy: it’s a line on a sheet of paper and there’s an arrow at the end to denote the direction of that vector. Conceptually, vectors with more than two dimensions behave exactly the same; the difference is that we cannot simply draw them, especially once we get past three-dimensional space (a vector with three elements). But the concept is still there: every vector has a direction and a magnitude.

We use cosine similarity to compare two vectors and see how close those two vectors are in terms of angle (direction), with the idea being that magnitude isn’t as important as angle for determining vector similarity. This is in contrast to another technique like Euclidean distance, which focuses more on the magnitude of the vectors versus angle.

Leave a Comment

Azure SQL Managed Instance Extreme Storage Latency

Kendra Little has another caveat emptor message:

What are your stories of unbelievably bad performance from cloud vendors? I’ll go first. For years, Azure SQL Managed Instance’s General Purpose Tier has documented “approximate” storage latency as being “5-10 ms.” This week they added a footnote: “This is an average range. Although the vast majority of IO request durations will fall under the top of the range, outliers which exceed the range are possible.”

How approximate is that 5-10 milliseconds, you might wonder? If you use Azure SQL Managed Instance these days, you will regularly find messages in your SQL Server Error log indicating that all data and log files have experienced latency of up to 60 seconds. At least, 60 seconds is the maximum I’ve observed personally, looking in the logs of several customers’ Managed Instances. Could it be worse? Microsoft hasn’t documented a ceiling. My testing shows that this latency occurs randomly to your workload and is not related to your resource usage: using less IO will not make the errors less likely. You have no way to avoid these storage failures (I don’t see how 15-60 second latency is not a failure), and they can occur anytime.

This is a major strike against SQL Managed Instance General Purpose. Considering the cost of the product, you could buy a new server with direct-attached NVMe storage, have it paid off after one year, have better performance, and get to depreciate the entire expense over a 3-5 year window (something you cannot do with the hardware side of cloud services–you can only depreciate the cost of licensing, assuming you have a 3-year reservation).

2 Comments

Availability Group Seeding and Transient Failure 108

Chad Callihan runs into an error with an availability group:

The availability group in question was unhealthy, and none of the added databases were syncing. By the time I started investigating, the SQL service on the secondary had been restarted. There were also no recent errors in Failover Cluster Manager.

I checked the SQL Server Error Log and found some clues. The SQL Server Error Log was filled with “Always On: DebugTraceVarArgs” errors for each database that included the message:

“Seeding encountered a transient failure ‘108’, retrying…”

Read on to see how Chad fixed this.

Leave a Comment

Fine-Tuning an Azure AI Model

Tomaz Kastrun updates a generative AI model:

Fine-tuning is the process of optimizing a pretrained model by training it on your specific dataset, which often contains more examples than you can typically fit in a prompt. Fine-tuning helps you achieve higher quality results for specific tasks, save on token costs with shorter prompts, and improve request latency.

Read on to see how you can do this. Note that you’ll need to set up the fine-tuning data in a particular format for whatever model you’re using.

Leave a Comment

Using na.rm in R

Steven Sanderson handles missing information in the best way possible—by ignoring it:

Missing values are a common challenge in data analysis, and R provides robust tools for handling them. The na.rm parameter is one of R’s most essential features for managing NA values in your data. This comprehensive guide will walk you through everything you need to know about using na.rm effectively in your R programming journey.

Read on for several examples of how na.rm works.

Leave a Comment