
Vector Search in Azure Databases

Paul Hernandez describes the current state of production-ready vector search options in Azure:

Vector databases and vector search are becoming increasingly important in modern applications due to their ability to handle complex and high-dimensional data efficiently. In today’s data-driven world, applications such as recommendation systems, image and video retrieval, natural language processing, and anomaly detection rely heavily on the ability to search and analyze large volumes of data quickly and accurately. Vector databases store data in the form of vectors, which allows for more sophisticated and nuanced searches compared to traditional databases. Vector search techniques enable these applications to find similar items, detect patterns, and make predictions by comparing the distances between vectors. This capability is crucial for delivering personalized user experiences, improving search accuracy, and enhancing overall application performance. As a result, vector databases and vector search are essential components in the toolkit of modern data scientists and engineers.

In this article, we will discuss several Azure services that support vector search, including Azure Database for PostgreSQL Flexible Server, Azure Cosmos DB, and Azure Cognitive Search. Each of these services offers unique features and capabilities that make them suitable for implementing vector search in various applications.

Click through for details, as well as links to more resources. Paul didn’t include Azure SQL Database’s vector capabilities, though that’s in preview right now and I’m not actually sure how well it will perform compared to these other options.
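
If the underlying mechanics are unfamiliar, the whole idea reduces to ranking stored vectors by their distance from a query vector. Here is a minimal, service-agnostic sketch in Python (using NumPy and made-up three-dimensional embeddings; real embeddings have hundreds or thousands of dimensions) of the cosine-similarity ranking that these Azure offerings index and scale for you:

import numpy as np

# Toy catalog: each row is the embedding of one stored item.
item_vectors = np.array([
    [0.90, 0.10, 0.00],   # e.g. "running shoes"
    [0.80, 0.20, 0.10],   # e.g. "trail sneakers"
    [0.00, 0.10, 0.90],   # e.g. "coffee grinder"
])
query = np.array([0.85, 0.15, 0.05])  # embedding of the user's query

def cosine_similarity(matrix, vector):
    # Cosine similarity = dot product of L2-normalized vectors.
    matrix_norm = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)
    vector_norm = vector / np.linalg.norm(vector)
    return matrix_norm @ vector_norm

scores = cosine_similarity(item_vectors, query)
top_2 = np.argsort(scores)[::-1][:2]   # indices of the two nearest items
print(top_2, scores[top_2])

A vector database performs exactly this comparison, but over millions of vectors and with approximate-nearest-neighbor indexes rather than a brute-force scan.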

An Overview of Azure OpenAI and the Azure AI Hub

Tomaz Kastrun has a pair of posts. First up, an overview of Azure OpenAI:

Let’s first address the elephant in the room. We have explored Azure AI Foundry, and we also have Azure OpenAI. So what is the core difference? Let’s take a look:

The services behind them:

  • Azure AI Services has much broader AI capabilities and simpler integration into real-world applications, with mostly pre-built APIs for all services (face recognition, document recognition, speech recognition, computer vision, image recognition, and more) that allow better interoperability and connection to machine learning services (Azure Machine Learning Service).
  • Azure OpenAI focuses primarily on OpenAI LLM models (Azure AI Services supports many others) and provides great agents for conversations, content tools, RAG, and natural language services.

After that comes an overview of the Azure AI Hub and AI projects:

In AI Foundry portal, hubs provide the environment for a team to collaborate and organize work, and help you as a team lead or IT admin centrally set up security settings and govern usage and spend. You can create and manage a hub from the Azure portal or from the AI Foundry portal, and then your developers can create projects from the hub.

In essence, Hubs are the primary top-level Azure resource for AI Foundry. Their purpose is to govern security, connectivity, and computing resources across playgrounds and projects.
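
To make the Azure OpenAI side of that comparison concrete, here is a minimal sketch of a chat completion call using the openai Python package's AzureOpenAI client. The endpoint, key, and deployment name are placeholders pulled from environment variables, and the API version is simply one current GA version:

import os
from openai import AzureOpenAI  # pip install openai

# Placeholder endpoint and key; substitute your own resource's values.
client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-02-01",
)

response = client.chat.completions.create(
    model="my-gpt-4o-deployment",  # the deployment name you chose, not the model family
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "In one sentence, what is an AI Foundry hub for?"},
    ],
)
print(response.choices[0].message.content)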

Azure SQL Managed Instance Extreme Storage Latency

Kendra Little has another caveat emptor message:

What are your stories of unbelievably bad performance from cloud vendors? I’ll go first. For years, Azure SQL Managed Instance’s General Purpose Tier has documented “approximate” storage latency as being “5-10 ms.” This week they added a footnote: “This is an average range. Although the vast majority of IO request durations will fall under the top of the range, outliers which exceed the range are possible.”

How approximate is that 5-10 milliseconds, you might wonder? If you use Azure SQL Managed Instance these days, you will regularly find messages in your SQL Server Error log indicating that all data and log files have experienced latency of up to 60 seconds. At least, 60 seconds is the maximum I’ve observed personally, looking in the logs of several customers’ Managed Instances. Could it be worse? Microsoft hasn’t documented a ceiling. My testing shows that this latency occurs randomly to your workload and is not related to your resource usage: using less IO will not make the errors less likely. You have no way to avoid these storage failures (I don’t see how 15-60 second latency is not a failure), and they can occur anytime.

This is a major strike against SQL Managed Instance General Purpose. Considering the cost of the product, you could buy a new server with direct-attached NVMe storage, have it paid off after one year, have better performance, and get to depreciate the entire expense over a 3-5 year window (something you cannot do with the hardware side of cloud services–you can only depreciate the cost of licensing, assuming you have a 3-year reservation).
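
If you want to see what your own instance reports, here is a rough sketch (Python with pyodbc; the server name and credentials are placeholders) that pulls average per-file I/O latency from sys.dm_io_virtual_file_stats. Note these are cumulative averages since startup, so the 15-60 second spikes Kendra describes can hide behind respectable-looking numbers; the error log is where those outliers surface.

import pyodbc  # pip install pyodbc

# Placeholder connection string; point it at your Managed Instance.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=my-instance.abc123.database.windows.net;"
    "DATABASE=master;UID=my_user;PWD=my_password;Encrypt=yes;"
)

# Average read/write latency per database file since instance startup.
query = """
SELECT DB_NAME(vfs.database_id)                                   AS database_name,
       mf.physical_name,
       1.0 * vfs.io_stall_read_ms  / NULLIF(vfs.num_of_reads, 0)  AS avg_read_ms,
       1.0 * vfs.io_stall_write_ms / NULLIF(vfs.num_of_writes, 0) AS avg_write_ms
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
JOIN sys.master_files AS mf
  ON mf.database_id = vfs.database_id
 AND mf.file_id = vfs.file_id
ORDER BY avg_read_ms DESC;
"""

for row in conn.cursor().execute(query):
    print(row.database_name, row.physical_name, row.avg_read_ms, row.avg_write_ms)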

Linked Servers into Azure

Andy Brownsword goes old-school:

Connecting different versions of SQL Server can allow us to combine or transfer data between environments. This can become a challenge when the versions are really different.

Have you tried to connect SQL Server 2008 to a SQL database in Azure? It can throw up a few curve balls.

In this post we’ll look at how to solve 3 of the issues you might come up against.

When reading the title, my first response was, “But why not use PolyBase?” Then Andy threw the SQL Server 2008 bit at me, and then my response was, “But why not use a product that isn’t nearly old enough to vote?”

Nonetheless, Andy does a great job of demonstrating how this would work, and it can work for later versions of SQL Server as well.
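
For reference, the general shape of a linked server definition pointing at Azure SQL Database looks something like the sketch below, run from the older instance (wrapped here in pyodbc to keep the examples in Python; server names, database names, and credentials are placeholders, and these are not necessarily the specific issues Andy works through):

import pyodbc  # pip install pyodbc

# Connect to the *older* on-premises instance that needs to reach Azure.
conn = pyodbc.connect(
    "DRIVER={SQL Server};SERVER=legacy2008;DATABASE=master;Trusted_Connection=yes;",
    autocommit=True,
)
cursor = conn.cursor()

# Define the linked server. @catalog matters: Azure SQL Database does not
# allow USE, so the target database has to be fixed in the definition.
cursor.execute("""
EXEC sys.sp_addlinkedserver
     @server     = N'AzureLink',
     @srvproduct = N'',
     @provider   = N'SQLNCLI10',
     @datasrc    = N'myserver.database.windows.net',
     @catalog    = N'MyAzureDb';
""")

# Azure SQL Database needs explicit SQL credentials; Windows authentication
# does not pass through.
cursor.execute("""
EXEC sys.sp_addlinkedsrvlogin
     @rmtsrvname  = N'AzureLink',
     @useself     = 'FALSE',
     @locallogin  = NULL,
     @rmtuser     = N'azure_sql_user',
     @rmtpassword = N'<password>';
""")

# Query through the link with four-part naming.
for row in cursor.execute("SELECT TOP (5) name FROM AzureLink.MyAzureDb.sys.objects;"):
    print(row.name)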

Working with the Azure AI Document Service

Tomaz Kastrun continues a series on Azure AI. First up is a visual review of the Azure AI Document service:

Vision and Document services give your apps the ability to analyze images, process documents, and use optical character recognition (OCR) technologies in combination with machine learning.

That product has gone through a few name iterations, including Form Recognizer. But wait, there’s more!

Tomaz also takes a look at the Python SDK:

The Vision and Document SDK for Python gives you additional extensibility, letting you add these services to your apps.

To use the Vision and Document SDK with Python, you will need to have the resource up and running (to start, go with the free pricing tier (F0)) and get the Document Intelligence API key and endpoint address.

Click through for an example of how that works.
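
For a taste of what that looks like, here is a minimal sketch using the azure-ai-formrecognizer Python package (one of the SDKs for the Document Intelligence service; the naming churn extends to the packages) to run the prebuilt read/OCR model against a document URL. The endpoint, key, and document URL are placeholders:

import os
from azure.core.credentials import AzureKeyCredential
from azure.ai.formrecognizer import DocumentAnalysisClient  # pip install azure-ai-formrecognizer

# Endpoint and key from your Document Intelligence resource (the free F0 tier works for testing).
client = DocumentAnalysisClient(
    endpoint=os.environ["DOCUMENT_INTELLIGENCE_ENDPOINT"],
    credential=AzureKeyCredential(os.environ["DOCUMENT_INTELLIGENCE_KEY"]),
)

# Run the prebuilt OCR/read model against a publicly reachable document URL.
poller = client.begin_analyze_document_from_url(
    "prebuilt-read",
    "https://example.com/sample-invoice.pdf",  # placeholder URL
)
result = poller.result()

# Print every recognized line of text, page by page.
for page in result.pages:
    for line in page.lines:
        print(line.content)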

Using the Azure AI Language and Translation Python SDK

Tomaz Kastrun continues a series on Azure AI:

To use the SDK for the "Language + Translation" service, install the package:

pip install azure-ai-textanalytics==5.2.0

Then add your endpoint in a format like https://yyyyy_azurehub_xxxxxxx.cognitiveservices.azure.com/ along with the secret (key) for your endpoint. You will also need the region name (e.g., west-europe).

Once you’ve set up the necessary credentials, Tomaz shows how easy it is to call the service.
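
As a quick flavor of that, here is a minimal sketch using the azure-ai-textanalytics package Tomaz installs above, covering the language-analysis side (language detection and sentiment). The endpoint and key are placeholders read from environment variables:

import os
from azure.core.credentials import AzureKeyCredential
from azure.ai.textanalytics import TextAnalyticsClient  # pip install azure-ai-textanalytics==5.2.0

# Placeholder endpoint and key from your Language resource.
client = TextAnalyticsClient(
    endpoint=os.environ["LANGUAGE_ENDPOINT"],
    credential=AzureKeyCredential(os.environ["LANGUAGE_KEY"]),
)

documents = [
    "The new Managed Instance tier is surprisingly fast.",
    "Der Dienst war heute leider nicht erreichbar.",
]

# Detect the language of each document.
for doc, detected in zip(documents, client.detect_language(documents)):
    print(doc, "->", detected.primary_language.name)

# Score sentiment for the same documents.
for result in client.analyze_sentiment(documents):
    print(result.sentiment, result.confidence_scores)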

SQL Database in Microsoft Fabric

Deepthi Goguri is pleased with a new spin on an existing product:

“SQL database in Microsoft Fabric is a developer-friendly transactional database, based on Azure SQL Database, that allows you to easily create your operational database in Fabric. A SQL database in Fabric uses the same SQL Database Engine as Azure SQL Database.”

As you read, this is a transactional database that can be created in Fabric and replicated to the data lake for analytical workloads. The other main goal is to help build AI apps faster using SQL databases in Fabric. The data is replicated in near real time and converted to Parquet, an analytics-ready format.

Read on to learn more about the offering. I’m still not 100% sold on its virtues versus simply having an Azure SQL Database and enabling mirroring.

Metadata-Driven Spark Clusters in Azure Databricks

Matt Collins ties the room together with a bit of metadata:

In this article, we will discuss some options for improving interoperability between Azure Orchestration tools, like Data Factory, and Databricks Spark Compute. By using some simple metadata, we will show how to dynamically configure pipelines with appropriately sized clusters for all your orchestration and transformation needs as part of a data analytics platform.

Click through for an explanation of the challenge, followed by the how-to.
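
The core pattern is simple enough to sketch: keep a small lookup of cluster profiles in metadata and let the orchestrator resolve each workload's tag into the cluster definition it hands to Databricks. The profile names, VM sizes, and schema below are purely illustrative, not Matt's actual design:

# Toy metadata: map a workload size tag to a Databricks cluster profile.
# In practice this lookup would live in a metadata table read by the pipeline.
CLUSTER_PROFILES = {
    "small":  {"node_type_id": "Standard_DS3_v2", "num_workers": 2},
    "medium": {"node_type_id": "Standard_DS4_v2", "num_workers": 4},
    "large":  {"node_type_id": "Standard_DS5_v2", "num_workers": 8},
}

def cluster_spec_for(workload: dict) -> dict:
    """Build a new_cluster payload that an orchestrator (for example, a Data
    Factory pipeline calling the Databricks Jobs API) could pass along."""
    profile = CLUSTER_PROFILES[workload.get("size", "small")]
    return {
        "spark_version": "15.4.x-scala2.12",      # illustrative runtime version
        "node_type_id": profile["node_type_id"],
        "num_workers": profile["num_workers"],
        "spark_conf": workload.get("spark_conf", {}),
    }

# Example: the metadata row for a pipeline drives the cluster it gets.
print(cluster_spec_for({"pipeline": "daily_sales_load", "size": "medium"}))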
