Elasticsearch – Curated SQL

k Nearest Neighbors Search in Elasticsearch

Published 2025-07-08 by Kevin Feasel

Govind Singh Rawat looks for nearby documents:

Businesses are increasingly relying on intelligent search capabilities to enhance customer experience, automate insights, and unlock the potential of unstructured information. Elasticsearch, a leading distributed search and analytics engine, is at the heart of many such systems. One of its powerful and lesser-known capabilities is support for k-nearest neighbors (k-NN) search, a method particularly useful for similarity-based retrieval in domains such as semantic search, recommendation engines, and image recognition.

This article delves into what Elasticsearch and k-NN search are, how the two are integrated, and how to configure and optimize k-NN in Elasticsearch for real-world applications.

Click through for a high-level primer on the topic, as well as a few links to additional resources.
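
For a rough sense of what this looks like in practice, here is a minimal sketch in Python that only builds the JSON bodies involved. It assumes Elasticsearch 8.x, where a `dense_vector` field can be queried through the top-level `knn` option of the `_search` API. The field name `embedding` and the helper names are hypothetical and do not come from the linked article.

```python
import json


def knn_mapping(field: str, dims: int) -> dict:
    """Build an index mapping with a dense_vector field indexed for k-NN search."""
    return {
        "mappings": {
            "properties": {
                field: {
                    "type": "dense_vector",
                    "dims": dims,            # must match the embedding model's output size
                    "index": True,           # build the approximate nearest-neighbor structure
                    "similarity": "cosine",  # assumption: cosine similarity suits the embeddings
                }
            }
        }
    }


def knn_search_body(field, query_vector, k=10, num_candidates=100):
    """Build a _search request body that asks for the k nearest neighbors."""
    if num_candidates < k:
        raise ValueError("num_candidates must be at least k")
    return {
        "knn": {
            "field": field,
            "query_vector": list(query_vector),
            "k": k,                            # neighbors returned overall
            "num_candidates": num_candidates,  # candidates examined per shard
        }
    }


if __name__ == "__main__":
    # Hypothetical three-dimensional vectors, purely for illustration.
    print(json.dumps(knn_mapping("embedding", 3), indent=2))
    print(json.dumps(knn_search_body("embedding", [0.1, 0.2, 0.3], k=5), indent=2))
```

Raising `num_candidates` generally trades query latency for better recall, so it is one of the first knobs to try when tuning.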

Thoughts on Scaling Elasticsearch

Published 2025-02-27 by Kevin Feasel

Vivek Kumar can’t stop at one:

As modern applications evolve to serve growing needs for real-time data processing and retrieval, the need for scalability grows, too. Elasticsearch is an open-source, distributed search and analytics engine that is very efficient at handling large data sets and high-velocity queries. However, scaling Elasticsearch effectively can be nuanced, since it requires a proper understanding of the underlying architecture and of the performance tradeoffs involved.

Click through for those considerations and the trade-offs you might see.

Transitioning from Elasticsearch to OpenSearch

Published 2025-01-21 by Kevin Feasel

Nileh Jain has a guide for us:

Elasticsearch and OpenSearch are powerful tools for handling search and analytics workloads, offering scalability, real-time capabilities, and a rich ecosystem of plugins and integrations. Elasticsearch is widely used for full-text search, log monitoring, and data visualization across industries due to its mature ecosystem. OpenSearch, a community-driven fork of Elasticsearch, provides a fully open-source alternative with many of the same capabilities, making it an excellent choice for organizations prioritizing open-source principles and cost efficiency.

Migration to OpenSearch should be considered if you are using Elasticsearch versions up to 7.10 and want to avoid licensing restrictions introduced with Elasticsearch’s SSPL license. It is also ideal for those seeking continued access to an open-source ecosystem while maintaining compatibility with existing Elasticsearch APIs and tools. Organizations with a focus on community-driven innovation, transparent governance, or cost control will find OpenSearch a compelling option.

Click through for the prep work and the guide.

Vector Search Performance Optimizations in Elasticsearch

Published 2024-11-06 by Kevin Feasel

Venkata Gummadi works on vector search response times:

As data engineers, we are tasked with implementing these sophisticated solutions, ensuring organizations can derive actionable insights from vast datasets. This article explores the intricacies of vector search using Elasticsearch, focusing on effective techniques and best practices to optimize performance. By examining case studies on image retrieval for personalized marketing and text analysis for customer sentiment clustering, we demonstrate how optimizing vector search can lead to improved customer interactions and significant business growth.

Read on for a vector search primer and some guidance on how you can improve the performance of vector search queries. I’d expect that much of this can also apply to Azure AI Search and Amazon OpenSearch.

Debugging an Unresponsive Elasticsearch Cluster

Published 2023-10-06 by Kevin Feasel

Derric Gilling troubleshoots an Elasticsearch cluster:

Because of this sharding, a read or write request to an Elasticsearch cluster requires coordinating between multiple nodes as there is no “global view” of your data on a single server. While this makes Elasticsearch highly scalable, it also makes it much more complex to set up and tune than other popular databases like MongoDB or PostgreSQL, which can run on a single server.

When reliability issues come up, firefighting can be stressful if your Elasticsearch setup is buggy or unstable. Your incident could be impacting customers, which could negatively impact revenue and your business reputation. Fast remediation steps are important, yet spending a large amount of time researching solutions online during an incident or outage is not a luxury most engineers have. This guide is intended to be a cheat sheet for common problems that engineers running Elasticsearch encounter and what to look for.

Read on for several helpful tips.
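
To illustrate why there is no single-server view of the data, here is a simplified model of shard routing in Python. Elasticsearch picks a shard by hashing a document's routing value (its ID by default) and taking it modulo the shard count. The sketch below uses `zlib.crc32` as a stand-in hash, which is an assumption made for illustration and not the hash Elasticsearch uses, and the function names are hypothetical.

```python
import zlib
from collections import defaultdict


def shard_for(routing: str, num_primary_shards: int) -> int:
    """Pick a shard for a routing value (simplified model, stand-in hash)."""
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards


def group_by_shard(doc_ids, num_primary_shards: int) -> dict:
    """Show which shard each document lands on; a search has to visit them all."""
    groups = defaultdict(list)
    for doc_id in doc_ids:
        groups[shard_for(doc_id, num_primary_shards)].append(doc_id)
    return dict(groups)


if __name__ == "__main__":
    # Hypothetical document IDs spread over five primary shards.
    ids = [f"order-{n}" for n in range(20)]
    for shard, members in sorted(group_by_shard(ids, 5).items()):
        print(shard, members)
```

A lookup by ID can go straight to one shard, but a search has to fan out to every shard and merge the results, which is where the coordination cost comes from.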

Deleting an Elasticsearch Index

Published 2023-06-28 by Kevin Feasel

The Big Data in Real World team is done with this index:

Simple problem with a simple solution. In this post we will see how to delete an index in Elasticsearch.

Read on for the command to delete an index, and a demonstration of it in action.
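
As a hedged sketch of the idea, deleting an index comes down to an HTTP `DELETE` against the index name. The Python below only builds the request and does not send it. The cluster URL and index name are hypothetical, and the wildcard guard is my own precaution, not something taken from the linked post.

```python
import urllib.request


def delete_index_request(base_url: str, index: str) -> urllib.request.Request:
    """Build (but do not send) a DELETE request for a single, explicitly named index."""
    # Refuse wildcards, comma-separated lists, and names such as _all, so that
    # one call cannot remove more than the caller intended.
    if not index or index.startswith("_") or any(ch in index for ch in "*,"):
        raise ValueError(f"refusing to delete ambiguous index name: {index!r}")
    return urllib.request.Request(f"{base_url.rstrip('/')}/{index}", method="DELETE")


if __name__ == "__main__":
    # Hypothetical local cluster and index name.
    req = delete_index_request("http://localhost:9200", "logs-2023.06")
    print(req.get_method(), req.full_url)
    # Passing req to urllib.request.urlopen would send it; deletion is permanent.
```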

Removing a Node from Elasticsearch

Published 2023-04-26 by Kevin Feasel

The Big Data in Real World team spams the delete button:

Shutting down a node abruptly is not the right way to decommission or remove a node from the Elasticsearch cluster. Doing so leaves the replicated shards on that node under-replicated, and it could disrupt clients that are currently consuming data from Elasticsearch.

The proper way to decommission or remove a node from Elasticsearch is to add the host to the exclusion list.

Click through to learn how to do this.
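
A minimal sketch of the exclusion-list approach, assuming the `cluster.routing.allocation.exclude._ip` cluster setting sent via `PUT _cluster/settings`. The Python below only builds the JSON bodies. The IP address and helper names are hypothetical, and the choice of `persistent` over `transient` scope is my assumption.

```python
import json

EXCLUDE_IP_SETTING = "cluster.routing.allocation.exclude._ip"


def exclude_node_settings(node_ip: str) -> dict:
    """Body for PUT _cluster/settings that moves shards off the given node."""
    return {"persistent": {EXCLUDE_IP_SETTING: node_ip}}


def clear_exclusion_settings() -> dict:
    """Body that resets the exclusion list once the node is gone (null resets a setting)."""
    return {"persistent": {EXCLUDE_IP_SETTING: None}}


if __name__ == "__main__":
    # Hypothetical address of the node being decommissioned.
    print(json.dumps(exclude_node_settings("10.0.0.12")))
    print(json.dumps(clear_exclusion_settings()))
```

After applying the first body, shards relocate in the background. It is safer to wait until `GET _cat/shards` shows none left on the node before shutting it down.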

Creating an Alias in Elasticsearch

Published 2023-04-12 by Kevin Feasel

The Big Data in Real World team needs an alias:

An alias, as the name suggests, is another name for an index in Elasticsearch. It is quite useful when you want to refer to an index by another name. So instead of performing a reindex to rename an index or cloning it, you can create an alias to the index.

Click through for the script to create an alias, how you might use one, and the right way to delete one without removing the underlying index.
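
As a rough sketch, alias changes go through `POST _aliases` with a list of actions that Elasticsearch applies atomically. The Python below only builds those bodies. The index and alias names are hypothetical, and the helper functions are mine rather than the linked post's.

```python
import json


def add_alias_actions(index: str, alias: str) -> dict:
    """Body for POST _aliases that points an alias at an index."""
    return {"actions": [{"add": {"index": index, "alias": alias}}]}


def remove_alias_actions(index: str, alias: str) -> dict:
    """Body that detaches the alias; the index itself is left in place."""
    return {"actions": [{"remove": {"index": index, "alias": alias}}]}


def swap_alias_actions(alias: str, old_index: str, new_index: str) -> dict:
    """Body that repoints an alias from one index to another in a single request."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


if __name__ == "__main__":
    # Hypothetical names: clients query "orders" while the backing index changes.
    print(json.dumps(add_alias_actions("orders-v1", "orders")))
    print(json.dumps(swap_alias_actions("orders", "orders-v1", "orders-v2")))
```

The swap is the case where an alias earns its keep: clients keep using one name while the index behind it changes.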

Migrating from Elasticsearch to Azure Data Explorer

Published 2023-02-03 by Kevin Feasel

Bhaskar Kakaraparthy does a logging switcheroo:

This article extends an existing article on migrating data from Elasticsearch to Azure Data Explorer (ADX) with a Logstash pipeline, presented as a step-by-step guide. In this article, we will explore the process involved in migrating data from one source (ELK) to another (ADX) and discuss some of the best practices and tools available to make the process as smooth as possible.

Using Logstash for data migration from Elasticsearch to Azure Data Explorer (ADX) was a smooth and efficient process. With the help of the ADX output plugin and Logstash, I was able to migrate approximately 30 TB of data in a timely manner. The configuration was straightforward, and the data transfer with the ADX output plugin was quick and reliable. Overall, the experience of using the ADX output plugin with Logstash for data migration was positive and I would definitely use it again for similar projects in the future.

Read on to see how.

Shipping Kafka Logs to Kibana via Filebeat

Published 2022-05-11 by Kevin Feasel

Shivani Sarthi uses Filebeat to perform log shipping:

To ship the Kafka logs, we will be using the Filebeat agent. Filebeat is a lightweight shipper whose purpose is to forward and centralize log data. For Filebeat to work, you need to install it as an agent on the desired servers. Filebeat then monitors the log files, collects the log events, and forwards them to Elasticsearch or Logstash for indexing.

Click through for an Ansible script to install Filebeat, integrate with Kafka, and communicate with Logstash for eventual querying via Kibana.
