Jepsen: Crate

Kyle Kingsbury checks out Crate, a SQL database built on Elasticsearch:

Building a database on Elasticsearch is something of a double-edged sword. Crate has been able to focus on hard problems like query planning, joins, aggregations, and so on–without having to take on the tough work of building a storage layer, cluster membership, replication algorithm, etc. However, Crate is tightly coupled to Elasticsearch, and is dependent on the Elastic team for improvements to that technology. Elasticsearch’s consistency issues have been well-known for years, and the process to fix them is still ongoing. It’s not clear what Crate can do to get out of this situation: a rewrite would be complex and expensive (and introduce new and unknown failure modes), whereas fixing Elasticsearch’s consistency problems could easily consume person-years of engineering time that a small company can ill-afford.

There are good reasons to use Crate: distributed SQL stores, especially with Crate’s capacity for aggregations and joins, are hard to come by. Moreover, Crate introduces several helpful features not present in Elasticsearch. That said, the risk of data loss is real, and is unlikely to be resolved at any point in the near future. I recommend that Crate users avoid using Crate as their system of record–at least, where each record matters. Like Elasticsearch itself, you should use a safer database as your primary store, and continuously backfill data from that primary store into Crate for querying. Crate may also be suitable for cases where occasional data loss or corruption does is mostly harmless, e.g. high-volume sensor data, observability, analytics, etc.

Every time the Jepsen series gets updated, I make time to read.

Related Posts

Basics Of Elasticsearch In .NET

Ivan Cesar gives us a brief tutorial of the Elasticsearch .NET API: To be able to search something, we must store some data into ES. The term used is “indexing.” The term “mapping” is used for mapping our data in the database to objects which will be serialized and stored in Elasticsearch. We will be […]

Read More

Kafka Connect To Elasticsearch

Robin Moffatt shows how to take data from Kafka Connect and feed it into Elasticsearch: Whilst Kafka Connect is part of Apache Kafka itself, if you want to stream data from Kafka to Elasticsearch you’ll want the Confluent Open Source distribution (or at least, the Elasticsearch connector). The configuration is pretty simple. As before, see inline comments for […]

Read More

Categories

June 2016
MTWTFSS
« May Jul »
 12345
6789101112
13141516171819
20212223242526
27282930