Jepsen: Crate

Kyle Kingsbury checks out Crate, a SQL database built on Elasticsearch:

Building a database on Elasticsearch is something of a double-edged sword. Crate has been able to focus on hard problems like query planning, joins, aggregations, and so on–without having to take on the tough work of building a storage layer, cluster membership, replication algorithm, etc. However, Crate is tightly coupled to Elasticsearch, and is dependent on the Elastic team for improvements to that technology. Elasticsearch’s consistency issues have been well-known for years, and the process to fix them is still ongoing. It’s not clear what Crate can do to get out of this situation: a rewrite would be complex and expensive (and introduce new and unknown failure modes), whereas fixing Elasticsearch’s consistency problems could easily consume person-years of engineering time that a small company can ill afford.

There are good reasons to use Crate: distributed SQL stores, especially with Crate’s capacity for aggregations and joins, are hard to come by. Moreover, Crate introduces several helpful features not present in Elasticsearch. That said, the risk of data loss is real, and is unlikely to be resolved at any point in the near future. I recommend that Crate users avoid using Crate as their system of record–at least, where each record matters. As with Elasticsearch itself, you should use a safer database as your primary store, and continuously backfill data from that primary store into Crate for querying. Crate may also be suitable for cases where occasional data loss or corruption is mostly harmless, e.g. high-volume sensor data, observability, analytics, etc.
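To make the "primary store plus continuous backfill" recommendation concrete, here is a minimal sketch of an incremental backfill loop. Everything in it is an assumption for illustration: the `records` table, its `version` column, and the `backfill` helper are hypothetical; SQLite stands in for the safer primary database, and a plain dictionary stands in for the query store (Crate), so no real Crate API is shown.

```python
import sqlite3


def backfill(primary, replica, watermark):
    """Copy rows with version > watermark from the primary into the replica.

    `primary` is a sqlite3 connection standing in for the system of record;
    `replica` is a dict standing in for the query store. Both are
    illustrative assumptions, not anything prescribed by the article.
    """
    rows = primary.execute(
        "SELECT id, payload, version FROM records "
        "WHERE version > ? ORDER BY version",
        (watermark,),
    ).fetchall()
    for row_id, payload, version in rows:
        replica[row_id] = payload  # upsert by primary key, safe to repeat
        watermark = version        # advance only after the row is copied
    return watermark


# Hypothetical primary store with two records.
primary = sqlite3.connect(":memory:")
primary.execute(
    "CREATE TABLE records (id INTEGER PRIMARY KEY, payload TEXT, version INTEGER)"
)
primary.executemany(
    "INSERT INTO records VALUES (?, ?, ?)", [(1, "a", 1), (2, "b", 2)]
)

replica = {}
mark = backfill(primary, replica, 0)
```

Because the primary remains the source of truth, a query store that loses or corrupts data can be rebuilt by running the same loop again from a watermark of 0.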

Every time the Jepsen series gets updated, I make time to read it.
