Building a database on Elasticsearch is something of a double-edged sword. Crate has been able to focus on hard problems like query planning, joins, aggregations, and so on–without having to take on the tough work of building a storage layer, cluster membership, replication algorithm, etc. However, Crate is tightly coupled to Elasticsearch, and is dependent on the Elastic team for improvements to that technology. Elasticsearch’s consistency issues have been well-known for years, and the process to fix them is still ongoing. It’s not clear what Crate can do to get out of this situation: a rewrite would be complex and expensive (and introduce new and unknown failure modes), whereas fixing Elasticsearch’s consistency problems could easily consume person-years of engineering time that a small company can ill-afford.
There are good reasons to use Crate: distributed SQL stores, especially with Crate’s capacity for aggregations and joins, are hard to come by. Moreover, Crate introduces several helpful features not present in Elasticsearch. That said, the risk of data loss is real, and is unlikely to be resolved at any point in the near future. I recommend that Crate users avoid using Crate as their system of record–at least, where each record matters. Like Elasticsearch itself, you should use a safer database as your primary store, and continuously backfill data from that primary store into Crate for querying. Crate may also be suitable for cases where occasional data loss or corruption does is mostly harmless, e.g. high-volume sensor data, observability, analytics, etc.
Every time the Jepsen series gets updated, I make time to read.