Jepsen: Crate

Kyle Kingsbury checks out Crate, a SQL database built on Elasticsearch:

Building a database on Elasticsearch is something of a double-edged sword. Crate has been able to focus on hard problems like query planning, joins, aggregations, and so on–without having to take on the tough work of building a storage layer, cluster membership, replication algorithm, etc. However, Crate is tightly coupled to Elasticsearch, and is dependent on the Elastic team for improvements to that technology. Elasticsearch’s consistency issues have been well-known for years, and the process to fix them is still ongoing. It’s not clear what Crate can do to get out of this situation: a rewrite would be complex and expensive (and introduce new and unknown failure modes), whereas fixing Elasticsearch’s consistency problems could easily consume person-years of engineering time that a small company can ill-afford.

There are good reasons to use Crate: distributed SQL stores, especially with Crate’s capacity for aggregations and joins, are hard to come by. Moreover, Crate introduces several helpful features not present in Elasticsearch. That said, the risk of data loss is real, and is unlikely to be resolved at any point in the near future. I recommend that Crate users avoid using Crate as their system of record–at least, where each record matters. Like Elasticsearch itself, you should use a safer database as your primary store, and continuously backfill data from that primary store into Crate for querying. Crate may also be suitable for cases where occasional data loss or corruption does is mostly harmless, e.g. high-volume sensor data, observability, analytics, etc.

Every time the Jepsen series gets updated, I make time to read.

Related Posts

Replicating Solr Indexes

Nirmal Prabhu walks us through configuring replicated Solr instances: Step 4: [Creating master Core] First, we need to create a core for indexing the data. The Solr create command has the following options: -c <name> — Name of the core or collection to create (required). -d <confdir> — The configuration directory, useful in the SolrCloud mode. -n <configName> — The configuration […]

Read More

Connecting To Elasticsearch With R

Jerod Johnson has a sample of connecting to Elasticsearch with R: You will need the following information to connect to Elasticsearch as a JDBC data source: Driver Class: Set this to cdata.jdbc.elasticsearch.ElasticsearchDriver. Classpath: Set this to the location of the driver JAR. By default, this is the lib subfolder of the installation folder. The DBI functions, […]

Read More

Categories

June 2016
MTWTFSS
« May Jul »
 12345
6789101112
13141516171819
20212223242526
27282930