Data Modeling In Cassandra

Charmy Garg walks us through some of the basics of modeling tables in Cassandra:

Two basic goals in Cassandra which we should keep in mind:

  • Spread data evenly around the cluster – You want every node in the cluster to have roughly the same amount of data. Rows are spread around the cluster based on a hash of the partition key, which is the first element of the PRIMARY KEY. So, the key to spreading data evenly is this: pick a good primary key.

  • Minimize the number of partitions read – Partitions are groups of rows that share the same partition key. When you issue a read query, you want to read rows from as few partitions as possible. Why is this important? [Each partition may reside on a different node. The coordinator will generally need to issue separate commands to separate nodes for each partition you request. This adds a lot of overhead and increases the variation in latency. Furthermore, even on a single node, it’s more expensive to read from multiple partitions than from a single one due to the way rows are stored.]

Charmy also has a couple of pitfalls that people used to the relational database model may hit.

Related Posts

Case Classes In Scala

Shubham Dangare explains what case classes are in Scala: Case class is scale way to allow pattern matching on an object without requiring a large amount of boilerplate. All you need to do is add a single case keyword modifier to each class that you want to pattern matching using such modifier makes scala compiler […]

Read More

No Type Equivalence In M

Imke Feldmann notes an oddity in types in Power Query: But this function will not return any matches. I also tried out a (potentially) slower version using Table.SelectColumns(Types, each [Value] = x[Types]) – but still no match.  What I found particularly frustrating here was, that in some cases, these lookups or filters on type-columns worked. […]

Read More


October 2018
« Sep Nov »