Data Modeling In Cassandra

Charmy Garg walks us through some of the basics of modeling tables in Cassandra:

Two basic goals in Cassandra which we should keep in mind:

  • Spread data evenly around the cluster – You want every node in the cluster to have roughly the same amount of data. Rows are spread around the cluster based on a hash of the partition key, which is the first element of the PRIMARY KEY. So, the key to spreading data evenly is this: pick a good primary key.

  • Minimize the number of partitions read – Partitions are groups of rows that share the same partition key. When you issue a read query, you want to read rows from as few partitions as possible. Why is this important? [Each partition may reside on a different node. The coordinator will generally need to issue separate commands to separate nodes for each partition you request. This adds a lot of overhead and increases the variation in latency. Furthermore, even on a single node, it’s more expensive to read from multiple partitions than from a single one due to the way rows are stored.]

Charmy also has a couple of pitfalls that people used to the relational database model may hit.

Related Posts

Using the ML.NET Model Builder

I have a post looking at the ML.NET Model Builder: You have four options from which to choose: two-class classification, multi-class classification, regression, or Choose Your Own Adventure. Today, we’re going to create a two-class classification model. Incidentally, they’re not kidding about things changing in preview—last time I looked at this, they didn’t have multi-class […]

Read More

Creating Models with ML.NET

I have a series on ML.NET; in this post, I look at building a model: Okay, now that I have classes, I need to put in that lambda. I guess the lambda could change to qb => qb.Quarterback == "Josh Allen" ? "Josh Allen" : "Nate Barkerson" and that’d work except for one itsy-bitsy thing: if I […]

Read More


October 2018
« Sep Nov »