Data Modeling In Cassandra

Charmy Garg walks us through some of the basics of modeling tables in Cassandra:

Two basic goals in Cassandra which we should keep in mind:

  • Spread data evenly around the cluster – You want every node in the cluster to have roughly the same amount of data. Rows are spread around the cluster based on a hash of the partition key, which is the first element of the PRIMARY KEY. So, the key to spreading data evenly is this: pick a good primary key.

  • Minimize the number of partitions read – Partitions are groups of rows that share the same partition key. When you issue a read query, you want to read rows from as few partitions as possible. Why is this important? [Each partition may reside on a different node. The coordinator will generally need to issue separate commands to separate nodes for each partition you request. This adds a lot of overhead and increases the variation in latency. Furthermore, even on a single node, it’s more expensive to read from multiple partitions than from a single one due to the way rows are stored.]

Charmy also has a couple of pitfalls that people used to the relational database model may hit.

Related Posts

Parsing Rows Manually with Spark .NET

Ed Elliott shows how we can solve a challenging problem when newlines are in the wrong place: So the first thing we need to do is to read in the whole file in one chunk, if we just do a standard read the file will get broken into rows based on the newline character: var […]

Read More

SQL Server CTP 3.2 and Java Extensibility

Niels Berglund walks us through what has changed with Java support in ML Services in SQL Server 2019 CTP 3.2: One of the announcements of what is new in CTP 3.2 was that SQL Server now includes Azul System’sZulu Embedded right out of the box for all scenarios where we use Java in SQL Server, including Java extensibility. […]

Read More

Categories

October 2018
MTWTFSS
« Sep Nov »
1234567
891011121314
15161718192021
22232425262728
293031