Choosing A Data Platform

Lukas Eder discusses when to use a relational database versus some non-relational database:

This question obviously assumes that you’re starting out with an RDBMS, which is classically the database system that solves pretty much any problem decently enough not to be replaced easily. What does this mean? Simply put:

  • RDBMS have been around forever, so they have a huge advantage compared to “newcomers” in the market, who don’t have all the excellent tooling, community, support, maturity yet
  • E.F. Codd’s work may have been the single biggest influence on our whole industry. There has hardly been anything as revolutionary as the relational model ever since. It’s hard for an alternative database to be equally universal, i.e. they’re mostly solving niche problems

Having said so, sometimes you do have a niche problem. For instance a graph database problem. In fact, a graph is nothing fundamentally different from what you can represent in the relational model. It is easy to model a graph with a many-to-many relationship table.

If you want a checklist, here’s how I would approach this question (ceteris paribus and limiting myself to about 100 words):

  1. Are you dealing with streaming millions of rows per second, or streaming from tens of thousands of endpoints concurrently?  Kafka and the Hadoop streaming stack.
  2. Is your problem something that you’ve already solved with a relational database, and your solution works well enough?  Relational database.
  3. Are there multiple “paths” to get to interesting data?  Relational database.
  4. Shopping carts (write-heavy, focused on availability over consistency)?  Riak/Cassandra/Dynamo at large scale, else relational database.
  5. Type duplication?  Relational database.
  6. Petabytes of data being analyzed asynchronously?  Hadoop.
  7. Other data platforms if they fit specific niche requirements around data structure.

There’s a lot more to this discussion than a simple numbered list, but I think it’s reasonable to start with relational databases and move away if and only if there’s a compelling reason.

Related Posts

Aim For Database-First Designs

Lukas Eder explains why database-first design patterns tend to work better than code-first design: The real “truth” of your database schema, and the “sovereignty” over it, resides with your database. The database is the only place where the schema is defined, and all clients have a copy of the database schema, not vice versa. The […]

Read More

A Non-Relational Database Taxonomy

Thomas Henson has a taxonomy of non-relational databases: Columnar Database The first type of NoSQL database is the Columnar databases which is optimized for reading and writing columns of data as opposed to rows of data. Column-oriented storage for database tables is an help drive down the input/output requirements for database. Since the I/O profile […]

Read More


September 2016
« Aug Oct »