Choosing A Data Platform

Lukas Eder discusses when to use a relational database versus some non-relational database:

This question obviously assumes that you’re starting out with an RDBMS, which is classically the database system that solves pretty much any problem decently enough not to be replaced easily. What does this mean? Simply put:

  • RDBMS have been around forever, so they have a huge advantage compared to “newcomers” in the market, who don’t have all the excellent tooling, community, support, maturity yet
  • E.F. Codd’s work may have been the single biggest influence on our whole industry. There has hardly been anything as revolutionary as the relational model ever since. It’s hard for an alternative database to be equally universal, i.e. they’re mostly solving niche problems

Having said so, sometimes you do have a niche problem. For instance a graph database problem. In fact, a graph is nothing fundamentally different from what you can represent in the relational model. It is easy to model a graph with a many-to-many relationship table.

If you want a checklist, here’s how I would approach this question (ceteris paribus and limiting myself to about 100 words):

  1. Are you dealing with streaming millions of rows per second, or streaming from tens of thousands of endpoints concurrently?  Kafka and the Hadoop streaming stack.
  2. Is your problem something that you’ve already solved with a relational database, and your solution works well enough?  Relational database.
  3. Are there multiple “paths” to get to interesting data?  Relational database.
  4. Shopping carts (write-heavy, focused on availability over consistency)?  Riak/Cassandra/Dynamo at large scale, else relational database.
  5. Type duplication?  Relational database.
  6. Petabytes of data being analyzed asynchronously?  Hadoop.
  7. Other data platforms if they fit specific niche requirements around data structure.

There’s a lot more to this discussion than a simple numbered list, but I think it’s reasonable to start with relational databases and move away if and only if there’s a compelling reason.

Related Posts

Review: dbForge Studio For Database Modeling

Randolph West is looking for a product for database modeling and tries out dbForge Studio: These days I still design new databases from scratch with pen and paper (or iPad and Apple Pencil), where the entity relationship diagram (ERD) is rudimentary and crows’ feet relationships are badly-scrawled. But it got me wondering which database modelling […]

Read More

The Forgotten Infrastructure Below Azure BI Architecture Diagrams

Meagan Longoria reminds us that there are several products which Azure BI projects need but which we tend to forget when building architectural diagrams: Let’s start with Azure Active Directory (AAD). In order to provision the resources in the diagram, your Azure subscription must already be associated with an Active Directory. AAD is Microsoft’s cloud-based […]

Read More

Categories

September 2016
MTWTFSS
« Aug Oct »
 1234
567891011
12131415161718
19202122232425
2627282930