
Category: Architecture

Natural Keys?

Steve Jones wonders if we should give up on natural primary key constraints:

One of the things I think is important in modeling your particular entity is including a primary key (PK). In my DevOps talk I stress this, as I’d rather most attendees come away thinking a PK is important as their first takeaway from the session. There are exceptions, but they are rare, and I would prefer that most tables just have some PK included from the beginning.

A PK ought to be stable as well, and there are plenty of written words about how to pick the PK for your particular problem domain. Often I have received the advice that natural keys are preferred over surrogate keys, and it is worth the effort to try and identify a suitable column (or set of columns) that will guarantee uniqueness. I think that’s good advice, and it’s also advice I tend to ignore.

Read on for Steve’s reasoning.  I tend to use surrogate keys out of habit, though I do prefer to put unique key constraints on natural keys to help me reason through data models.
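
To make that habit concrete, here is a minimal sketch of the pattern (mine, not Steve's): a surrogate integer primary key plus a unique constraint that still enforces the natural key. It uses SQLAlchemy with a hypothetical Customer table; the column names are illustrative.

```python
# A surrogate primary key with a unique constraint on the natural key,
# sketched with SQLAlchemy. Table and column names are hypothetical.
from sqlalchemy import Column, Integer, String, UniqueConstraint
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class Customer(Base):
    __tablename__ = "customer"

    # Surrogate primary key: meaningless, stable, never changes.
    customer_id = Column(Integer, primary_key=True, autoincrement=True)

    # Natural key columns: still enforced as unique so the model stays honest.
    tax_number = Column(String(20), nullable=False)
    country_code = Column(String(2), nullable=False)

    name = Column(String(200), nullable=False)

    __table_args__ = (
        UniqueConstraint("tax_number", "country_code", name="uq_customer_natural_key"),
    )
```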


Building Observable Distributed Systems

Kevin Sookocheff has some thoughts on building observable systems:

Given the shortcomings of monitoring and testing, we should shift focus to building observable systems. This means treating observability of system behaviour as a primary feature of the system being built, and integrating this feature into how we design, build, test, and maintain our systems. This also means acknowledging that the ease with which we can debug our production environment will be a key indicator of system reliability, scalability, and ultimately customer experience. Designing a system to be observable requires effort from three disciplines of software development: development, testing, and operations. None of these disciplines is more important than the others, and the sum of them is greater than the value of the individual parts. Let’s take some time to look at each discipline in more detail, with a focus on observability.

My struggle has never been with the concept, but rather with getting the implementation details right.  “Make everything observable” is great until you run out of disk space because you’re logging everything.
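
One pragmatic middle ground (my sketch, not anything from Kevin's post) is to sample low-severity log records while always keeping warnings and errors. The filter name and sample rate below are made up for illustration.

```python
# Keep verbose logging observable without filling the disk:
# pass warnings and errors through, sample DEBUG/INFO at a fixed rate.
import logging
import random

class SamplingFilter(logging.Filter):
    """Always keep WARNING and above; sample lower-severity records."""

    def __init__(self, sample_rate: float = 0.01) -> None:
        super().__init__()
        self.sample_rate = sample_rate

    def filter(self, record: logging.LogRecord) -> bool:
        if record.levelno >= logging.WARNING:
            return True
        return random.random() < self.sample_rate

handler = logging.StreamHandler()
handler.addFilter(SamplingFilter(sample_rate=0.01))

logger = logging.getLogger("orders")
logger.setLevel(logging.DEBUG)
logger.addHandler(handler)

logger.debug("cart recalculated")        # kept ~1% of the time
logger.error("payment gateway timeout")  # always kept
```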


The Basics Of Kubernetes

Chris Adkin shares some thoughts on what Kubernetes is and why it might be interesting to data platform professionals:

I strongly urge anyone with an interest in learning Kubernetes to watch this presentation, as it does a great job of explaining Kubernetes from the ground up.

“Out of the box”, Kubernetes will look after scheduling. If there is a requirement to ensure that pods only run on a specific set of nodes, there is a means of doing this via label selectors, as documented here. A label selector is a directive for influencing resource utilization decisions made by the cluster.

Read the whole thing.
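
As a quick illustration of the node-pinning idea, here is a minimal sketch (mine, not from Chris's post) using the official kubernetes Python client: the pod's nodeSelector restricts scheduling to nodes carrying a matching label. The pod name, image, namespace, and label value are illustrative.

```python
# Pin a pod to nodes labeled disktype=ssd via nodeSelector,
# using the official kubernetes Python client. Names are illustrative.
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() inside a cluster

pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="reporting-db", labels={"app": "reporting-db"}),
    spec=client.V1PodSpec(
        node_selector={"disktype": "ssd"},  # only schedule onto nodes with this label
        containers=[
            client.V1Container(
                name="reporting-db",
                image="registry.example.com/reporting-db:1.0",  # illustrative image
            )
        ],
    ),
)

client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)
```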


Using Kubernetes To Support Microservices

Samir Behara walks us through a high-level explanation of how you can use Kubernetes to support development of microservices:

Kubernetes is an open source container-orchestration system for automating deployment, scaling, and management of containerized applications. In this tutorial, you will learn how to get started with Microservices on Kubernetes. I will cover the topics below in detail:

  • How does Kubernetes help to build scalable Microservices?

  • Overview of Kubernetes Architecture

  • Create a Local Development Environment for Kubernetes using Minikube

  • Create a Kubernetes Cluster and deploy your Microservices on Kubernetes

  • Automate your Kubernetes Environment Setup

  • CI/CD Pipeline for deploying containerized application to Kubernetes

This sticks mostly to a high-level architecture discussion, and does a good job at that.
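
To give a flavor of the "deploy your microservices on Kubernetes" step, here is a minimal sketch (mine, not from Samir's tutorial) that creates a three-replica Deployment with the official kubernetes Python client. The service name, image, replica count, and port are illustrative.

```python
# Deploy a containerized microservice as a Kubernetes Deployment.
# Assumes a local cluster (e.g. minikube) is running and kubeconfig is set up.
from kubernetes import client, config

config.load_kube_config()

deployment = client.V1Deployment(
    metadata=client.V1ObjectMeta(name="orders-service"),
    spec=client.V1DeploymentSpec(
        replicas=3,  # scale horizontally by adjusting this number
        selector=client.V1LabelSelector(match_labels={"app": "orders"}),
        template=client.V1PodTemplateSpec(
            metadata=client.V1ObjectMeta(labels={"app": "orders"}),
            spec=client.V1PodSpec(
                containers=[
                    client.V1Container(
                        name="orders",
                        image="registry.example.com/orders-service:1.0.0",  # illustrative
                        ports=[client.V1ContainerPort(container_port=8080)],
                    )
                ]
            ),
        ),
    ),
)

client.AppsV1Api().create_namespaced_deployment(namespace="default", body=deployment)
```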


Integrating Kafka Into A Data Scientist’s Workflow

Liz Bennett from Stitch Fix has a guest post on the Confluent blog:

Our main requirement for this new project was to build infrastructure that would be 100 percent self-service for our Data Scientists. In other words, my teammates and I would never be directly involved in the discovery, creation, configuration and management of the event data. Self-service would fix the primary shortcoming of our legacy event delivery system: manual administration that was performed by my team whenever a new dataset was born. This manual process hindered the productivity and access to event data for our Data Scientists. Meanwhile, fulfilling the requests of the Data Scientists hindered our own ability to improve the infrastructure. This scenario is exactly what the Data Platform Team strives to avoid. Building self-service tooling is the number one tenet of the Data Platform Team at Stitch Fix, so whatever we built to replace the old event infrastructure needed to be self-service for our Data Scientists. You can learn more about our philosophy in Jeff Magnusson’s post Engineers Shouldn’t Write ETL.

This is an architectural overview and a good read.
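
For readers who haven't touched Kafka, here is a minimal sketch (mine, not Stitch Fix's tooling) of what publishing a single event looks like with the confluent-kafka client. The broker address, topic name, and payload are illustrative.

```python
# Publish one JSON event to a Kafka topic with confluent-kafka.
# Broker, topic, and payload are illustrative.
import json
from confluent_kafka import Producer

producer = Producer({"bootstrap.servers": "localhost:9092"})

event = {"event_type": "item_viewed", "client_id": 42, "item_id": "sku-123"}

# produce() is asynchronous; flush() blocks until delivery is confirmed.
producer.produce("clickstream.item_viewed", value=json.dumps(event).encode("utf-8"))
producer.flush()
```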


Rant: NoSQL Isn’t A Thing

Grant Fritchey is in rare form today:

Go and search through it for a NoSQL data management system. I’ll wait.

You found one that was named NoSQL didn’t you. Oracle NoSQL, because Oracle. Of course. However, under the Database Model what did it say? Document Store. Why?

BECAUSE NOSQL IS NOT A DATA STORAGE ENGINE!

There is not a NoSQL thing to use. You can’t compare NoSQL to anything because NoSQL is just a different screed, a rant, a tantrum. A bunch of influential developers threw themselves on the floor, screaming and kicking, spittle flying everywhere, because they didn’t want to eat their broccoli.

Wait, that was my kids when they were three.

Speaking as the voice of reason (and you know you’re in trouble when that’s the case!), I’ll note that non-relational engines take up 6 of the top 15 slots.  There are particular advantages to non-relational data stores—as Grant willingly notes—and as companies grow, specialized data storage can become quite useful.  But relational databases are a great starting point, and for good reason:  they solve a subset of problems extremely well and solve most problems well enough.  This is probably also a good place to drop in a reference to Feasel’s Law.


In Defense Of The Layered Architecture

Sylvia Fronczak defends the layered architecture approach to software development:

There are disadvantages to the layered architecture approach. And some aspects of it have been deemed not architecturally pure for domain-driven design.

However, there are disadvantages to most approaches. The key is to understand both the advantages and disadvantages. Pick what architectural patterns work best for your particular problem.

Here are some factors you want to keep in mind.

  1. The layered architecture is very database-centric. As mentioned before, since everything flows down, it’s often the database that’s the last layer. Critics of this architecture point out that your application isn’t about storing data; it’s about solving a business problem. However, when so many applications are simple CRUD apps, maybe the database is more than just a secondary player.

  2. Scalability can be difficult with a layered architecture. This is tied to the fact that many layered applications tend to take on monolithic properties. If you need to scale your app, you have to scale the whole app! However, that doesn’t mean your layered application has to be a monolith. Once it becomes large enough, it’s time to split it out—just like you would with any other architecture.

  3. A layered application is harder to evolve, as changes in requirements will often touch all layers.

  4. A layered architecture is deployed as a whole, even if it’s modular and separated into good components and namespaces. But that might not be a bad thing. Unless you have separate teams working on different parts of the application, deploying all at once isn’t the worst thing you can do.

I completely disagree with critique #1 (as opposed to Sylvia’s defense):  most of the time, the database will outlive that puny application by a matter of decades.  That is, when some team is developing version 8 of the application, they’ll use essentially the same database as in V1.  Creating a design which optimizes for data storage and recall is, in general, much more important than which JavaScript library you’re using or how you lay out those files.
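
To make the critiques above concrete, here is a minimal sketch (mine, not Sylvia's) of the classic "everything flows down" arrangement: presentation calls business, business calls persistence, and only the bottom layer knows about storage. Class and method names are illustrative.

```python
# A toy layered application: controller -> service -> repository.
class OrderRepository:
    """Persistence layer: the only place that knows about storage."""

    def __init__(self) -> None:
        self._orders: dict[int, dict] = {}

    def save(self, order_id: int, order: dict) -> None:
        self._orders[order_id] = order

    def get(self, order_id: int) -> dict | None:
        return self._orders.get(order_id)


class OrderService:
    """Business layer: rules live here; it only talks to the layer below."""

    def __init__(self, repository: OrderRepository) -> None:
        self._repository = repository

    def place_order(self, order_id: int, items: list[str]) -> dict:
        if not items:
            raise ValueError("An order needs at least one item.")
        order = {"id": order_id, "items": items, "status": "placed"}
        self._repository.save(order_id, order)
        return order


class OrderController:
    """Presentation layer: translates requests into service calls."""

    def __init__(self, service: OrderService) -> None:
        self._service = service

    def post_order(self, order_id: int, items: list[str]) -> dict:
        return self._service.place_order(order_id, items)


controller = OrderController(OrderService(OrderRepository()))
print(controller.post_order(1, ["widget"]))
```

Note how a new requirement (say, adding an order discount) touches all three classes, which is exactly critique #3.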


Building Resilient Microservices

Samir Behara has some tips for designing resilient microservices:

While architecting distributed cloud applications, you should assume that failures will happen and design your applications for resiliency. A Microservice ecosystem is going to fail at some point or another, and hence you need to learn to embrace failure. Don’t design systems with the assumption that it’s going to be sunny throughout the year. Be realistic and account for the chances of rain, snow, thunderstorms, and other adverse conditions. In short, design your microservices with failure in mind. Things don’t always go as planned, and you need to be prepared for the worst-case scenario.

If Service A calls Service B which in turn calls Service C, what happens when Service B is down? What is your fallback plan in such a scenario?

  • Can you return a pre-decided error message to the user?

  • Can you call another service to fetch the information?

  • Can you return values from cache instead?

  • Can you return a default value?

Microservices have their drawbacks, but one big advantage is that they tend to be concise enough that you can reason about them more clearly than kitchen sink applications.  Planning ahead on potential failure modalities differentiates flaky services from robust services.
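
As a concrete illustration of those fallback options, here is a minimal sketch (mine, not Samir's): call the downstream service with a short timeout, fall back to the last cached value, and finally to a pre-decided default. The URL, timeout, and in-memory cache are illustrative.

```python
# Call Service B; on failure, fall back to cache, then to a default value.
import requests

_price_cache: dict[str, float] = {}
DEFAULT_PRICE = 0.0  # pre-decided fallback when nothing better is available

def get_price(item_id: str) -> float:
    url = f"https://pricing.internal.example.com/items/{item_id}"
    try:
        response = requests.get(url, timeout=2)  # fail fast instead of hanging
        response.raise_for_status()
        price = response.json()["price"]
        _price_cache[item_id] = price  # refresh the cache on success
        return price
    except requests.RequestException:
        # Service B is down or slow: serve the last known value if we have one...
        if item_id in _price_cache:
            return _price_cache[item_id]
        # ...otherwise degrade gracefully with a default instead of failing outright.
        return DEFAULT_PRICE
```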


Supertype-Subtype Relationships In Tables

Deborah Melkin explains the supertype-subtype pattern in relational database architecture:

You don’t see a supertype-subtype relationship defined as such when you’re looking at the physical database. You’ll only see it explicitly in the logical data model. So what is the pattern and how do you know that you have one in your database?

This relationship exists where you have one entity that could have different attributes based on a discriminator type. One example is a person. Depending on the role of that person in relationship to the business, you will need to store different pieces of information for them. You need different information about a client than you do an employee. But you’re dealing with a person so there is shared information.

It’s a good pattern for minimizing data repetition.
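
For a sense of how the pattern looks in practice, here is a minimal sketch using SQLAlchemy's joined-table inheritance, which maps directly onto supertype-subtype: Person is the supertype, Client and Employee are subtypes, and person_type is the discriminator. Table and column names are illustrative, not from Deborah's post.

```python
# Supertype-subtype via SQLAlchemy joined-table inheritance.
from sqlalchemy import Column, Date, ForeignKey, Integer, String
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class Person(Base):
    __tablename__ = "person"

    person_id = Column(Integer, primary_key=True)
    first_name = Column(String(100), nullable=False)
    last_name = Column(String(100), nullable=False)
    person_type = Column(String(20), nullable=False)  # discriminator column

    __mapper_args__ = {
        "polymorphic_on": person_type,
        "polymorphic_identity": "person",
    }


class Client(Person):
    __tablename__ = "client"

    person_id = Column(Integer, ForeignKey("person.person_id"), primary_key=True)
    account_number = Column(String(20), nullable=False)  # client-only attribute

    __mapper_args__ = {"polymorphic_identity": "client"}


class Employee(Person):
    __tablename__ = "employee"

    person_id = Column(Integer, ForeignKey("person.person_id"), primary_key=True)
    hire_date = Column(Date, nullable=False)  # employee-only attribute

    __mapper_args__ = {"polymorphic_identity": "employee"}
```

The shared attributes live once in the person table, and each subtype table carries only what is specific to its role.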


Thoughts On GraphQL In Existing Systems

Anshulee Asthana shares some thoughts after researching GraphQL implementation details:

The Existing System

  • SQL Server DB with SPs, functions, views, and queries. SPs are non-modular in the sense they join various tables and return values specific to the calls being made.
  • Connection with the DB is using basic ADO.NET and not with EF/LINQ, etc.
  • ASP.NET WebAPI2.0 (Not .NET Core) and Java-based APIs.
  • Native Android, iOS and Web Clients. Some portion of the web clients bypasses the API and talks to the DB directly.
  • WebClients: ASP.NET and React.
  • A React Native-based app.

System Maturity

  • The system has been in production for a few years now and is stable.
  • As the system is in a competitive space, new features are constantly being added on top of the usual maintenance.

Customer Ask

  • Whether it makes sense to wrap our existing APIs in a GraphQL endpoint.

  • For a new feature in the React app, evaluate building the new .NET-based APIs in GraphQL.

It’s nice that Anshulee shared this case study, especially because there’s relatively little material out there involving GraphQL + .NET.
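
To show the general shape of "wrap the existing APIs in a GraphQL endpoint," here is a minimal sketch using graphene in Python (rather than the .NET stack in the post), where a resolver simply delegates to an existing REST endpoint. The REST URL, type, and field names are illustrative.

```python
# A GraphQL type whose resolver wraps an existing REST API, using graphene.
import graphene
import requests

class Order(graphene.ObjectType):
    id = graphene.ID()
    status = graphene.String()
    total = graphene.Float()

class Query(graphene.ObjectType):
    order = graphene.Field(Order, order_id=graphene.ID(required=True))

    def resolve_order(self, info, order_id):
        # Delegate to the existing WebAPI endpoint, so no business logic
        # has to move for GraphQL to sit in front of it.
        response = requests.get(
            f"https://api.example.com/orders/{order_id}", timeout=5
        )
        response.raise_for_status()
        data = response.json()
        return Order(id=data["id"], status=data["status"], total=data["total"])

schema = graphene.Schema(query=Query)

# Example query a React or mobile client might issue:
# result = schema.execute('{ order(orderId: "42") { status total } }')
```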
