Category: Architecture

When you’re designing the back end of a database, people will have all sorts of ideas.

Normalization, partitioning, referential integrity, and, usually, trying to figure out what to do when you have more than one client.

If your application is user-focused (like Stack Overflow), you don’t have to struggle too much with the idea of isolation. But when your application is geared more towards supporting multiple entities that have one or more users, things change. Sort of like how Stack Overflow manages all the other Stack Network sites.

Were you to ask me which model I prefer, it would be every tenant getting their own database. Your other options are:

Everyone all mixed in together like gen-pop

Using separate schemas inside a single database

Definitely read what Erik has to say. My prior job was a hybrid multi-tenant environment: for the main transactional database, there were several dozen SQL Server instances. Each instance had anywhere from one to a few dozen copies of the transactional database, and each database hosted one or more customers’ data. There’s not a lot of tooling out there to support that kind of strategy, so we had to build a lot of it in-house. But that said, it did work out reasonably well without having hundreds or thousands of databases on a single instance.
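To make that hybrid layout concrete, here is a minimal sketch of the kind of lookup in-house tooling for such a setup might start from: a catalog that maps each tenant to an instance and a database, where several tenants can share one database. Every name here (`TenantLocation`, `TENANT_CATALOG`, `locate`, the instance and database names) is hypothetical and made up for illustration, not taken from any real system.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class TenantLocation:
    instance: str  # hypothetical SQL Server instance name
    database: str  # hypothetical database name on that instance


# Hypothetical catalog: more than one tenant may live in the same database.
TENANT_CATALOG: Dict[str, TenantLocation] = {
    "tenant-a": TenantLocation("sql-01", "Transactions_01"),
    "tenant-b": TenantLocation("sql-01", "Transactions_01"),
    "tenant-c": TenantLocation("sql-02", "Transactions_07"),
}


def locate(tenant_id: str) -> TenantLocation:
    """Return where a tenant's data lives, or raise if the tenant is unknown."""
    try:
        return TENANT_CATALOG[tenant_id]
    except KeyError:
        raise LookupError(f"unknown tenant: {tenant_id}") from None


shared = locate("tenant-a") == locate("tenant-b")  # both map to the same database
```

In practice a catalog like this would probably live in its own database rather than in code, so that moving a tenant is a data change instead of a deployment.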


An Overview of Slowly Changing Dimensions

Published 2024-07-17 by Kevin Feasel

Reza Rad talks slowly changing dimensions:

If you want to use Power BI, Microsoft Fabric, or any other data analytics tools, one of the key concepts to understand when working with a data warehouse system is the SCD (Slowly Changing Dimension). I will do this in a series of at least two articles. The first one (this one) will be on the concept of what SCD is, its meaning, and its different types. Then, the next one will discuss how to implement SCD types (such as Type 2) using Microsoft Fabric and Power BI.

Reza focuses on SCD types 0-4 but does briefly touch on types 5-7 (I'd never heard of SCD type 7 before).
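As a rough illustration of the Type 2 idea (keep history by closing out the current row and adding a new version), here is a minimal in-memory sketch. It is not Reza's implementation and has nothing to do with Fabric or Power BI specifically; the `CustomerRow` shape, the column names, and `apply_type2_change` are all assumptions for the example.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class CustomerRow:
    surrogate_key: int
    customer_id: str          # business (natural) key
    city: str                 # the attribute whose history we track
    valid_from: date
    valid_to: Optional[date]  # None means "still the current version"


def apply_type2_change(rows: List[CustomerRow], customer_id: str,
                       new_city: str, change_date: date) -> None:
    """Close the current row for the customer and append a new version."""
    current = next(
        (r for r in rows if r.customer_id == customer_id and r.valid_to is None),
        None,
    )
    if current is not None and current.city == new_city:
        return  # nothing changed, so no new version is needed
    if current is not None:
        current.valid_to = change_date  # expire the old version
    next_key = max((r.surrogate_key for r in rows), default=0) + 1
    rows.append(CustomerRow(next_key, customer_id, new_city, change_date, None))


dim = [CustomerRow(1, "C-100", "Durham", date(2020, 1, 1), None)]
apply_type2_change(dim, "C-100", "Raleigh", date(2024, 7, 17))
```

After the call, the dimension holds two rows for the same customer: the old city with an end date and the new city as the current version.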


Contrasting Data Mesh and Data Fabric

Published 2024-07-12 by Kevin Feasel

Sahil Babbar makes a comparison:

The concept of a data mesh proposes that each business domain takes charge of hosting, preparing, and delivering its own data to both its internal team and broader stakeholders. This decentralized approach empowers autonomous data teams to take full ownership and accountability for their data products and management processes.

Data fabric is a system designed to help a company manage and use its data from various storage types, like databases, tagged files, or document stores. It supports different tools and applications to easily access this data, working with technologies like Apache Kafka for real-time data streaming, ODBC for database connections, HDFS for big data storage and REST APIs for web services. It focuses on creating a unified data environment that acts as a reliable, centralized source for all organizational data. This setup ensures data is accurate, consistent, and secure, making it easy for different teams to access and manage data efficiently.

Read on to learn a bit more about the two architectures.


Microsoft Healthcare Accelerator for Fabric

Published 2024-07-10 by Kevin Feasel

Tino Zishiri takes us through an accelerator solution:

Microsoft released the Healthcare Data Solutions in Microsoft Fabric in Q1 2024. It was introduced as “A game-changer for healthcare data analysis” by Umesh Rustogi, General Manager of Microsoft Health and Life Sciences Data Platform.

Microsoft Fabric is a unified platform that bundles services, apps, and connectors under a single umbrella, providing users with the tooling to meet all data and analytics needs.

The Healthcare Data Solutions are built on top of this robust service offering. The solution is aimed at users who are looking for a powerful tool to integrate and transform Healthcare data. In addition, users can run real-time analytics, data science workloads and meet business intelligence needs without compromising the privacy and security of their data.

Click through to learn more about how this works for defining an industry-standard architectural pattern.


Building a Data API (with POST Operations) using Data API Builder

Published 2024-07-09 by Kevin Feasel

Eduardo Pivaral digs into DAB:

In the previous tip on Data API Builder (DAB) for SQL Server, we discussed how REST APIs provide a secure and platform-agnostic method to share database information using REST or GraphQL and how DAB simplifies the process of creating data APIs without the need for extensive coding or third-party tools.

What can we do if we want POST operations? Is it possible to achieve? What other options do we have if we want to implement Data API solutions in our production environments?

Read on to learn more about how this works.


Atomic Design for Report Development

Published 2024-07-02 by Kevin Feasel

Kurt Buhler has an interesting approach:

Developing a good semantic model or report takes a lot of time and effort. One way to reduce this cost is by re-using parts of an existing solution for a new model or project. This modular approach is particularly valuable when a developer faces common or recurring challenges and processes. Despite this, many developers commonly repeat efforts when they start new projects, models, and reports. For example, developers will often manually recreate measures, date tables, and patterns in a new model, or spend precious hours formatting visuals in a new report, while they have already created the same or similar things in the past. One reason for this is that it is difficult to identify candidate elements to re-use, or how you can re-use them in a convenient and scalable manner.

In this article, we want to introduce a conceptual framework from UI/UX called the atomic design methodology from Brad Frost. This framework can help developers to approach Power BI models and reports in a modular way to improve productivity and consistency of a developer’s work. The purpose of this article is to introduce the concept as well as some approaches that exist to re-use parts of your model and report. In future articles and videos, we will elaborate on these and other methods in additional detail.

I like the idea a lot, but Kurt does describe some of the challenges you’ll likely need to work through to adopt it.


Hot and Cold Partitions for Apache Kafka Data

Published 2024-07-01 by Kevin Feasel

Gautam Goswami splits the data:

At first, data tiering was a tactic used by storage systems to reduce data storage costs. This involved grouping data that was not accessed as often into more affordable, if less effective, storage array choices. Data that has been idle for a year or more, for example, may be moved from an expensive Flash tier to a more affordable SATA disk tier. Even though they are quite costly, SSDs and flash can be categorized as high-performance storage classes. Smaller datasets that are actively used and require the maximum performance are usually stored in Flash.

Cloud data tiering has gained popularity as customers seek alternative options for tiering or archiving data to a public cloud. Public clouds presently offer a mix of object and file storage options. Object storage classes such as Amazon S3 and Azure Blob (Azure Storage) deliver significant cost efficiency and all the benefits of object storage without the complexities of setup and management.

Read on for an architecture that uses hot and cold tiers, as well as how you can set it up on an existing Kafka topic.


Failure Mode and Effect Analysis on Databases

Published 2024-06-27 by Kevin Feasel

Mika Sutinen thinks about how things could go wrong:

Failure Mode and Effect Analysis (FMEA) is a process for building more resilient systems by identifying failure points in them. While it’s highly recommended to perform FMEA during the architecture design phase, it can be done at any time. More importantly, it should be reviewed periodically, and especially when the system architecture changes.

While you can do Failure Mode and Effect Analysis for whole systems, in this post, I will share an example on how to get started with FMEA for a database environment.

Read on for a description of the concept and some tips on how to perform one.
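For a sense of what the output of an FMEA can look like, here is a small sketch using one common convention: score each failure mode for severity, occurrence, and detection, then rank by the product (the risk priority number). This is not necessarily the scoring Mika uses, and the failure modes and scores below are invented purely for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FailureMode:
    component: str
    failure: str
    severity: int    # 1 (negligible) to 10 (catastrophic)
    occurrence: int  # 1 (rare) to 10 (frequent)
    detection: int   # 1 (always detected) to 10 (effectively undetectable)

    @property
    def rpn(self) -> int:
        """Risk priority number: severity x occurrence x detection."""
        return self.severity * self.occurrence * self.detection


def rank(modes: List[FailureMode]) -> List[FailureMode]:
    """Order failure modes from highest to lowest risk priority number."""
    return sorted(modes, key=lambda m: m.rpn, reverse=True)


# Illustrative entries only; real scores come from your own review.
modes = [
    FailureMode("storage", "data disk fills up", 8, 4, 3),
    FailureMode("backups", "restore never tested", 9, 3, 8),
    FailureMode("network", "listener unreachable", 6, 2, 2),
]
ranked = rank(modes)
```

The ranking is only a starting point for discussion: a low-scoring item can still deserve attention if the mitigation is cheap.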


Choosing between Data Warehouses, Lakes, and Lakehouses

Published 2024-06-11 by Kevin Feasel

Den Smyrnov talks architecture:

Historically, the two most popular approaches to storing and managing data are Data Warehouse and Data Lake. The choice between them usually depends on business objectives and needs. While Data Lakes are ideal for preserving large volumes of diverse data, warehouses are more favorable for business intelligence and reporting. Sometimes, organizations try to have the best of both worlds and mix Data Lake & Data Warehouse architectures. This, however, can be a time-consuming and costly process.

Against this backdrop, a new hybrid approach—Data Lakehouse—has emerged. It combines features of a Data Lake and a Data Warehouse, allowing companies to store and analyze data in the same repository and eliminating the Data Warehouse vs. Data Lake dilemma. Data Lakehouse mixes the scalability and flexibility of a Data Lake with the ability to extract insights from data easily. Ever so compelling, this approach still has certain limitations. It should not be treated as a “one-size-fits-all” solution.

Read on for an explanation of each of these three styles, including their pros and cons.


Thoughts on Natural Keys

Published 2024-06-10 by Kevin Feasel

Mark Seemann talks keys:

Although I live in Copenhagen and mostly walk or ride my bicycle in order to get around town, I do own an old car for getting around the rest of the country. In Denmark, cars go through mandatory official inspection every other year, and I’ve been through a few of these in my life. A few years ago, the mechanic doing the inspection informed me that my car’s chassis number was incorrect.

This did make me a bit nervous, because I’d bought the car used, and I was suddenly concerned that things weren’t really as I thought. Had I unwittingly bought a stolen car?

But the mechanic just walked over to his computer in order to correct the error. That’s when a different kind of unease hit me. When you’ve programmed for some decades, you learn to foresee various typical failure modes. Since a chassis number is an obvious candidate for a natural key, I already predicted that changing the number would prove to be either impossible, or have all sorts of cascading effects, ultimately terminating in official records no longer recognizing that the car is mine.

Mark uses this as a jumping-off point for a discussion about whether to use natural keys as primary keys or to add surrogate keys instead. I am generally in favor of using surrogate keys in the physical data model and creating unique indexes for natural keys. But you have to use natural keys in the logical data model because surrogate keys don’t exist at the level of the logical data model. Do read the comments, though, because there’s a great debate in there.
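Here is a minimal sketch of that physical-model preference, using Python's built-in sqlite3 module so it runs anywhere. The table and column names (`car`, `registration`, `chassis_number`) and the sample values are hypothetical. The surrogate key carries the relationships, while a unique index still enforces the natural key, so correcting a mistyped chassis number is a one-row update.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE car (
    car_id         INTEGER PRIMARY KEY,  -- surrogate key
    chassis_number TEXT NOT NULL         -- natural key
);
CREATE UNIQUE INDEX ux_car_chassis_number ON car (chassis_number);

CREATE TABLE registration (
    registration_id INTEGER PRIMARY KEY,
    car_id          INTEGER NOT NULL REFERENCES car (car_id),
    owner_name      TEXT NOT NULL
);
""")

conn.execute("INSERT INTO car (car_id, chassis_number) VALUES (1, 'ABC123')")
conn.execute("INSERT INTO registration (car_id, owner_name) VALUES (1, 'Owner A')")

# Correcting the natural key touches one row; the registration still points
# at car_id 1, so nothing cascades.
conn.execute("UPDATE car SET chassis_number = 'ABC128' WHERE car_id = 1")
owner = conn.execute(
    "SELECT r.owner_name FROM registration r "
    "JOIN car c ON c.car_id = r.car_id "
    "WHERE c.chassis_number = 'ABC128'"
).fetchone()[0]

# The unique index still rejects a second car with the same chassis number.
try:
    conn.execute("INSERT INTO car (car_id, chassis_number) VALUES (2, 'ABC128')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

The same shape carries over to SQL Server with an identity or sequence-based key and a unique index or constraint on the natural key columns.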

