Press "Enter" to skip to content

Category: Architecture

Brownfield Data Modeling

Jared Westover discusses a common trade-off:

Some decisions in life are easy, like whether to drink that second cup of coffee. But when it comes to databases, things get complicated fast. Developers often seek my input on adding tables and columns. A common question arises: Should they create a new table or expand an existing one by adding columns? This decision can be tricky because it depends on several factors, including query performance, future growth, and the complexity of implementing either solution. While adding one or two columns to an existing table may seem the easiest option, is it the best long-term solution? In this article, we look at whether it is better to add new columns versus a new table in SQL Server.

As an architectural pro-tip, when you’re looking to add a new column to an existing table, ask yourself if the new attribute actually relates to the natural key of the existing table. In Jared’s example, the natural key for the video game tracker is presumably video game ID (which itself ties back to the video game title, developer, console, and release date) and start date. Does a book actually relate to a video game and a start date? No, it does not. Therefore, the book attribute does not belong on the video game tracker table.

When you dig deeper into Boyce-Codd Normal Form, you find that “relates to” in the prior paragraph translates to “has a functional dependency upon.” But even in non-technical language, for people not familiar with normalization, you can reach the same conclusion, because ultimately, 95% of database normalization is common sense applied strenuously to a business domain.
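To make that concrete, here’s a minimal T-SQL sketch of the “new table” option. The table and column names are my own placeholders, not Jared’s actual schema, and it assumes dbo.VideoGame and dbo.Book already exist:

```sql
-- Hypothetical schema for illustration only; not Jared's actual tables.
CREATE TABLE dbo.VideoGameTracker
(
    VideoGameTrackerId INT IDENTITY(1, 1)
        CONSTRAINT PK_VideoGameTracker PRIMARY KEY,
    VideoGameId        INT NOT NULL
        CONSTRAINT FK_VideoGameTracker_VideoGame REFERENCES dbo.VideoGame (VideoGameId),
    StartDate          DATE NOT NULL,
    CONSTRAINT UQ_VideoGameTracker UNIQUE (VideoGameId, StartDate)  -- the natural key
);

-- Books get their own tracker, keyed to their own natural key, rather than
-- nullable Book* columns bolted onto the video game tracker.
CREATE TABLE dbo.BookTracker
(
    BookTrackerId INT IDENTITY(1, 1)
        CONSTRAINT PK_BookTracker PRIMARY KEY,
    BookId        INT NOT NULL
        CONSTRAINT FK_BookTracker_Book REFERENCES dbo.Book (BookId),
    StartDate     DATE NOT NULL,
    CONSTRAINT UQ_BookTracker UNIQUE (BookId, StartDate)
);
```

Each attribute now depends on its own table’s key, and neither tracker accumulates NULLs for the other’s columns.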

And most of the time, the developer knows that this feels weird, but doesn’t want to spend the extra time doing it the best way and instead tries to do it the expedient way. This is where the role of the architect as politician comes in, and we gently guide people to the right conclusion. Or just tell them to put on their big boy britches and do it right. Either way.


Guidance on Multi-Platform Database Design

Kellyn Gorman hits on some of the pain points around designing database solutions on multiple platforms:

When building products that interact with multiple database platforms, the complexity can be both a challenge and an opportunity.  For Subject Matter Experts (SMEs), observing design decisions made without sufficient knowledge of underlying database architecture can be particularly frustrating. These moments highlight the critical need for architectural foresight and platform-specific expertise to avoid pitfalls that compromise scalability, performance, and maintainability.

Read on for a list of common problems as well as the entry point to some solutions.


Constraints on Polymorphic Associations in SQL Server

Jared Westover wants a foreign key, but one referencing multiple tables:

Do you like a challenge? If you answered yes, you’re my kind of person. Recently, a developer presented me with a problem: they needed a foreign key reference in one table to associate with multiple other tables. Over the years, I’ve often been asked how to make this situation work. However, achieving this relationship with foreign keys is technically impossible with SQL Server and most mainstream relational database platforms. Since SQL Server restricts foreign keys to referencing a single table, how can we solve this problem?

My immediate answer was “triggers,” which happens to be the solution Jared intentionally omits.

I’d rather go with the multiple association tables approach over multiple indicator types, as the latter requires (n-1) NULLs, where n is the number of indicator types (review types in Jared’s example). And I hate NULL, because NULL is the void lying about being a value, sort of like how skim milk is water lying about being milk.
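For illustration, here’s roughly what the multiple association tables approach looks like in T-SQL. The names are my own assumptions loosely based on Jared’s review example, not his schema, and it presumes dbo.Product and dbo.Store already exist:

```sql
-- Hypothetical schema for illustration only.
CREATE TABLE dbo.Review
(
    ReviewId   INT IDENTITY(1, 1) CONSTRAINT PK_Review PRIMARY KEY,
    ReviewText NVARCHAR(MAX) NOT NULL,
    ReviewDate DATE NOT NULL
);

-- One narrow association table per parent, each with a real foreign key,
-- instead of one column that would have to reference multiple tables.
CREATE TABLE dbo.ProductReview
(
    ReviewId  INT NOT NULL CONSTRAINT FK_ProductReview_Review  REFERENCES dbo.Review (ReviewId),
    ProductId INT NOT NULL CONSTRAINT FK_ProductReview_Product REFERENCES dbo.Product (ProductId),
    CONSTRAINT PK_ProductReview PRIMARY KEY (ReviewId, ProductId)
);

CREATE TABLE dbo.StoreReview
(
    ReviewId INT NOT NULL CONSTRAINT FK_StoreReview_Review REFERENCES dbo.Review (ReviewId),
    StoreId  INT NOT NULL CONSTRAINT FK_StoreReview_Store  REFERENCES dbo.Store (StoreId),
    CONSTRAINT PK_StoreReview PRIMARY KEY (ReviewId, StoreId)
);
```

Every association gets a real, enforceable foreign key and zero NULLs, though you still need something (the triggers Jared omits, for instance) if you want to guarantee each review attaches to exactly one parent.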


Concurrency Control in Oracle vs PostgreSQL

Umair Shahid continues a series on migrating from Oracle to PostgreSQL:

Transitioning from Oracle to PostgreSQL can be a transformative experience for database administrators because of the subtle differences between the two technologies. Understanding how the two handle concurrency differently is critical to managing highly concurrent workloads. 

Concurrency control is essential for maintaining data consistency when multiple users access the database simultaneously. Oracle and PostgreSQL take different approaches to concurrency control: Oracle primarily relies on locking and consistent snapshots, while PostgreSQL utilizes a Multi-Version Concurrency Control (MVCC) system.

This article provides an in-depth look at concurrency control in PostgreSQL from an Oracle perspective.

Read on for that comparison.
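As a tiny illustration of the PostgreSQL side, here’s a hedged single-session sketch of MVCC behavior at the REPEATABLE READ isolation level; the accounts table is made up for the example:

```sql
-- Hypothetical table; run this in one psql session.
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;

-- The first query establishes this transaction's snapshot.
SELECT balance FROM accounts WHERE account_id = 42;

-- If another session updates and commits this row now, PostgreSQL keeps the
-- old row version around for us, so repeating the query returns the same
-- value: readers do not block writers, and writers do not block readers.
SELECT balance FROM accounts WHERE account_id = 42;

COMMIT;
```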


SCD Types in Microsoft Fabric

Kenneth Omorodion reminds us that the Kimball model is still quite valuable:

In modern data warehousing, how we handle updates to dimension tables is crucial. There are several approaches; but the decision often comes down to two primary strategies: Slowly Changing Dimensions (SCD) Type 2 and overwriting tables. Each has its own benefits, use cases, and trade-offs. This tip will explore the two methods and why SCD Type 2 is often a better option in many data warehouse scenarios.

Read on for this overview of the benefits of type-2 slowly changing dimensions, as well as a little bit of coverage of several other SCD types.
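If you want a rough picture of type 2 in practice, here’s a minimal T-SQL sketch of the common expire-then-insert pattern. The dimension and staging tables are my own placeholders, not Kenneth’s:

```sql
-- Hypothetical SCD Type 2 dimension for illustration only.
CREATE TABLE dbo.DimCustomer
(
    CustomerKey INT IDENTITY(1, 1) CONSTRAINT PK_DimCustomer PRIMARY KEY,
    CustomerId  INT NOT NULL,            -- business key
    City        NVARCHAR(100) NOT NULL,  -- tracked attribute
    ValidFrom   DATE NOT NULL,
    ValidTo     DATE NULL,               -- NULL = still current
    IsCurrent   BIT NOT NULL
);

-- Step 1: expire the current row when a tracked attribute changes.
UPDATE d
SET    d.ValidTo = GETDATE(), d.IsCurrent = 0
FROM   dbo.DimCustomer AS d
JOIN   stg.Customer    AS s ON s.CustomerId = d.CustomerId
WHERE  d.IsCurrent = 1
  AND  s.City <> d.City;

-- Step 2: insert a new current row instead of overwriting history.
INSERT dbo.DimCustomer (CustomerId, City, ValidFrom, ValidTo, IsCurrent)
SELECT s.CustomerId, s.City, GETDATE(), NULL, 1
FROM   stg.Customer AS s
WHERE  NOT EXISTS (SELECT 1
                   FROM   dbo.DimCustomer AS d
                   WHERE  d.CustomerId = s.CustomerId
                     AND  d.IsCurrent = 1);
```

An overwrite would be simpler, but you lose the history that makes questions like “what did this customer look like last quarter?” answerable.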


A Primer on ACID Properties and Transactions

Joe Celko covers four useful properties of database transactions:

ACID is a cute acronym for Atomicity, Consistency, Isolation, and Durability. This is a set of properties we want in transactions in a database. The original work was done by the late Jim Gray, but this acronym is due to Andreas Reuter and Theo Härder at IBM. These four properties as defined by Reuter and Härder are as follows:

Click through for a description of each property, as well as a brief overview of isolation levels.
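As a quick illustration of atomicity (with a nod to durability), here’s a small T-SQL sketch; the account table is a made-up example:

```sql
-- Hypothetical transfer between two accounts.
BEGIN TRY
    BEGIN TRANSACTION;

    UPDATE dbo.Account SET Balance = Balance - 100 WHERE AccountId = 1;
    UPDATE dbo.Account SET Balance = Balance + 100 WHERE AccountId = 2;

    COMMIT TRANSACTION;   -- durability: once committed, the change survives a crash
END TRY
BEGIN CATCH
    ROLLBACK TRANSACTION; -- atomicity: an error undoes both updates together
    THROW;
END CATCH;
```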


Solid Practices for Power BI

Paul Turley has the beginnings of a new series:

It’s time for a refresher and reboot on this important topic. Much has changed in the Power BI world; while the core design principles remain the same, practices and architecture patterns have evolved over the past few years. Power BI has grown up in the enterprise space, and Microsoft Fabric now adds new options and capabilities. Back in 2020, I began writing a series of blog posts titled “Doing Power BI the Right Way,” and it has become my mission to evolve and maintain a current collection of the most important best practice recommendations. This has been my passion and the topic of several conference talks, user group sessions, and a book currently in development for O’Reilly that will help you prepare for the PL-300 Power BI Analyst exam and then guide you in applying enterprise best practices in your solutions.

I work with hundreds of consulting clients who go through the same cycles, having the same experiences, facing the same challenges, many making the same mistakes, and many learning some of the same lessons. The purpose of this series is to share those lessons with you.

Click through for the overview, as well as an outline of what Paul will include in this series.


Minimizing Latency in Kafka Streaming Applications using APIs

Abhishek Goswami doesn’t want to slow down the stream:

Kafka is widely adopted for building real-time streaming applications due to its fault tolerance, scalability, and ability to process large volumes of data. However, in general, Kafka streaming consumers work best in an environment where they do not have to call external APIs or databases. When a Kafka consumer must make a synchronous database or API call, the latency introduced by network hops or I/O operations adds up quickly (especially when the streaming pipeline is performing an initial load of a large volume of data before starting CDC). This can significantly slow down the streaming pipeline and blow up system resource usage, hurting the throughput of the pipeline. In extreme situations, this may even become unsustainable: Kafka consumers may not be able to commit offsets before the next polling call due to the increased latency, get continuously rebalanced by the broker, and end up processing practically nothing while consuming ever more system resources as time passes.

This is a real problem faced by many streaming applications. In this article, we’ll explore some effective strategies to minimize latency in Kafka streaming applications where external API or database calls are inevitable. We’ll also compare these strategies with the alternative approach of separating out the parts of the pipeline that require these external interactions into a separate publish/subscribe-based consumer.

Read on to understand the causes of this latency and several patterns you can use to limit it.


A Primer on Medallion Architecture in Microsoft Fabric

Kenneth Omorodion builds a warehouse:

Data warehouses are essential components of modern analytics systems, offering optimized storage and processing capabilities for large volumes of data. When integrated with a Lakehouse architecture, you can combine the best of both worlds—structured, schema-enforced data storage with the flexibility and scalability of data lakes. Microsoft Fabric provides an excellent environment for implementing the Medallion Architecture, a design pattern for building efficient data processing pipelines by layering data into bronze, silver, and gold zones.

Click through for the process.
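As a rough sketch of the layering, here’s what the bronze-to-silver-to-gold flow might look like in Fabric Warehouse-style T-SQL (CTAS); the schemas and tables are placeholders, not Kenneth’s pipeline:

```sql
-- Hypothetical objects; assume bronze.SalesRaw was landed as-is by a pipeline.

-- Silver: clean, type, and deduplicate the raw data.
CREATE TABLE silver.Sales
AS
SELECT DISTINCT
       CAST(OrderId   AS INT)            AS OrderId,
       CAST(OrderDate AS DATE)           AS OrderDate,
       CAST(Amount    AS DECIMAL(18, 2)) AS Amount
FROM   bronze.SalesRaw
WHERE  OrderId IS NOT NULL;

-- Gold: model for reporting, e.g. a daily sales rollup.
CREATE TABLE gold.FactDailySales
AS
SELECT OrderDate,
       COUNT(*)    AS Orders,
       SUM(Amount) AS TotalAmount
FROM   silver.Sales
GROUP  BY OrderDate;
```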


The Importance of Planning before Power BI Data Modeling

Kelly Broekstra recommends against jumping right in:

Who has been told by a manager or business person to just connect to the source data and start creating a new report? Here is my tip:

DON’T DO IT

All Power BI and Fabric reports must have a semantic model, which Microsoft describes as “a logical description of an analytical domain, with metrics, business-friendly terminology, and representation, to enable deeper analysis.” – Source

Read on to learn why, and what you should do instead if you want a better long-term experience with Power BI.
