Press "Enter" to skip to content

Category: Architecture

Kafka Control and Data Planes

Sanjay Garde explains how the architecture of Apache Kafka solutions has expanded over time:

With the advent of service mesh and containerized applications, the idea of the control and data plane has become popular. A part of your application infrastructure, such as a proxy or sidecar, is dedicated to aspects such as controlling traffic, access, governance, security, and monitoring, and is referred to as the control plane. Another part of your application infrastructure that is used purely for processing your business transactions is referred to as the data plane.

Read on to see how the concept works at an architectural level.
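
To make the split a bit more concrete, here’s a rough Python sketch of the idea (my own illustration, not from Sanjay’s article; the class and handler names are made up): a sidecar-style wrapper owns the control-plane concerns of access and monitoring, while the business handler stays purely in the data plane.

```python
# Minimal sketch with assumed names: the control plane enforces access and
# records metrics; the data plane only runs business logic.
from dataclasses import dataclass

@dataclass
class ControlPlane:
    """Sidecar-style wrapper owning traffic, access, and monitoring concerns."""
    allowed_clients: set
    messages_seen: int = 0

    def admit(self, client_id: str) -> bool:
        self.messages_seen += 1                    # monitoring
        return client_id in self.allowed_clients   # access control

def data_plane_handler(payload: dict) -> dict:
    """Pure business processing, with no infrastructure concerns."""
    return {"order_id": payload["order_id"], "status": "processed"}

def handle(client_id: str, payload: dict, control: ControlPlane) -> dict:
    if not control.admit(client_id):
        raise PermissionError(f"client {client_id} is not authorized")
    return data_plane_handler(payload)

control = ControlPlane(allowed_clients={"billing-service"})
print(handle("billing-service", {"order_id": 42}, control))
```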


Recommendations for Dedicated SQL Pool Data Modeling

Bhaskar Sharma has some advice:

In this article, I will discuss how to physically model an Azure Synapse Analytics data warehouse while migrating from an existing on-premises MPP (Massively Parallel Processing) data warehouse solution like Teradata and Netezza. The approach and methodologies discussed in this article are purely based on the knowledge and insight I have gained while migrating these data warehouses to Azure Synapse dedicated SQL pool.

Dedicated SQL pools are close enough to regular SQL Server that we make a lot of assumptions about them, some of which may be wrong.


The Power of Metadata-Driven Development

Koen Verbeeck lays out a recommendation:

In this blog post I’ll talk about another of those rules/mantras/patterns/maxims:

build once, add metadata

I’m not sure if I’m using the right words; I heard something similar in a session by Spark enthusiast Simon Whiteley. He said you should only write code once, but make it flexible and parameterized, so you can add functionality just by adding metadata somewhere. A good example of this pattern can be found in Azure Data Factory: by using parameterized datasets, you can build one flexible pipeline that can copy, for example, any flat file, no matter which columns it has. I have blogged about this.

Click through to learn more about the concept, as well as some tips on how you’d do that in various data movement products (e.g., SSIS, ADF, Logic Apps).
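
As a rough illustration of the “build once, add metadata” idea, here’s a single generic copy routine in Python driven entirely by a metadata list (my own sketch, not Koen’s or Simon’s code; the file paths and delimiters are invented, and the landing files are assumed to already exist), much as a parameterized ADF pipeline is driven by a metadata table.

```python
# My own sketch of "build once, add metadata": one generic copy routine driven
# entirely by metadata rows; paths and delimiters here are invented examples.
import csv
from pathlib import Path

# Hypothetical metadata table: onboarding a new feed means adding a row here,
# not writing new copy code. Assumes the landing/ files already exist.
COPY_METADATA = [
    {"source": "landing/customers.csv", "target": "staging/customers.csv", "delimiter": ","},
    {"source": "landing/sales.txt",     "target": "staging/sales.csv",     "delimiter": "|"},
]

def copy_flat_file(source: str, target: str, delimiter: str) -> None:
    """Generic copy: handles any delimited flat file, whatever its columns."""
    Path(target).parent.mkdir(parents=True, exist_ok=True)
    with open(source, newline="") as src, open(target, "w", newline="") as dst:
        writer = csv.writer(dst)
        for row in csv.reader(src, delimiter=delimiter):
            writer.writerow(row)

for entry in COPY_METADATA:
    copy_flat_file(entry["source"], entry["target"], entry["delimiter"])
```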


The Importance of Star Schemas in Power BI

Paul Turley lays out facts (and dimensions):

There is no secret about this. If you do any legitimate research about Power BI (reading blogs, books or training from reliable sources), you will quickly learn that a lot of basic functionality requires a dimensional model, aka “Star Schema”. This is a hard fact that every expert promotes, and self-taught data analysts either have learned or will learn through experience. So, if everyone agrees on this point, why do so many resist this advice?

Perspective is everything. I didn’t understand why getting to the star schema was so often out of reach until I was able to see it from another perspective. There are a few common scenarios that pull source data in directions other than an ideal dimensional model.

Read on for Paul’s take on the subject.


Reviewing Database Usage Trends

Brendan Tierney looks at the data:

Getting back to the topic of this post, I’ve gathered some data and obtained some league tables from some sites. These will help to have a closer look at what is really happening in the database market throughout 2022. Two popular sites constantly monitor the wider internet and judge how popular databases are globally: DB-Engines and the TOPDB Top Database index. These are well known and frequently cited. Both of these sites give some details of how they calculate their scores, with one focused mainly on how common the database appears in searches across different search engines, while the other, in addition to search engine results/searches, also looks across different websites, discussion forums, social media, job vacancies, etc.

I don’t necessarily believe that these rankings are precise, though on the whole, I do expect the results to be directionally accurate. I’ve used DB-Engines data several times in the past and like to point out that, for any given year, 7 or 8 of the top 10 database engines are relational.


Consistency Levels in Cassandra

Dmytro Kostenko enumerates some options:

In Cassandra, a consistency level is the number of replicas responding before returning a reply to a user. Consistency in Cassandra is tunable, meaning that each client can consider what level of consistency and availability to choose. Moreover, it is assigned at the query level and can be configured for different service components. Users can choose different consistency levels for each operation, both for reads and writes. While choosing the consistency level for your operation, you should understand each level’s tradeoff between consistency and availability. Cassandra’s consistency can be strong or weak, depending on your chosen level.

Read on to learn more about strong vs weak consistency in the context of Cassandra, as well as the consistency level options available to us.
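
As a quick illustration of tunable, per-query consistency, here’s a sketch using the DataStax Python driver; the cluster address, keyspace, table, and values are placeholders of mine, not anything from Dmytro’s post. With a replication factor of 3, pairing QUORUM reads with QUORUM writes gives strong consistency, while ONE on both sides trades consistency for availability and latency.

```python
# Sketch using the DataStax Python driver (pip install cassandra-driver);
# keyspace, table, and addresses are hypothetical placeholders.
from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

cluster = Cluster(["127.0.0.1"])
session = cluster.connect("shop")  # hypothetical keyspace

# Stronger read: QUORUM waits for a majority of replicas to respond.
read_stmt = SimpleStatement(
    "SELECT balance FROM accounts WHERE account_id = %s",
    consistency_level=ConsistencyLevel.QUORUM,
)
row = session.execute(read_stmt, ("a-1001",)).one()

# Weaker, more available write: ONE returns as soon as a single replica acks.
write_stmt = SimpleStatement(
    "UPDATE accounts SET balance = %s WHERE account_id = %s",
    consistency_level=ConsistencyLevel.ONE,
)
session.execute(write_stmt, (150, "a-1001"))
```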


Use Cases for Multiple Data Lakes

James Serra explains why you might want multiple data lakes in an organization:

A question I get asked frequently from customers when discussing Data lake architecture is “Should I use one data lake for all my data, or multiple lakes?”. Ideally, you would use just one data lake, but I have seen many valid use cases where customers are using multiple data lakes. Here are some of those reasons:

I’d quibble with a couple of these (and given James’s intro, I’m not sure he’s fully on board with all of the reasons), but this is a good list of reasons why you might see several data lakes in an organization.


Well-Architected Framework for Oracle in Azure

Kellyn Pot’vin-Gorman has a new tool for us:

This invaluable framework provides clear guidance on the recommended practices to assess, architect, and migrate Oracle workloads to the Azure cloud. This should be the first place to look for answers on succeeding with Oracle on Azure!

A special thanks to my teammate, Jessica Haessler, for working so hard to help me get this to the finish line, as I would never have been able to get this done on my own!

Click through for a link to the guide. There isn’t a Well-Architected Framework assessment for this yet, but the WAF articles themselves have quite a bit of detail to them.


Storing Semi-Additive Facts as Timespans

Timo Zishiri gives a new spin to a common warehousing problem:

In these cases, the measure may be aggregated across dates by averaging over the number of periods, e.g., average daily inventory levels. Measures can also be aggregated across dates by taking the maximum/minimum for the time interval.

More specifically, this blog focuses on an alternative approach to providing end users with the ability to do point-in-time analysis, so-called trend analysis.

Click through to see how a timespan table would work.
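
For a feel of how a timespan table supports point-in-time analysis, here’s a small pandas sketch (my own illustration; the table and column names are invented, not Timo’s design): each row records an inventory level along with the date range for which it was valid, and a point-in-time lookup simply filters on that range.

```python
# Illustrative pandas sketch: store each inventory level as a timespan row
# (valid_from / valid_to) instead of one snapshot row per day.
import pandas as pd

inventory_spans = pd.DataFrame({
    "product":    ["Widget", "Widget", "Gadget"],
    "on_hand":    [100, 80, 25],
    "valid_from": pd.to_datetime(["2022-01-01", "2022-01-15", "2022-01-01"]),
    "valid_to":   pd.to_datetime(["2022-01-14", "2022-03-31", "2022-03-31"]),
})

def stock_as_of(df: pd.DataFrame, as_of: str) -> pd.DataFrame:
    """Point-in-time lookup: find the span covering the requested date."""
    ts = pd.Timestamp(as_of)
    return df[(df["valid_from"] <= ts) & (ts <= df["valid_to"])][["product", "on_hand"]]

print(stock_as_of(inventory_spans, "2022-01-20"))
# Widget shows 80 on hand as of that date; Gadget shows 25.
```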


The Importance of Proper Data Modeling in Power BI

Paul Turley avoids “big, wide tables”:

Power BI is architected to consume data in a dimensional model, with narrow fact tables and related dimensions. Introducing a big, wide table in a tabular model is extremely inefficient. It takes up space and memory resources, impacts performance, and complicates measure coding. Flattening records into a flat table is one of the worst things you can do in Power BI and a common mistake made by novice Power BI users.

This is a conversation I’ve had with many customers. We want our cake, and we want to eat it too. We want to have all the analytic capabilities, interactivity, and high performance, but we also want the ability to drill down to a lot of detail. What if we have a legitimate need to report on transaction details and/or a large table with many columns? It is well known that the ideal shape is a star schema, but what if we need to shape data for detail reporting? The answer is that you can have it both ways, but just not in one table.

Read on for a better model design (hint: the Kimball style) as well as several tips and tricks.
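
As a tiny illustration of the “both ways, but not in one table” point, here’s a hedged pandas sketch (my own, with invented column names) that splits a wide, flat sales extract into a narrow fact table plus a product dimension, Kimball-style; in Power BI you would typically do the equivalent reshaping in Power Query or upstream, but the logic is the same.

```python
# Sketch with invented column names: split one wide, flat sales extract into
# a narrow fact table plus a product dimension.
import pandas as pd

flat = pd.DataFrame({
    "order_id":      [1, 2, 3],
    "order_date":    ["2022-01-05", "2022-01-06", "2022-01-06"],
    "product_name":  ["Widget", "Widget", "Gadget"],
    "product_group": ["Hardware", "Hardware", "Hardware"],
    "sales_amount":  [10.0, 12.5, 99.0],
})

# Dimension: one row per product, with a surrogate key.
dim_product = (flat[["product_name", "product_group"]]
               .drop_duplicates()
               .reset_index(drop=True))
dim_product["product_key"] = dim_product.index + 1

# Fact: just keys and measures, which compresses and aggregates well.
fact_sales = (flat.merge(dim_product, on=["product_name", "product_group"])
                  [["order_id", "order_date", "product_key", "sales_amount"]])

print(dim_product)
print(fact_sales)
```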
