Architecture – Page 4

The Importance of Planning before Power BI Data Modeling

Published 2024-10-23 by Kevin Feasel

Kelly Broekstra recommends against jumping right in:

Who has been told by a manager or business person to just connect to the source data and start creating a new report? Here is my tip:

DON’T DO IT

All Power BI and Fabric reports must have a semantic model, which Microsoft describes as “a logical description of an analytical domain, with metrics, business-friendly terminology, and representation, to enable deeper analysis.” – Source

Read on to learn why and what you should do instead if you want to have a better long-term experience with Power BI.

Tips for Adopting Microsoft Fabric

Published 2024-10-18 by Kevin Feasel

Paul Turley shares some thoughts:

Hello, friends. I’ve spent the past few months working with several new Fabric customers who were seeking guidance and recommendations for Fabric architecture decisions. What have we learned about using Fabric in enterprise data settings in the past 11 months? This post covers some of the important decision points and Fabric solution design patterns.

Much of the industry’s experience with Microsoft Fabric over the past several months has been at a high level as organizations were dipping their toe in the pool to test the water. So far, our Data & AI team have assisted around 50 clients with Fabric projects of various sizes. We have also implemented a handful of production scale projects with enterprise workloads, comparing notes with community leaders and the product teams who develop the product. What lessons have we learned?

Click through for several bits of high-level architectural guidance intended to make that adoption easier.

Tips for Orchestrating Fabric Notebooks

Published 2024-10-14 by Kevin Feasel

Stepan Resl talks orchestration:

Let’s start by introducing what orchestration is and why it’s important to talk about shared resources. Orchestration is a discipline focused on managing and coordinating individual items or control elements to collectively manage the flow of our data operations. In the context of Fabric, this involves managing notebooks, dataflows, pipelines, stored procedures, semantic model updates, and many other items, activities, and services that may even be outside of Fabric.

Read on for some of the options, how they work in Microsoft Fabric, and tips for success.

Tablespaces in Oracle and PostgreSQL

Published 2024-10-07 by Kevin Feasel

Umair Shahid explains how tablespaces work in Oracle and PostgreSQL:

Tablespaces play an important role in database management systems, as they determine where and how database objects like tables and indexes are stored. Both Oracle and PostgreSQL have the concept of tablespaces, but they implement them differently based on the overall architecture of each database.

Oracle’s tablespaces are an integral part of the database that provide various functionalities, including separating data types, managing storage, and optimizing performance. PostgreSQL, on the other hand, takes a more simplified approach, using tablespaces primarily to control where physical files are stored.

This blog aims to provide a comprehensive comparison between Oracle and PostgreSQL tablespaces, covering their architecture, creation, and practical use cases, with the goal of helping DBAs better understand their capabilities and limitations.

Read on to learn more about how tablespaces work in each platform and how they differ.

When to Think about Scalability

Published 2024-09-19 by Kevin Feasel

Steve Jones hits upon a dilemma:

There may not be a large workload in production either, at least not at first.

So, what do you worry about first: your code being used or performing well? That’s a similar question to this one: Worry about Scalability or Popularity First? While most of us don’t work for a startup and our organizations have some sort of financial stability, does popularity matter?

I don’t always have a solid answer for this. The closest I have is to try to make my baseline:

Easy for maintainers (including myself) to read
Reasonably efficient
Capable of some level of scale but not necessarily the most scalable

If you want practical terms, I create a somewhat-educated guess on approximately how many rows there will be in a steady state after launch and then multiply that by a factor of 2-3 when generating test data. If you can dodge 2-3x expectations, you can dodge a ball.

And if you suddenly balloon to 10x, you grouse and grumble and spend a couple of sprints digging out from the mess of success.
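For what it's worth, the sizing rule above is simple enough to sketch in a few lines of Python. The function name and the 500,000-row estimate are hypothetical, chosen only for illustration; the 2x and 3x factors come from the rule of thumb described above.

```python
def row_targets_for_testing(steady_state_rows, factors=(2, 3)):
    """Map each multiplier to the number of test rows to generate.

    steady_state_rows is an educated guess at post-launch volume;
    the default factors follow the 2-3x rule of thumb.
    """
    if steady_state_rows < 0:
        raise ValueError("steady_state_rows must be non-negative")
    return {factor: steady_state_rows * factor for factor in factors}


# Hypothetical estimate: roughly 500,000 rows in steady state.
targets = row_targets_for_testing(500_000)
for factor, rows in targets.items():
    print(f"{factor}x: generate {rows:,} test rows")
```

Generating data at both ends of the range gives a rough sense of whether performance degrades gracefully between 2x and 3x, which is about as much as this kind of guess can promise.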

A Primer on Database Sharding

Published 2024-09-13 by Kevin Feasel

Adrien Payong covers one technique to scale out databases:

Companies of all sizes and across industries are struggling to cope with an explosion of data never before seen in the short history of computing. As applications reach new levels of sophistication and become deeply interconnected, these companies find themselves increasingly overworked, overheated, and at their wits’ end, desperately trying to squeeze just a bit more performance and availability out of their aging database architectures.

Enter sharding, a powerful database architecture pattern that offers a solution to these challenges. Sharding scales out databases as data volume and user load grow, providing performance and high availability by spreading a database’s data across multiple servers.

Read on to learn more about it. Adrien mentions MongoDB, Cassandra, MySQL, and Postgres, though the real trick of sharding is in the client, so it works for other data platform technologies as well, including SQL Server.
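To illustrate the point about the client doing the routing, here is a minimal sketch of hash-based shard selection in Python. It is not taken from Adrien's article: the `ShardRouter` class, the shard names, and the customer keys are all hypothetical, and a real system would map each shard name to a connection string for whichever platform you use.

```python
import hashlib


class ShardRouter:
    """Pick a shard for a key using a stable hash (illustrative only)."""

    def __init__(self, shard_names):
        if not shard_names:
            raise ValueError("at least one shard is required")
        self._shards = list(shard_names)

    def shard_for(self, key):
        # Use a stable hash so every client maps a key to the same shard.
        # Python's built-in hash() is salted per process, so avoid it here.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self._shards)
        return self._shards[index]


# Hypothetical shard names; in practice these would be server addresses.
router = ShardRouter(["shard-0", "shard-1", "shard-2"])
for customer_id in ("customer-17", "customer-42", "customer-99"):
    print(customer_id, "->", router.shard_for(customer_id))
```

One design note: simple modulo routing like this remaps most keys whenever the number of shards changes. Consistent hashing or a lookup directory are common ways to reduce that churn.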

Discerning a Star Schema from an Existing Report

Published 2024-09-10 by Kevin Feasel

Kelly Broekstra describes a common flow for business intelligence projects:

I have worked as a business intelligence developer for several years, and I’m always asked: “How do you convert user requirements to a functioning data model?”

I follow the Kimball methodology. For more information, check out the official pages.

But, here are some specific tips on what works for me.

Click through for those tips.

Real-Time Streaming in Azure

Published 2024-09-06 by Kevin Feasel

Temidayo Omoniyi takes us through an architecture:

In today’s world, billions of data points are generated daily from messaging applications like WhatsApp, financial data sources like the New York Stock Exchange, or video streaming platforms like YouTube. As a data engineer or solution architect, you are tasked with designing a real-time streaming platform that captures data as it is generated and stores it in the appropriate storage for decision-making.

This does a great job of going into detail, not only at the architectural level, but also setup and practical implementation.

Query Re-Optimization in Postgres

Published 2024-08-20 by Kevin Feasel

Andrei Lepikhov walks through an interesting scenario:

What was the impetus to begin this work? It was caused by many real cases that may be demonstrated clearly by the Join Order Benchmark. How much performance do you think Postgres loses if you change its preference of employing parallel workers from one to zero? Two times regression? What about 10 or 100 times slower?

The black line in the graph below shows the change in execution time of each query between two cases: with parallel workers disabled and with a single parallel worker per gather allowed. For details, see the test script and EXPLAINs, with and without parallel workers.

Click through for an overview of what Andrei wrote, including architectural notes. But stick around until the end to see just how difficult the challenge is to re-optimize without making performance worse in the end.

A Reference Architecture for Microsoft Fabric

Published 2024-08-15 by Kevin Feasel

James Serra draws boxes:

Microsoft Fabric uses a data lakehouse architecture, which means it does not use a relational data warehouse (with its relational engine and relational storage) and instead uses only a data lake to store data. Data is stored in Delta lake format so that the data lake acquires relational data warehouse-like features (check out my book that goes into much detail on this, or my video). Here is what a typical architecture looks like when using Fabric (click here for the .vsd):

Click through for the image as well as James’s explanation of the components.

Category: Architecture
