Press "Enter" to skip to content

Category: Architecture

The Importance of Star Schemas in Power BI

Paul Turley lays out facts (and dimensions):

There is no secret about this. If you do any legitimate research about Power BI (reading blogs, books or training from reliable sources), you will quickly learn that a lot of basic functionality requires a dimensional model, aka “Star Schema”. This is a hard fact that every expert promotes, and self-taught data analysts either have learned or will learn through experience. So, if everyone agrees on this point, why do so many resist this advice?

Perspective is everything. I didn’t understand why getting to the star schema was so out of reach so often until I was able to see it from another perspective. There are a few common scenarios that pull source data in directions other than an ideal dimensional model.

Read on for Paul’s take on the subject.


Reviewing Database Usage Trends

Brendan Tierney looks at the data:

Getting back to the topic of this post, I’ve gathered some data and obtained some league tables from some sites. These will help us have a closer look at what is really happening in the database market throughout 2022. Two popular sites constantly monitor the wider internet and judge how popular databases are globally: DB-Engines and the TOPDB Top Database index. Both are well known and are frequently cited. Both sites give some details of how they calculate their scores, with one focused mainly on how often a database appears in searches across different search engines, while the other one, in addition to search engine results/searches, also looks across different websites, discussion forums, social media, job vacancies, etc.

I don’t necessarily believe that these are totally accurate, though on the whole, I do expect the results to be directionally accurate. I’ve used DB-Engines data several times in the past and like to point out that, for any given year, 7 or 8 of the top 10 database engines are relational.


Consistency Levels in Cassandra

Dmytro Kostenko enumerates some options:

In Cassandra, a consistency level is the number of replicas that must respond before a reply is returned to the user. Consistency in Cassandra is tunable, meaning each client can decide what balance of consistency and availability to choose. Moreover, it is assigned at the query level and can be configured for different service components. Users can choose different consistency levels for each operation, both for reads and writes. When choosing the consistency level for an operation, you should understand each level’s tradeoff between consistency and availability. Cassandra’s consistency can be strong or weak, depending on your chosen level.

Read on to learn more about strong vs weak consistency in the context of Cassandra, as well as the consistency level options available to us.
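
To make the per-query aspect concrete, here’s a minimal sketch using the DataStax Python driver (my own illustration, not code from Dmytro’s post); the cluster address, keyspace, and orders table are all hypothetical:

```python
from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

# Hypothetical local cluster, keyspace, and orders table; adjust for your environment.
cluster = Cluster(["127.0.0.1"])
session = cluster.connect("shop")

# Write at QUORUM: a majority of replicas must acknowledge before the driver returns.
insert = SimpleStatement(
    "INSERT INTO orders (order_id, status) VALUES (%s, %s)",
    consistency_level=ConsistencyLevel.QUORUM,
)
session.execute(insert, (42, "shipped"))

# Read at ONE: any single replica may answer, favoring availability and latency
# over the guarantee of seeing the very latest write.
select = SimpleStatement(
    "SELECT status FROM orders WHERE order_id = %s",
    consistency_level=ConsistencyLevel.ONE,
)
row = session.execute(select, (42,)).one()
```

Pairing QUORUM writes with ONE reads favors availability; using QUORUM on both sides guarantees that a read overlaps the replicas that acknowledged the latest write, which is where the strong-consistency guarantee comes from.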


Use Cases for Multiple Data Lakes

James Serra explains why you might want multiple data lakes in an organization:

A question I get asked frequently by customers when discussing data lake architecture is “Should I use one data lake for all my data, or multiple lakes?”. Ideally, you would use just one data lake, but I have seen many valid use cases where customers are using multiple data lakes. Here are some of those reasons:

I’d quibble with a couple of these (and given James’s intro, I’m not sure he’s fully on board with all of the reasons) but this is a good list of reasons why you might see several data lakes in an organization.


Well-Architected Framework for Oracle in Azure

Kellyn Pot’vin-Gorman has a new tool for us:

This invaluable framework provides clear guidance on the recommended practices to assess, architect, and migrate Oracle workloads to the Azure cloud. This should be the first place to look for success with Oracle on Azure!

A special thanks to my teammate, Jessica Haessler, for working so hard to help me get this to the finish line, as I would never have been able to get this done on my own!

Click through for a link to the guide. There isn’t a Well-Architected Framework assessment for this yet, but the WAF articles themselves have quite a bit of detail to them.


Storing Semi-Additive Facts as Timespans

Timo Zishiri gives a new spin to a common warehousing problem:

In these cases, the measure may be aggregated across dates by averaging over the number of periods, e.g., average daily inventory levels. Measures can also be aggregated across dates by taking the maximum/minimum for the time interval.

More specifically, this blog focuses on an alternative approach to providing end users with the ability to do point-in-time analysis, so-called trend analysis.

Click through to see how a timespan table would work.
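
As a rough illustration of the idea (not Timo’s code), here’s a small pandas sketch of a timespan table with a made-up inventory example: each row carries the quantity on hand and the interval during which that value was valid, so point-in-time analysis becomes a simple interval filter.

```python
import pandas as pd

# Hypothetical inventory timespan table: one row per item per period during
# which the on-hand quantity was constant (valid_to is far-future for current rows).
spans = pd.DataFrame(
    {
        "item": ["widget", "widget", "gadget"],
        "qty_on_hand": [100, 80, 25],
        "valid_from": pd.to_datetime(["2022-01-01", "2022-01-15", "2022-01-10"]),
        "valid_to": pd.to_datetime(["2022-01-15", "2099-12-31", "2099-12-31"]),
    }
)

def on_hand_as_of(df: pd.DataFrame, as_of: str) -> pd.DataFrame:
    """Point-in-time lookup: keep rows whose timespan covers the requested date."""
    ts = pd.Timestamp(as_of)
    return df[(df["valid_from"] <= ts) & (ts < df["valid_to"])]

print(on_hand_as_of(spans, "2022-01-20"))  # widget: 80, gadget: 25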


The Importance of Proper Data Modeling in Power BI

Paul Turley avoids “big, wide tables”:

Power BI is architected to consume data in a dimensional model, with narrow fact tables and related dimensions. Introducing a big, wide table in a tabular model is extremely inefficient. It takes up space and memory resources, impacts performance, and complicates measure coding. Flattening records into a flat table is one of the worst things you can do in Power BI and a common mistake made by novice Power BI users.

This is a conversation I’ve had with many customers. We want our cake, and we want to eat it too. We want all the analytic capabilities, interactivity, and high performance, but we also want the ability to drill down to a lot of detail. What if we have a legitimate need to report on transaction details and/or a large table with many columns? It is well known that the ideal shape is a star schema, but what if we need to shape data for detail reporting? The answer is that you can have it both ways, just not in one table.

Read on for a better model design (hint: the Kimball style) as well as several tips and tricks.
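
For a rough sense of what that split looks like before the data ever reaches Power BI, here’s a small pandas sketch with invented column names: a wide export gets separated into a narrow fact table plus a customer dimension with a surrogate key, which is essentially what a star schema asks of you.

```python
import pandas as pd

# Hypothetical wide export: every sale row repeats all the customer attributes.
wide = pd.DataFrame(
    {
        "sale_id": [1, 2, 3],
        "sale_date": ["2022-11-01", "2022-11-01", "2022-11-02"],
        "amount": [10.0, 25.0, 5.0],
        "customer_name": ["Ada", "Ada", "Grace"],
        "customer_city": ["Dublin", "Dublin", "Cork"],
    }
)

# Dimension: one row per distinct customer, with a surrogate key.
dim_customer = (
    wide[["customer_name", "customer_city"]]
    .drop_duplicates()
    .reset_index(drop=True)
)
dim_customer["customer_key"] = dim_customer.index + 1

# Fact: narrow table keeping only the surrogate key and the measures.
fact_sales = wide.merge(dim_customer, on=["customer_name", "customer_city"])[
    ["sale_id", "sale_date", "customer_key", "amount"]
]
```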


Automating Archive Table Creation

Aaron Bertrand doesn’t want to archive things by himself every month:

Earlier in this series (part 1 | part 2), I wrote at a high level about how to solve issues with ever-growing log tables without large delete operations or data movement to a secondary archive table. In this tip, I’ll share a few code snippets you can use to automate the generation of objects to help make these solutions hands-free.

Read on for the tips.
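
Aaron’s snippets are T-SQL, but just to sketch the shape of the automation, here’s a hypothetical Python helper that emits the statements you would otherwise write by hand each month; the table and column names are made up, and @Cutoff would be declared by whatever script wraps the generated batch.

```python
# Hypothetical illustration of the "generate the objects" idea; Aaron's tip
# builds this in T-SQL against the real table metadata.
def archive_ddl(schema: str, table: str, cutoff_column: str) -> str:
    src = f"[{schema}].[{table}]"
    dst = f"[{schema}].[{table}_Archive]"
    return "\n".join(
        [
            f"SELECT * INTO {dst} FROM {src} WHERE 1 = 0;",  # empty copy of the structure
            f"INSERT INTO {dst} SELECT * FROM {src} WHERE {cutoff_column} < @Cutoff;",
            f"DELETE FROM {src} WHERE {cutoff_column} < @Cutoff;",
        ]
    )

print(archive_ddl("dbo", "ChatMessages", "CreatedDate"))
```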


Tips for Large Table Data Archival

Aaron Bertrand follows up on a prior post:

As soon as you realize your growth rates are higher than expected, you need to plan to buy or allocate more disk space. There is no way around this—more data means more disk. You can delay the inevitable for a little bit with better compression, but this is not a long-term fix, and it can impact query performance in different ways (trading CPU for I/O).

Once more disk is in place, you can plan your growth better.

Click through for some guidance on how to plan that growth.
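
As a back-of-envelope example of that planning arithmetic (my numbers, not Aaron’s), projecting a year of growth for a log table might look like this:

```python
# Rough growth projection with invented numbers: a log table adding about
# 2 million rows a month at ~350 bytes per row (data plus index overhead).
rows_per_month = 2_000_000
bytes_per_row = 350
months = 12

projected_gb = rows_per_month * bytes_per_row * months / 1024**3
print(f"~{projected_gb:.1f} GB of additional space over {months} months")
```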


Archival Tables in SQL Server

Aaron Bertrand starts a new series:

We all have one: the table that grows forever. Maybe it contains chat messages, post comments, or simple web traffic. Eventually, the table gets large enough that it becomes problematic – for example, users will notice that searches or updates take longer and longer as this massive, ever-growing table is scanned.

People often deal with this by archiving older data into a separate table. In this tip series, I’ll describe an archive table, explain why that solution carries its own set of problems, and show other potential ways to deal with data that grows indefinitely.

This is where we say, “Ah, if only Stretch DB had been priced approximately 1/100th of what it really was.” Stretch DB also had its own problems—especially if you ever needed to change the large table’s schema—but stay tuned for Aaron’s answers.
