Press "Enter" to skip to content

Category: Architecture

Consistency Levels in Cassandra

Dmytro Kostenko enumerates some options:

In Cassandra, the consistency level is the number of replicas that must respond before a result is returned to the client. Consistency in Cassandra is tunable, meaning each client can decide what balance of consistency and availability to accept. Moreover, it is assigned at the query level and can be configured differently for different service components. Users can choose a different consistency level for each operation, for both reads and writes. When choosing the consistency level for an operation, you should understand each level’s tradeoff between consistency and availability. Cassandra’s consistency can be strong or weak, depending on the level you choose.

Read on to learn more about strong vs weak consistency in the context of Cassandra, as well as the consistency level options available to us.
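As a concrete illustration, here is a minimal sketch using the DataStax Python driver, showing how the consistency level is chosen per statement; the contact point, keyspace, and table are hypothetical placeholders.

```python
from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

# Hypothetical cluster and keyspace, purely for illustration.
cluster = Cluster(["127.0.0.1"])
session = cluster.connect("demo_keyspace")

# A write that waits for a majority of replicas to acknowledge.
insert = SimpleStatement(
    "INSERT INTO users (id, name) VALUES (%s, %s)",
    consistency_level=ConsistencyLevel.QUORUM,
)
session.execute(insert, (42, "Alice"))

# A read satisfied by a single replica: lower latency, but it can
# return stale data if that replica has not yet seen the latest write.
select = SimpleStatement(
    "SELECT name FROM users WHERE id = %s",
    consistency_level=ConsistencyLevel.ONE,
)
row = session.execute(select, (42,)).one()
```

With a replication factor of 3, QUORUM writes paired with QUORUM reads guarantee that the read and write sets overlap on at least one replica, which is what buys you strong consistency; dropping either side to ONE trades that guarantee for latency and availability.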


Use Cases for Multiple Data Lakes

James Serra explains why you might want multiple data lakes in an organization:

A question I get asked frequently from customers when discussing data lake architecture is, “Should I use one data lake for all my data, or multiple lakes?” Ideally, you would use just one data lake, but I have seen many valid use cases where customers are using multiple data lakes. Here are some of those reasons:

I’d quibble with a couple of these (and given James’s intro, I’m not sure he’s fully on board with all of the reasons) but this is a good list of reasons why you might see several data lakes in an organization.


Well-Architected Framework for Oracle in Azure

Kellyn Pot’vin-Gorman has a new tool for us:

This invaluable framework provides clear guidance on the recommended practices to assess, architect, and migrate Oracle workloads to the Azure cloud. This should be the first place to look for answers on how to succeed with Oracle on Azure!

A special thanks to my teammate, Jessica Haessler, for working so hard to help me get this to the finish line, as I would never have been able to get this done on my own!

Click through for a link to the guide. There isn’t a Well-Architected Framework assessment for this yet, but the WAF articles themselves have quite a bit of detail to them.


Storing Semi-Additive Facts as Timespans

Timo Zishiri gives a new spin to a common warehousing problem:

In these cases, the measure may be aggregated across dates by averaging over the number of periods, e.g., average daily inventory levels. Measures can also be aggregated across dates by taking the maximum/minimum for the time interval.

More specifically, this blog focuses on an alternative approach to providing end users with the ability to do point-in-time analysis, so-called trend analysis.

Click through to see how a timespan table would work.
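To make the idea concrete, here is a hypothetical pandas sketch of a timespan-style fact table, where each inventory level carries a valid_from/valid_to interval and a point-in-time lookup becomes a simple interval filter; all column names and figures are invented for illustration.

```python
import pandas as pd

# Timespan fact table: one row per inventory level and the interval
# during which that level was in effect (NaT = still current).
inventory = pd.DataFrame({
    "product":    ["A", "A", "B"],
    "level":      [100, 60, 25],
    "valid_from": pd.to_datetime(["2023-01-01", "2023-01-10", "2023-01-05"]),
    "valid_to":   pd.to_datetime(["2023-01-10", None, None]),
})

def levels_as_of(df: pd.DataFrame, as_of: str) -> pd.DataFrame:
    """Return each product's inventory level at a single point in time."""
    ts = pd.Timestamp(as_of)
    in_span = (df["valid_from"] <= ts) & (df["valid_to"].isna() | (ts < df["valid_to"]))
    return df.loc[in_span, ["product", "level"]]

print(levels_as_of(inventory, "2023-01-07"))  # A at 100, B at 25
```

The appeal is that the fact table holds one row per change rather than one snapshot row per product per day, which is exactly the tradeoff the timespan approach makes.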


The Importance of Proper Data Modeling in Power BI

Paul Turley avoids “big, wide tables”:

Power BI is architected to consume data in a dimensional model, with narrow fact tables and related dimensions. Introducing a big, wide table in a tabular model is extremely inefficient. It takes up space and memory resources, impacts performance, and complicates measure coding. Flattening records into a flat table is one of the worst things you can do in Power BI and a common mistake made by novice Power BI users.

This is a conversation I’ve had with many customers. We want our cake, and we want to eat it too. We want all the analytic capabilities, interactivity, and high performance, but we also want the ability to drill down to a lot of detail. What if we have a legitimate need to report on transaction details and/or a large table with many columns? It is well known that the ideal shape is a star schema, but what if we need to shape data for detail reporting? The answer is that you can have it both ways, just not in one table.

Read on for a better model design (hint: the Kimball style) as well as several tips and tricks.
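As a loose sketch of the reshaping involved (not Paul’s own example), this splits a hypothetical wide sales extract into a narrow fact table plus a customer dimension, which is the basic star-schema move:

```python
import pandas as pd

# A hypothetical "big, wide table": every sale repeats all customer attributes.
wide = pd.DataFrame({
    "sale_id":       [1, 2, 3],
    "amount":        [10.0, 25.0, 7.5],
    "customer_name": ["Ann", "Bob", "Ann"],
    "customer_city": ["Oslo", "Kyiv", "Oslo"],
})

# Dimension: one row per distinct customer, with a surrogate key.
dim_customer = (
    wide[["customer_name", "customer_city"]]
    .drop_duplicates()
    .reset_index(drop=True)
)
dim_customer["customer_key"] = dim_customer.index

# Fact: a narrow table holding only keys and measures.
fact_sales = wide.merge(dim_customer, on=["customer_name", "customer_city"])[
    ["sale_id", "customer_key", "amount"]
]
```

Repeated attribute values live once in the dimension, so the fact table stays narrow, which is the shape the tabular engine compresses and scans best.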


Automating Archive Table Creation

Aaron Bertrand doesn’t want to archive things by himself every month:

Earlier in this series (part 1 | part 2), I wrote at a high level about how to solve issues with ever-growing log tables without large delete operations or data movement to a secondary archive table. In this tip, I’ll share a few code snippets you can use to automate the generation of objects to help make these solutions hands-free.

Read on for the tips.
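Aaron’s snippets are T-SQL, but the generation idea itself is easy to sketch in any language. Here is a hypothetical Python function (names and conventions invented for illustration) that emits the DDL for an archive table mirroring a source table’s columns:

```python
def archive_table_ddl(table: str, columns: dict[str, str],
                      schema: str = "dbo") -> str:
    """Generate CREATE TABLE DDL for an archive table that mirrors the
    source columns and adds a column recording when each row arrived.
    Purely illustrative; real code should quote identifiers properly."""
    cols = ",\n    ".join(f"{name} {dtype}" for name, dtype in columns.items())
    return (
        f"CREATE TABLE {schema}.{table}_Archive (\n"
        f"    {cols},\n"
        f"    ArchivedAt datetime2 NOT NULL DEFAULT SYSUTCDATETIME()\n"
        f");"
    )

print(archive_table_ddl(
    "ChatMessages",
    {"MessageID": "bigint", "Body": "nvarchar(max)", "CreatedAt": "datetime2"},
))
```

In practice you would read the column list from the system catalog rather than passing it in by hand; metadata-driven generation along those lines is what makes the solution hands-free.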


Tips for Large Table Data Archival

Aaron Bertrand follows up on a prior post:

As soon as you realize your growth rates are higher than expected, you need to plan to buy or allocate more disk space. There is no way around this—more data means more disk. You can delay the inevitable for a little bit with better compression, but this is not a long-term fix, and it can impact query performance in different ways (trading CPU for I/O).

Once more disk is in place, you can plan your growth better.

Click through for some guidance on how to plan that growth.
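The planning itself is just arithmetic, which makes it easy to script; here is a back-of-the-envelope sketch with entirely made-up numbers:

```python
# Hypothetical figures: the point is only that growth planning reduces
# to arithmetic you can script and recheck as the real rates change.
rows_per_day = 2_000_000
bytes_per_row = 200              # average row size, including index overhead
free_bytes = 500 * 1024**3       # 500 GB of remaining headroom

daily_growth = rows_per_day * bytes_per_row
days_until_full = free_bytes / daily_growth
print(f"~{daily_growth / 1024**2:.0f} MB/day; full in ~{days_until_full:.0f} days")
```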


Archival Tables in SQL Server

Aaron Bertrand starts a new series:

We all have one: the table that grows forever. Maybe it contains chat messages, post comments, or simple web traffic. Eventually, the table gets large enough that it becomes problematic – for example, users will notice that searches or updates take longer and longer as this massive, ever-growing table is scanned.

People often deal with this by archiving older data into a separate table. In this tip series, I’ll describe an archive table, explain why that solution carries its own set of problems, and show other potential ways to deal with data that grows indefinitely.

This is where we say, “Ah, if only Stretch DB had been priced approximately 1/100th of what it really was.” Stretch DB also had its own problems—especially if you ever needed to change the large table’s schema—but stay tuned for Aaron’s answers.
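For a taste of the classic pattern (not necessarily where Aaron’s series ends up), the usual move is to shift old rows in small batches so the operation never becomes one giant blocking transaction. A sketch using pyodbc against hypothetical table names:

```python
import pyodbc

# Move rows older than 90 days into an archive table with an identical
# schema, 5,000 at a time. DELETE ... OUTPUT INTO moves each batch
# atomically; committing per batch keeps transactions short.
conn = pyodbc.connect("DSN=mydb", autocommit=False)
cur = conn.cursor()

move_batch = """
DELETE TOP (5000) FROM dbo.ChatMessages
OUTPUT deleted.* INTO dbo.ChatMessages_Archive
WHERE CreatedAt < DATEADD(DAY, -90, SYSUTCDATETIME());
"""

while True:
    cur.execute(move_batch)
    moved = cur.rowcount
    conn.commit()
    if moved < 5000:   # a partial batch means we're done
        break
```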


Software-as-a-Service: Single DB or Per-Client DB

Greg Low makes a choice:

On-premises applications are mostly single-tenant. They support a single organization. We do occasionally see multi-tenant databases. They hold the same types of information for many organizations.

But what about SaaS-based applications? By default, you’ll want to store data for many client organizations. Should you create a large single database that holds data for everyone? Should you create a separate database for each client? Or should you create something in between?

As with most things in computing, there is no one simple answer to this. Here are the main decision points that I look at:

Click through for Greg’s thoughts on the matter. Most of these factors are also relevant for on-premises SQL Server installations, not just Azure SQL DB/Managed Instance.
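As a tiny illustration of what the database-per-client choice implies for application code, here is a hypothetical routing sketch; in a single shared database, the same lookup would instead resolve to a TenantID that gets filtered into every query.

```python
# Hypothetical tenant catalog mapping each client to its own database.
TENANT_CATALOG = {
    "contoso":  "Server=sql01;Database=App_Contoso;Trusted_Connection=yes",
    "fabrikam": "Server=sql02;Database=App_Fabrikam;Trusted_Connection=yes",
}

def connection_string_for(tenant: str) -> str:
    """Resolve which database serves a given tenant."""
    try:
        return TENANT_CATALOG[tenant]
    except KeyError:
        raise LookupError(f"unknown tenant: {tenant}") from None
```

The catalog itself becomes a piece of infrastructure you have to maintain, which is one of the tradeoffs that pushes some teams back toward the single-database design.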


An Introduction to Event Sourcing

Aasif Ali provides a high-level introduction to the concept of event sourcing:

Event sourcing is a way to store data as events in an append-only log. Rather than keeping only the latest version of an entity’s state, this method stores the state of a database object as a sequence of events: essentially, a new event each time the object changed state, from the beginning of the object’s existence. An event can be anything generated by a user: a mouse click, a key press on a keyboard, and so on. Event sourcing is a great way to atomically update state and publish events. Not only can we query these events, but we can also use the event log to reconstruct past states and as a foundation to automatically adjust the state to cope with retroactive changes.

Events are immutable; they cannot be changed. This well-known rule is often the first defining characteristic of event stores and event sourcing.

Read on to see how this concept works and how products like Apache Kafka make event sourcing viable.
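A toy sketch of the core mechanic: state is never stored directly but is rebuilt by folding over the append-only log. The event types here are invented for illustration.

```python
from dataclasses import dataclass
from functools import reduce

@dataclass(frozen=True)   # frozen: events are immutable once recorded
class Deposited:
    amount: int

@dataclass(frozen=True)
class Withdrawn:
    amount: int

def apply(balance: int, event) -> int:
    """Apply a single event to the current state."""
    if isinstance(event, Deposited):
        return balance + event.amount
    if isinstance(event, Withdrawn):
        return balance - event.amount
    raise TypeError(f"unknown event: {event!r}")

log = [Deposited(100), Withdrawn(30), Deposited(5)]   # append-only

current = reduce(apply, log, 0)    # 75: fold over the full history
past = reduce(apply, log[:2], 0)   # 70: replay a prefix for a past state
```

Replaying a prefix of the log is exactly the “reconstruct past states” property described above; a durable log such as a Kafka topic plays the same role at scale.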
