Press "Enter" to skip to content

Category: Architecture

PASS Summit Q&A: The Curated Data Platform

I answer some questions:

On Thursday, I presented a session at PASS Summit entitled The Curated Data Platform. You can grab slides and links to additional information on my website. Thank you to everyone who attended the session.

During and after the session, I had a few questions come in from the audience, and I wanted to cover them here.

Most of the questions were around document databases, so check them out.

Comments closed

Tips for Moving to a Multi-Tenant Setup

Adrian Hills continues a series on multi-tenant SQL Server:

What you knew a few years ago might differ significantly from the reality today, whether you started off with a single tenant system that you pivoted quickly to support multiple tenants, or you envisioned 10s of tenants and ended up with 1000s. Whatever that reality is, when you experience pain points around the 3 considerations I covered in part 1 of this series (security, maintainability, and scalability), it can lead to a need to change the multi-tenancy approach you’re using. Often, the biggest driver for change is around performance and scalability and typically tends to be related to a need to move from a less-isolated multi-tenancy approach (single database) to a more-isolated approach that supports the scaling out of workloads (multiple databases).

In the steps below, I’ll cover the general path you can follow to successfully make an architectural change like this to the database layer. As a provider of database performance monitoring and DataOps tools, SentryOne offers some tools that can come in handy along the way, so I’ll call those out as we go.

Click through for guidance.

Comments closed

Multi-Tenant Database Designs

Adrian Hills walks us through four database designs for multi-tenant data storage:

In my previous blog post, I talked about some of the key considerations around designing a multi-tenant system using SQL Server. There are several ways to implement multi-tenancy, and, as is often the case, there is no single “best” way but rather a range of options that each offer different trade-offs. The approach that is right for you depends on your objectives and needs for your specific environment. It’s important to consider which of these approaches best suit your requirements and goals based on the 3 core considerations from Multi-Tenancy with SQL Server, Part 1: security, maintainability (manageability), and scalability.

The following are the 4 approaches I will cover in this blog post:
1. Single database, shared schema
2. Single database, separate schema
3. Database per tenant
4. Multiple databases, multiple tenants per database, shared schema

I’ve worked with options 1, 3, and 4. Read on for Adrian’s thoughts. Ceteris paribus, my preference is 3. That said, I’ve worked in a situation where I migrated from 3 to 1 because there were thousands of customers, none of whom had more than hundreds of megabytes worth of data. 4 provides a good balance in that fashion, where you can bunch up smaller clients and give larger clients their own databases (and sometimes even servers). But if you’re going with options 2, 3, and 4, you probably want a central data warehouse which collects data across all four for internal use.

Comments closed

Power BI Icons for Diagrams.Net

Marc Lelijveld has some icons for us:

Previously, I used a simple PowerPoint slide when I drafted technical solution proposals. This took me a whole lot of time by copy-pasting all the images, make it look nice and connect the dots together. While tools like diagrams.net are built for this purpose, I always stuck with PowerPoint as there were no icons for all Power BI objects in this tool. Until now!

The online tool Diagrams.net allow you to quickly draft your solution architecture by dragging and dropping icons on a white canvas and easily connecting the dots together.

I’ve been a big fan of diagrams.net (nee draw.io), so thank you Marc for putting this together.

Comments closed

The Downside of EAV-Style Measures in Power BI

Chris Webb explains why you should try to stick to the fact-dimensional model in Power BI:

In this fact table the dimension keys remain the same, but the Value column stores all the data from the Sales, Tax and Volume Sold measures in the original table and the Measure Name column tells you what type of measure value is stored on any given row. Let’s call this approach the Measures Dimension approach.

There are some advantages to building fact tables using the Measures Dimension approach, for example:

– You can now use a slicer in a report to select the measures that appear in a visual
– You can now easily add new measures without having to add new columns in your fact table
– You can use row-level security to control which measures a user has access to

Generally speaking, though, any time you deviate from a conventional dimensional model you risk running into problems later on and this is no exception. Let’s go through the disadvantages of modelling data using a Measures Dimension.

Read on for several good reasons (and yes, “things are formatted wrong” is a good reason!).

Comments closed

Azure Icons in Diagrams.Net

Dave Ruijter walks us through using Azure icons in diagrams.net (nee draw.io):

How to use the icon collection in draw.io (diagrams.net)

I’ve made it easy for you: I’ve created a couple of custom libraries with all the icons! I’ve uploaded them to my GitHub, and you can use them in a couple of ways.

The first and most simple option is to just start the diagrams.net app with an URL with the correct libraries included as URL parameters:

Click through to read the whole thing. I’m a long-time fan of diagrams.net and am glad to see Dave’s offering for us.

1 Comment

ksqlDB 0.11.0

Victoria Xia announces ksqlDB 0.11.0:

ksqlDB 0.11.0 contains improvements and fixes spanning stranded transient queriesoverly aggressive schema compatibility checksconfusing behavior around casting nullsbad schema management, and more. Here, we highlight a couple of additional, notable improvements.

Also on my backlog was Andy Coates, talking about key columns in ksqlDB:

ksqlDB 0.10 includes significant changes and improvements to how keys are handled. This is part of a series of enhancements that began with support for non-VARCHAR keys and will ultimately end with ksqlDB supporting multiple key columns and multiple key formats, including Avro, JSON, and Protobuf.

Before looking at the syntax changes in version 0.10, let’s first look at what is meant by keys in ksqlDB, the two types of key columns, and how this may differ from other SQL systems.

Read on, as it’s an interesting look at how different data architectures can mean radically different recommendations for key design.

Comments closed

Using the Azure Architecture Icons

Steve Jones tries out some of the Azure Architecture Icons:

The icons are svg, so while they work in PowerPoint, adding them to something like this post in OpenLiveWriter doesn’t work. However, I could make a quick diagram and capture an image of it.

Not great, but it shows I can put icons on a page with arrows.

Going one step further, I’ve been digging into Diagrams by mingrammer lately. With it, you use Python to generate diagrams, and there are quite a few Azure icons in there, as well as AWS, on-prem, etc.

Here’s a quick example of what you can do, taken from an upcoming talk of mine:

There are some limitations based on the underlying library, such as how you can’t connect cluster to cluster—meaning I can’t draw a line from “Logging” to “Storage\Logs”; I have to draw it from a particular element (Loki) to a particular element (Elasticsearch). In a lot of traditional reference architecture diagrams, though, that isn’t a problem.

Comments closed

Calculating Cloud App Availability

Dave Bermingham gives you a way to calculate how available you should expect your application to be given SLAs:

When deploying business critical applications in the cloud you want to make sure they are highly available. The good news is that if you plan properly, you can achieve 99.99% (4-nines) of availability or more. However, calculating your true availability may not be as straightforward as it seems.

When considering availability you must consider the key components that make access to your application possible, which I’ll call the availability chain. Component of the availability chain are:

– Compute
– Network 
– Storage
– Application
– Dependent services

Your application is only as available as your weakest link, and your downtime increases exponentially with each additional link you add to the chain.  Let’s examine each of the links. 

Read on for a breakdown of these items.

Comments closed

Cassandra Monitoring and Data Modeling

Instaclustr has put up a couple interesting posts on Cassandra. First, Anup Shirolkar explains how we can monitor Cassandra installations:

Cassandra is developed in Java and is a JVM based system. Each Cassandra node runs a single Cassandra process. JVM based systems are enabled with JMX (Java Management Extensions) for monitoring and management. Cassandra exposes various metrics using MBeans which can be accessed through JMX. Cassandra monitoring tools are configured to scrape the metrics through JMX and then filter, aggregate, and render the metrics in the desired format. There are a few performance limitations in the JMX monitoring method, which are referred to later. 

The metrics management in Cassandra is performed using Dropwizard library. The metrics are collected per node in Cassandra. However, those can be aggregated by the monitoring system. 

On the development side, the Instaclustr team walks us through data modeling guidelines:

The ultimate goal of Cassandra data modeling and analysis is to develop a complete, well organized, and high performance Cassandra cluster. Following the five Cassandra data modeling best practices outlined will hopefully help you meet that goal:

1. Cassandra is not a relational database, don’t try to model it like one
2. Design your model to meet 3 fundamental goals for data distribution
3. Understand the importance of the Primary Key in the overall data structure 
4. Model around your queries but don’t forget about your data
5. Follow a six step structured approach to building your model. 

Because Cassandra uses a variant of SQL, it’s easy to forget that data is stored completely differently and that design decisions are quite different from what we see in the relational world.

Comments closed