Architecture – Page 23

Time limits: ending the neverending
Here’s one to ponder: how long can a long-running action go on before the customer (even a very patient, very digital customer) loses all interest in the outcome?
Pull up a chair. With no upper bound, we could be here a while.

Read on for more in that vein with JavaScript-y solutions.

Comments closed

Finding Dependency Clusters

Published 2019-04-04 by Kevin Feasel

Michael J. Swart performs cluster analysis with tables:

That ball of mush in the middle is hard to look at, but the smaller disconnected bits aren’t! Just like Ben, I want to work on those smaller pieces too! And just like the lonely tables we looked at last week, these small isolated components are also good candidates for extracting from SQL Server.

The script looks at joins in execution plans, which is a rather clever way of doing this when you don’t have a comprehensive set of foreign key constraints.

Comments closed

Big Data Often Isn’t

Published 2019-04-01 by Kevin Feasel

Arnon Rotem-gal-oz argues that “big data” is often a misnomer:

I couldn’t find numbers from Google but others say that by 2017 Google processed over 20PB a day (not to mention answering 40K search queries/second) so Google is definitely in the big data game. The numbers go down fast after that, even for companies who are really big data companies — Facebook presented back in 2017 that they handle 500TB+ of new data daily, the whole of Twitter’s data as of May 2018 was around 300PB, and Uber reported their data warehouse is in the 100+ PB range.
Ok, but what about the rest of us? Let’s take a look at an example.

I often fight with this myself—SQL Server can easily handle multi-billion row data sets, for example. It’s the same problem in Azure with SQL Data Warehouse: the “you must be this tall to ride the rides” marker is set pretty high.

Comments closed

Building the Right Architecture for the Job

Published 2019-04-01 by Kevin Feasel

Gogula Aryalingam takes us through an example where the newest and most expensive tools aren’t the best for the job at hand:

When Azure SQL Data Warehouse was chosen to implement a multi-dimensional data warehouse, it may have seemed like the ideal choice. Why? because it was plain to see: keywords: “SQL”, “Warehouse”. However, no, SQL Data Warehouse is ideal only when you have data loads that are quite high, not when it is only several 100GBs. Armed with a few more reasons as to why not (A good reference for choosing Azure SQL Data Warehouse), I had confronted them. But the rebuke then was that they did get good enough performance, and that cost wasn’t a problem. Until of course a few months later when complex queries started hitting the system, and despite being able to afford that cost, the value of paying that amount did not seem worth it.

Having a good architectural understanding of the Azure or AWS platform—even if you aren’t deeply familiar with all of the tools—can help avoid these types of problems.

Comments closed

Comparing Oracle RAC to SQL Server Availability Groups

Published 2019-03-29 by Kevin Feasel

Kellyn Pot’vin-Gorman explains the difference between Oracle RAC and SQL Server Availability Groups:

There is a constant rumble among Oracle DBAs- either all-in for Oracle Real Application Cluster, (RAC) or a desire to use it for the tool it was technically intended for. Oracle RAC can be very enticing- complex and feature rich, its the standard for engineered systems, such as Oracle Exadata and even the Oracle Data Appliance, (ODA). Newer implementation features, such as Oracle RAC One-Node offered even greater flexibility in the design of Oracle environments, but we need to also discuss what it isn’t- Oracle RAC is not a Disaster Recovery solution.

Click through for a good high-level contrast, as these are quite different products.

Comments closed

Contrasting Oracle’s Architecture with SQL Server’s

Published 2019-03-22 by Kevin Feasel

Kellyn Pot’vin-Gorman helps us understand where Oracle’s architecture differs from SQL Server’s:

The first thing you’ll notice is what Oracle refers to as an INSTANCE is different to what SQL Server calls one.
Oracle’s instance is most closely related to what SQL Server calls their database, (although it includes the files that Oracle includes in their description, too) and the Oracle home is *relatively* SQL Server’s version of an instance.

Read on for a deeper comparison from someone who has spent quite a bit of time working with both platforms.

Comments closed

Database-First or Kafka-First for Event Streaming

Published 2019-03-19 by Kevin Feasel

Gwen Shapiro takes us through a scenario where database-first writes for event streaming makes the most sense:

Note that the DB does quite a lot for you: it enforces serializability, locks, your logical constraints, etc. If the DB is distributed (Vitesse, Cockroach, Spanner, Yugabyte), it does even more.
If you were to go Kafka-first… well, it isn’t impossible. But all those responsibilities now belong to you as a developer. And if you are thinking there may be multiple webservers handling user requests and passing them to Kafka, you have to solve fairly challenging problems.

Read the whole thing.

Comments closed

Flattening Dimensional Models

Published 2019-03-18 by Kevin Feasel

Reza Rad explains why it makes sense to build flat dimensional models, particularly for Power BI:

The article that I wrote earlier this week about the shared dimension had a lot of interest, and I’m glad it helped many of you. So I thought better to write about the basics of modeling even more. In this article, I will be focusing on a scenario that you have all faced, however, took different approaches. Is it good to have too many dimension tables? can you combine some of those tables together to build one flatten dimension table? how much should you flatten it? should you end up with one huge table including everything? In this article, I’m answering all of these questions and explaining the scenarios of combining dimensions, as usual, I explain the model in Power BI. However, the concepts are applicable to any other tools. If you like to learn more about Power BI; read Power BI book from Rookie to Rock Star.

Given how closely the ideal Power BI data model matches the Kimball model, Reza’s advice makes perfect sense.

Comments closed

Securely Accessing External Resources From Databricks AWS

Published 2019-03-13 by Kevin Feasel

Itai Weiss shows how you can securely hit external data sources when using Databricks for AWS:

For security purposes, Databricks Apache Spark clusters are deployed in an isolated VPC dedicated to Databricks within the customer’s account. In order to run their data workloads, there is a need to have secure connectivity between the Databricks Spark Clusters and the above data sources.
It is straightforward for Databricks clusters located within the Databricks VPC to access data from AWS S3 which is not a VPC specific service. However, we need a different solution to access data from sources deployed in other VPCs such as AWS Redshift, RDS databases, streaming data from Kinesis or Kafka. This blog will walk you through some of the options you have available to access data from these sources securely and their cost considerations for deployments on AWS. In order to establish a secure connection to these data sources, we will have to configure the Databricks VPC with either one of the following two available options :

Read on for those two options.

Comments closed

Monitoring Kafka Streams with JMX Metrics

Published 2019-03-07 by Kevin Feasel

Rishi Khandelwal provides a reference architecture for monitoring a Kafka Streams application using JMX Metrics and pushing the results into Graphite:

Service (application) exposes the JMX metrics at some port which will be captured by Jolokia java agent. Then Jolokia exposes those metrics at some port which is easily accessible through a rest endpoint (we call it Jolokia URL). Then we have JMX2Graphte which polls the metrics from Jolokia URL and push it to Graphite. Then Grafana reads the Graphite metrics and creates a beautiful dashboard for us along with the alerts.
So this is the working of the proposed monitoring solution. Now let’s discuss the components of the monitoring solution.

There’s a bit of code/configuration in here as well, so check it out.

Comments closed

M	T	W	T	F	S	S
				1	2	3
4	5	6	7	8	9	10
11	12	13	14	15	16	17
18	19	20	21	22	23	24
25	26	27	28	29	30	31

Category: Architecture

Problems Distributed Systems Experience

Finding Dependency Clusters

Big Data Often Isn’t

Building the Right Architecture for the Job

Comparing Oracle RAC to SQL Server Availability Groups

Contrasting Oracle’s Architecture with SQL Server’s

Database-First or Kafka-First for Event Streaming

Flattening Dimensional Models

Securely Accessing External Resources From Databricks AWS

Monitoring Kafka Streams with JMX Metrics