Category: Architecture

Microservice Communication Patterns

Published 2019-02-26 by Kevin Feasel

John Hammink shares a few ways that you can have microservices communicate with one another and argues that Kafka is a great platform for microservice communication:

Simply put, microservices are a software development method where applications are structured as loosely coupled services. The services themselves are minimal atomic units which together, comprise the entire functionality of the entire app. Whereas in an SOA, a single component service may combine one or several functions, a microservice within an MSA does one thing — only one thing — and does it well.
Microservices can be thought of as minimal units of functionality, can be deployed independently, are reusable, and communicate with each other via various network protocols like HTTP (More on that in a moment).

Read the whole thing. I have a love-hate relationship with these but it’s a pattern worth understanding.

Comments closed

Power BI Architecture Diagram V4

Published 2019-02-18 by Kevin Feasel

Dustin Ryan has a new version of the Power BI Architecture Diagram:

First and most importantly, I updated the Power BI logo in the diagram to the latest version of the logo!
Secondly, I included Power BI Dataflows in the diagram tagged #6. Power BI Dataflows are used to ingest, transform, integrate, and enrich big data by defining data source connections, ETL logic, refresh schedules, and more. Data is stored as entities in the Common Data Model in Azure Data Lake Storage Gen2. Dataflow entities can be consumed as a data source in Power BI and by using Power BI Desktop. Read more about Dataflows here.

Click through for a full changelog and a link to download the architecture diagram and legend.

Comments closed

Kafka And The Differing Aims Of Data Professionals

Published 2019-02-15 by Kevin Feasel

Kai Waehner argues that there is an impedence mismatch between data engineers, data scientists, and ML production engineers:

Data scientists love Python, period. Therefore, the majority of machine learning/deep learning frameworks focus on Python APIs. Both the stablest and most cutting edge APIs, as well as the majority of examples and tutorials use Python APIs. In addition to Python support, there is typically support for other programming languages, including JavaScript for web integration and Java for platform integration—though oftentimes with fewer features and less maturity. No matter what other platforms are supported, chances are very high that your data scientists will build and train their analytic models with Python.
There is an impedance mismatch between model development using Python, its tool stack and a scalable, reliable data platform with low latency, high throughput, zero data loss and 24/7 availability requirements needed for data ingestion, preprocessing, model deployment and monitoring at scale. Python in practice is not the most well-known technology for these requirements. However, it is a great client for a data platform like Apache Kafka.

Click through for the full argument as well as where Kafka can help mitigate some of the issues.

Comments closed

Reporting Services Scale-Out With Docker

Published 2019-01-30 by Kevin Feasel

Paul Stanton architects out a scenario using Windocks to create cloned Reporting Services containers in order to scale out Reporting Services:

Database cloning is a key aspect of the SSRS scale out architecture, with database clones providing each container a complete set of databases. Two or more VMs operated behind a load balancer delivers a highly available and scalable reporting service. This article focuses on Windows SQL Server containers and Windows Virtual Hard Drive (VHD) based cloning, but the same architecture can support SQL Server Linux containers or conventional instances (Windows or Linux). Redgate SQL Clone, for example would support SQL Server instances. Other options include the use of storage arrays instead of Windows VHD based clones. The trade-offs between SQL containers and instances, and between VHDs and storage arrays are covered in separate sections below.
The combination of SSRS containers with database cloning is appealing for simplicity and operational savings. SSRS containers are also drawing interest as part of public cloud strategies, as SSRS containers can be integrated with AWS RDS or SQL Azure databases to provide a horizontally scalable reporting solution.

This is a bit more complex than Reporting Services scale-out with Enterprise Edition, but if you’re on Standard Edition and can’t use scale-out, it’s an interesting alternative.

Comments closed

Cloud Risk: Service Obsolescence

Published 2019-01-23 by Kevin Feasel

Joy George Kunjikkur takes us through a risk scenario using an example of the Azure chat bot service:

Beginning of last year, we started to develop a chat bot demo. The idea was to integrate the chat bot into one of the big applications as a replacement to FAQ. Users can ask questions to bot thus avoiding obvious support tickets in the future.

Things went well. We got appreciation on the demo and started moving to production. About half way, things started turning south. The demo chat bot application used Bot SDK V3. It had voice recognition enabled which allow users to talk to it and get the response back in voice. During the demo we used Azure Bing Speech API. But later before the production, we got the notice that the service is obsolete and will be retired mid 2019. Another surprise was the introduction of Bot SDK V4 which is entirely different that Bot SDK V3. Something like AngularJS v/s Angular.

The major services tend to give you some time to switch over—in this case, they had 10 months to make a move. But when dealing with online services versus locally installed products, there’s always a risk that the service you’re calling won’t be there, and depending upon how critical that service is, it can have a major effect on your ability to function if it disappears one day. That’s definitely not a reason to ignore these services; it’s a reason to have a backup plan in place.

Comments closed

Review: dbForge Studio For Database Modeling

Published 2019-01-17 by Kevin Feasel

Randolph West is looking for a product for database modeling and tries out dbForge Studio:

These days I still design new databases from scratch with pen and paper (or iPad and Apple Pencil), where the entity relationship diagram (ERD) is rudimentary and crows’ feet relationships are badly-scrawled. But it got me wondering which database modelling tools are on the market today (commercial and free).
My ideal tool should be able to design a new database from scratch and generate creation scripts in T-SQL without failing over common issues like referential integrity and dependencies. More importantly though, it should be able to reverse-engineer a database (like Microsoft Visio used to be able to). This is extremely useful for consulting engagements when I need to get a picture in my head of the database I’m looking at. This was the one place I’ve used the Database Designer in SSMS more than I had initially remembered.

Randolph also mentions SQL Database Modeler, which I used on a consulting engagement where I wanted to replicate Visio’s database reverse engineer functionality.

Comments closed

The Forgotten Infrastructure Below Azure BI Architecture Diagrams

Published 2019-01-14 by Kevin Feasel

Meagan Longoria reminds us that there are several products which Azure BI projects need but which we tend to forget when building architectural diagrams:

Let’s start with Azure Active Directory (AAD). In order to provision the resources in the diagram, your Azure subscription must already be associated with an Active Directory. AAD is Microsoft’s cloud-based identity and access management service. Members of an organization have a user account that can sign in to various services. AAD is used to access Office 365, Power BI, and Dynamics 365, as well as the Azure portal. It can also be used to grant access and permissions to specific Azure resources.

Meagan has several of these, so check it out.

Comments closed

Data Transformation Tools In The Azure Space

Published 2019-01-11 by Kevin Feasel

James Serra gives us an overview of the major tools you would use for ETL and ELT in Azure:

If you are building a big data solution in the cloud, you will likely be landing most of the source data into a data lake. And much of this data will need to be transformed (i.e. cleaned and joined together – the “T” in ETL). Since the data lake is just storage (i.e. Azure Data Lake Storage Gen2 or Azure Blob Storage), you need to pick a product that will be the compute and will do the transformation of the data. There is good news and bad news when it comes to which product to use. The good news is there are a lot of products to choose from. The bad news is there are a lot of products to choose from :-). I’ll try to help your decision-making by talking briefly about most of the Azure choices and the best use cases for each when it comes to transforming data (although some of these products also do the Extract and Load part

The only surprise is the non-mention of Azure Data Lake Analytics, and there is a good conversation in the comments section explaining why.

Comments closed

Design Tips For Scaling Systems

Published 2019-01-04 by Kevin Feasel

Erik Darling has a few ideas for how you can design that SQL Server instance and database for future growth:

I can’t begin to tell you how many terrible things you can avoid by starting your apps out using an optimistic isolation level. Read queries and write queries can magically exist together, at the expense of some tempdb.
Yes, that means you can’t leave transactions open for a very long time, but hey, you shouldn’t do that anyway.
Yes, that means you’ll suffer a bit more if you perform large modifications, but you should be batching them anyway.

Optimistic concurrency is huge—definitely worth the top slot in Erik’s list.

Comments closed

Using Databricks Delta In Lieu Of Lambda Architecture

Published 2018-12-10 by Kevin Feasel

Jose Mendes contrasts the Lambda architecture with the Databricks Delta architecture and gives us a quick example of using Databricks Delta:

The major problem of the Lambda architecture is that we have to build two separate pipelines, which can be very complex, and, ultimately, difficult to combine the processing of batch and real-time data, however, it is now possible to overcome such limitation if we have the possibility to change our approach.
Databricks Delta delivers a powerful transactional storage layer by harnessing the power of Apache Spark and Databricks File System (DBFS). It is a single data management tool that combines the scale of a data lake, the reliability and performance of a data warehouse, and the low latency of streaming in a single system. The core abstraction of Databricks Delta is an optimized Spark table that stores data as parquet files in DBFS and maintains a transaction log that tracks changes to the table.

It’s an interesting contrast and I recommend reading the whole thing.

Comments closed