
Category: Architecture

Cloud Risk: Service Obsolescence

Joy George Kunjikkur takes us through a risk scenario using an example of the Azure chat bot service:

At the beginning of last year, we started to develop a chat bot demo. The idea was to integrate the chat bot into one of the big applications as a replacement for the FAQ. Users could ask the bot questions, thus avoiding obvious support tickets in the future.

Things went well. The demo was well received and we started moving to production. About halfway there, things started turning south. The demo chat bot application used Bot SDK V3. It had voice recognition enabled, which allowed users to talk to it and get the response back in voice; during the demo, we used the Azure Bing Speech API. But later, before production, we got notice that the service was obsolete and would be retired in mid-2019. Another surprise was the introduction of Bot SDK V4, which is entirely different from Bot SDK V3, something like AngularJS versus Angular.

The major services tend to give you some time to switch over—in this case, they had 10 months to make a move. But when dealing with online services versus locally installed products, there’s always a risk that the service you’re calling won’t be there, and depending upon how critical that service is, it can have a major effect on your ability to function if it disappears one day. That’s definitely not a reason to ignore these services; it’s a reason to have a backup plan in place.
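
One form that backup plan can take is in the code itself. Here is a minimal Python sketch (every name in it is hypothetical, not from Joy's post) of putting speech providers behind a single interface so a retired service can be swapped out without touching the rest of the application:

```python
from typing import Callable

# A provider is just a callable from audio bytes to transcribed text; the
# application codes against this signature rather than any vendor's SDK.
SpeechToText = Callable[[bytes], str]

def transcribe_with_fallback(audio: bytes, providers: list[SpeechToText]) -> str:
    """Try each provider in order, falling through when one fails."""
    last_error: Exception | None = None
    for provider in providers:
        try:
            return provider(audio)
        except Exception as err:  # retired, throttled, or simply down
            last_error = err
    raise RuntimeError("All speech providers failed") from last_error

# Hypothetical providers: a primary cloud API and a local fallback.
def primary_cloud_api(audio: bytes) -> str:
    raise ConnectionError("service retired")

def local_fallback(audio: bytes) -> str:
    return "(transcribed locally)"

print(transcribe_with_fallback(b"...", [primary_cloud_api, local_fallback]))
```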


Review: dbForge Studio For Database Modeling

Randolph West is looking for a product for database modeling and tries out dbForge Studio:

These days I still design new databases from scratch with pen and paper (or iPad and Apple Pencil), where the entity relationship diagram (ERD) is rudimentary and the crow's feet relationships are badly scrawled. But it got me wondering which database modelling tools are on the market today (commercial and free).

My ideal tool should be able to design a new database from scratch and generate creation scripts in T-SQL without falling over common issues like referential integrity and dependencies. More importantly, though, it should be able to reverse-engineer a database (like Microsoft Visio used to be able to). This is extremely useful for consulting engagements when I need to get a picture in my head of the database I'm looking at. It is the one place where I've used the Database Designer in SSMS more than I had initially remembered.

Randolph also mentions SQL Database Modeler, which I used on a consulting engagement where I wanted to replicate Visio's database reverse-engineering functionality.
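
If all you need from that reverse-engineering step is the list of relationships, you can get surprisingly far by querying the catalog directly. A rough Python sketch (the connection string and database are hypothetical) that pulls parent/child table pairs from the standard INFORMATION_SCHEMA views:

```python
import pyodbc  # assumes a SQL Server ODBC driver is installed

# Hypothetical connection details; adjust server and database names.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=localhost;DATABASE=AdventureWorks;Trusted_Connection=yes;",
    autocommit=True,
)

# Each foreign key names its own (child) constraint and the unique/primary
# key constraint it references; joining twice recovers the table pairs that
# an ERD tool would draw as relationships.
query = """
SELECT fk.TABLE_NAME AS child_table,
       pk.TABLE_NAME AS parent_table,
       rc.CONSTRAINT_NAME
FROM INFORMATION_SCHEMA.REFERENTIAL_CONSTRAINTS rc
JOIN INFORMATION_SCHEMA.TABLE_CONSTRAINTS fk
  ON rc.CONSTRAINT_NAME = fk.CONSTRAINT_NAME
JOIN INFORMATION_SCHEMA.TABLE_CONSTRAINTS pk
  ON rc.UNIQUE_CONSTRAINT_NAME = pk.CONSTRAINT_NAME
"""
for child, parent, name in conn.cursor().execute(query):
    print(f"{child} -> {parent}  ({name})")
```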


The Forgotten Infrastructure Below Azure BI Architecture Diagrams

Meagan Longoria reminds us that there are several products which Azure BI projects need but which we tend to forget when building architectural diagrams:

Let’s start with Azure Active Directory (AAD). In order to provision the resources in the diagram, your Azure subscription must already be associated with an Active Directory. AAD is Microsoft’s cloud-based identity and access management service. Members of an organization have a user account that can sign in to various services. AAD is used to access Office 365, Power BI, and Dynamics 365, as well as the Azure portal. It can also be used to grant access and permissions to specific Azure resources.

Meagan has several of these, so check it out.
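
For a concrete taste of AAD from the code side, here is a minimal sketch using Microsoft's azure-identity Python library, which walks a chain of credential sources and hands back a token usable against Azure resource APIs:

```python
from azure.identity import DefaultAzureCredential  # pip install azure-identity

# DefaultAzureCredential tries a chain of AAD credential sources in turn:
# environment variables, managed identity, developer tool logins, etc.
credential = DefaultAzureCredential()

# Request an access token scoped to the Azure Resource Manager API; the
# same credential object can be passed straight into azure-mgmt-* clients.
token = credential.get_token("https://management.azure.com/.default")
print("token acquired, expires at", token.expires_on)
```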


Data Transformation Tools In The Azure Space

James Serra gives us an overview of the major tools you would use for ETL and ELT in Azure:

If you are building a big data solution in the cloud, you will likely be landing most of the source data into a data lake. And much of this data will need to be transformed (i.e., cleaned and joined together – the "T" in ETL). Since the data lake is just storage (i.e., Azure Data Lake Storage Gen2 or Azure Blob Storage), you need to pick a product that will be the compute and will do the transformation of the data. There is good news and bad news when it comes to which product to use. The good news is there are a lot of products to choose from. The bad news is there are a lot of products to choose from :-). I'll try to help your decision-making by talking briefly about most of the Azure choices and the best use cases for each when it comes to transforming data (although some of these products also do the Extract and Load part).

The only surprise is the non-mention of Azure Data Lake Analytics, and there is a good conversation in the comments section explaining why.
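
Whichever compute product you pick, the shape of the "T" is similar. Here is a minimal PySpark sketch (the lake paths and column names are hypothetical) of cleaning and joining two datasets sitting in an ADLS Gen2 data lake:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("lake-transform").getOrCreate()

# Hypothetical ADLS Gen2 paths; the lake is just storage, so any of the
# compute options could run a transformation shaped like this one.
raw = spark.read.parquet("abfss://raw@mylake.dfs.core.windows.net/orders/")
customers = spark.read.parquet("abfss://raw@mylake.dfs.core.windows.net/customers/")

cleaned = (
    raw.dropDuplicates(["order_id"])                    # clean: deduplicate
       .filter(F.col("order_total") > 0)                # clean: drop bad rows
       .join(customers, on="customer_id", how="inner")  # join: the "T" in ETL
)

cleaned.write.mode("overwrite").parquet(
    "abfss://curated@mylake.dfs.core.windows.net/orders_enriched/"
)
```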


Design Tips For Scaling Systems

Erik Darling has a few ideas for how you can design that SQL Server instance and database for future growth:

I can’t begin to tell you how many terrible things you can avoid by starting your apps out using an optimistic isolation level. Read queries and write queries can magically exist together, at the expense of some tempdb.
Yes, that means you can’t leave transactions open for a very long time, but hey, you shouldn’t do that anyway.
Yes, that means you’ll suffer a bit more if you perform large modifications, but you should be batching them anyway.

Optimistic concurrency is huge—definitely worth the top slot in Erik’s list.
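
For a new application, turning it on is a single statement per database. A minimal sketch, run here through pyodbc with hypothetical server and database names:

```python
import pyodbc

# ALTER DATABASE cannot run inside a transaction, so connect with autocommit.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=localhost;DATABASE=master;Trusted_Connection=yes;",
    autocommit=True,
)

# Read Committed Snapshot Isolation: readers see row versions (kept in
# tempdb) instead of blocking on writers' locks. WITH ROLLBACK IMMEDIATE
# kicks out open transactions so the setting can take effect.
conn.cursor().execute(
    "ALTER DATABASE MyApp SET READ_COMMITTED_SNAPSHOT ON "
    "WITH ROLLBACK IMMEDIATE;"
)
```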


Using Databricks Delta In Lieu Of Lambda Architecture

Jose Mendes contrasts the Lambda architecture with the Databricks Delta architecture and gives us a quick example of using Databricks Delta:

The major problem with the Lambda architecture is that we have to build two separate pipelines, which can be very complex and, ultimately, make it difficult to combine the processing of batch and real-time data. However, it is now possible to overcome this limitation if we are able to change our approach.
Databricks Delta delivers a powerful transactional storage layer by harnessing the power of Apache Spark and Databricks File System (DBFS). It is a single data management tool that combines the scale of a data lake, the reliability and performance of a data warehouse, and the low latency of streaming in a single system. The core abstraction of Databricks Delta is an optimized Spark table that stores data as parquet files in DBFS and maintains a transaction log that tracks changes to the table.

It’s an interesting contrast and I recommend reading the whole thing.
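
To make the contrast concrete, here is a minimal PySpark sketch (intended for a Databricks notebook; the table path is hypothetical) showing the property that collapses the two Lambda pipelines: batch and streaming reads target the same Delta table:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("delta-demo").getOrCreate()

# Writing with format("delta") stores Parquet files plus a transaction log
# in DBFS that tracks every change to the table.
events = spark.range(0, 1000).withColumnRenamed("id", "event_id")
events.write.format("delta").mode("overwrite").save("/delta/events")

# A batch read and a streaming read of the very same table: one storage
# layer serving both halves of what Lambda split into two pipelines.
batch_df = spark.read.format("delta").load("/delta/events")
stream_df = spark.readStream.format("delta").load("/delta/events")
```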


An Overview Of Apache Kafka

Leona Zhang has a series going on Apache Kafka.  Part one covers some of the concepts around messaging systems:

There is a difference between batch processing applications and stream processing applications. The most significant difference is whether the data has a visible boundary: if it does, the processing is called batch processing. For example, a client collects data once every hour, sends it to the server for statistics, and then saves the statistical results in the statistics database.
If the boundary doesn't exist, the processing is called stream processing. Here is an example: logs and orders are generated continuously on a large website, just like a data flow. If each log or order is processed within several hundred milliseconds to several seconds of its generation, the application is a streaming application. If logs and orders are instead collected once every hour and then transmitted in one batch, the original streaming data has been converted into batch data.
Occasionally, stream processing becomes imperative. For example, suppose Jack Ma wanted to display the orders and sales on Tmall for November 11 on a large screen. If the data center worked in T+1 mode and could only obtain the data for November 11 on November 12, Jack Ma would not be happy.

Part two is an overview of the architectural components used in Kafka:

Kafka uses the group concept to integrate the producer/consumer and publisher/subscriber models.
One topic may have multiple groups, and one group may include multiple consumers. Within a group, only one consumer can consume a given message; across groups, consumers follow the publisher/subscriber model, so every group receives each message.
Note: a partition is allocated to only one consumer within a group. If there are three partitions and four consumers in one group, one consumer is redundant and cannot receive any data.

This looks to be the start to a good series.
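
The group behavior is easy to see with the kafka-python client. A minimal sketch, with hypothetical topic, broker, and group names: run two copies with the same group_id and they split the topic's partitions between them; change the group_id and the new consumer receives the full stream again:

```python
from kafka import KafkaConsumer  # pip install kafka-python

# Consumers sharing a group_id divide partitions among themselves (each
# message goes to one of them); a different group_id gets its own full copy.
consumer = KafkaConsumer(
    "orders",
    bootstrap_servers="localhost:9092",
    group_id="billing",          # change to "analytics" for a second group
    auto_offset_reset="earliest",
)

for message in consumer:
    print(message.partition, message.offset, message.value)
```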


NoSQL? No! MoSQL

Steve Jones points out a bit of a shift at Google:

Google is doing more SQL, or at least shifting towards relational SQL databases as a way of storing data. At least, some of their engineers see this as a better way to store data for many problems. Since I’m a relational database advocate, I found this to be interesting.
When Google first started to publish information on BigTable and other new ways of dealing with large amounts of data, I felt that these weren’t solutions I’d use or problems that many people had. The idea of MapReduce is interesting and certainly applicable to the problem space Google had of a global database of sites, but that’s not a problem I’ve ever encountered. Instead, most of the struggles I’ve had with relational systems are still better addressed in a relational system.

Read the whole thing. Note that this is slightly different from Feasel’s Law, as Steve is focusing more on the consistency side of things than on the interface.


Dealing With System Sprawl

Charity Majors has a simple (but not easy) solution to system sprawl:

Stop me if you’ve heard this one before.

The company is growing like crazy, your engineering team keeps rising to the challenge, and you are ferociously proud of them.  But some cracks are beginning to show, and frankly you’re a little worried.  You have always advocated for engineers to have broad latitude in technical decisions, including choosing languages and tools.  This autonomy and culture of ownership is part of how you have successfully hired and retained top talent despite the siren song of the Faceboogles.

But recently you saw something terrifying that you cannot unsee: your company is using all the languages, all the environments, all the databases, all the build tools.  Shit!!!  Your ops team is in full revolt and you can’t really blame them.  It’s grown into an unsupportable nightmare and something MUST be done, but you don’t know what or how — let alone how to solve it while retaining the autonomy and personal agency that you all value so highly.

I hear a version of this everywhere I’ve gone for the past year or two.  It’s crazy how often.  I’ve been meaning to write my answer up for ages, and here it (finally) is.

I like the solution:  embrace the sprawl but make the default a stable set of well-supported systems with reasons for people to want to start there.  Read the whole thing.


Monitoring At Stack Overflow

Nick Craver has been driven around the bend by monitoring, and we get to enjoy the fruits of it:

…but evidently some people think of other things. Those people are obviously wrong, but let’s continue. When I’m not a walking zombie after reading a 10,000 word blog post some idiot wrote, I see monitoring as the process of keeping an eye on your stuff, like a security guard sitting at a desk full of cameras somewhere. Sometimes they fall asleep–that’s monitoring going down. Sometimes they’re distracted with a doughnut delivery–that’s an upgrade outage. Sometimes the camera is on a loop–I don’t know where I was going with that one, but someone’s probably robbing you. And then you have the fire alarm. You don’t need a human to trigger that. The same applies when a door gets opened, maybe that’s wired to a siren. Or maybe it’s not. Or maybe the siren broke in 1984.

I know what you’re thinking: Nick, what the hell? My point is only that monitoring any application isn’t that much different from monitoring anything else. Some things you can automate. Some things you can’t. Some things have thresholds for which alarms are valid. Sometimes you’ll get those thresholds wrong (especially on holidays). And sometimes, when setting up further automation isn’t quite worth it, you just make using human eyes easier.

This is a really good post covering monitoring techniques at a high level and getting into specific implementations at Stack Overflow.
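
As a toy illustration of the thresholds point (a sketch of the general idea, not anything from Stack Overflow's actual stack), alerting off recent history rather than a single hard-coded number is one way to reduce the holiday problem Nick mentions:

```python
import statistics

def should_alert(history: list[float], current: float, sigmas: float = 3.0) -> bool:
    """Alert when the current reading is an outlier versus recent history.

    A static threshold suits fire-alarm cases; for noisier metrics,
    comparing against a rolling baseline avoids hard-coding a number
    that will be wrong on holidays.
    """
    mean = statistics.fmean(history)
    stdev = statistics.pstdev(history)
    return stdev > 0 and abs(current - mean) > sigmas * stdev

# Example: a sudden spike against a calm baseline trips the alert.
baseline = [100, 102, 98, 101, 99, 100, 103, 97]
print(should_alert(baseline, 180))  # True
print(should_alert(baseline, 101))  # False
```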
