Press "Enter" to skip to content

Category: Architecture

Tokenization in SQL Server

Sebastiao Pereira demonstrates a combination of encryption and redirection to store sensitive data:

As privacy regulations such as the General Data Protection Regulation (GDPR), the Health Insurance Portability and Accountability Act (HIPAA), and the Payment Card Industry Data Security Standard (PCI DSS) tighten, organizations have an increased focus on protecting sensitive information within databases. Tokenization is one option for adhering to those regulations. Let’s see how to implement tokenization in SQL Server.

This is a reasonably clever solution, though if you need to search on any of the tokenized (i.e., encrypted and moved to a separate table) values, performance would be miserable. Even displaying the results for a moderately sized result set would run into serious performance issues. I suppose that if you, for some regulatory reason, need to keep these tokens stored separately from the data, then you manage expectations as best you can.
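To make the moving parts concrete, here is a minimal Python sketch of the general pattern (not Sebastiao’s actual implementation): sensitive values are encrypted and stored in a separate vault table, and a random surrogate token takes their place in the main table. The vault schema and function names are invented for illustration, and the third-party cryptography package’s Fernet class stands in for whatever encryption mechanism the article uses.

```python
import secrets
import sqlite3

from cryptography.fernet import Fernet  # third-party: pip install cryptography

key = Fernet.generate_key()          # in production, load this from a key vault
cipher = Fernet(key)

conn = sqlite3.connect(":memory:")   # stand-in for the real token store
conn.execute("CREATE TABLE token_vault (token TEXT PRIMARY KEY, ciphertext BLOB)")

def tokenize(value: str) -> str:
    """Encrypt the value, park it in the vault, and hand back a surrogate token."""
    token = secrets.token_urlsafe(16)  # random, so it leaks nothing about the value
    conn.execute(
        "INSERT INTO token_vault (token, ciphertext) VALUES (?, ?)",
        (token, cipher.encrypt(value.encode())),
    )
    conn.commit()
    return token

def detokenize(token: str) -> str:
    """Look the token up in the vault and decrypt the original value."""
    (ciphertext,) = conn.execute(
        "SELECT ciphertext FROM token_vault WHERE token = ?", (token,)
    ).fetchone()
    return cipher.decrypt(ciphertext).decode()

t = tokenize("4111-1111-1111-1111")  # the main table stores t, not the card number
print(t, "->", detokenize(t))
```

Note that because the encryption here is non-deterministic, finding a vault row by its original value means decrypting candidates one at a time, which is exactly where the search performance pain mentioned above comes from.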

Leave a Comment

Ingesting IoT Data into SQL Server via Python

Hristo Hristov builds an app:

MQTT is a lightweight Industrial IoT communications protocol allowing efficient communication to and from edge devices such as machines, sensors, and actuators. How can we get data from an on-premises or cloud MQTT broker and persist it in a SQL Server database? How can we leverage the newest features in SQL Server 2025 to make efficient query compilations and build a scalable solution for a data pipeline for permanently storing IoT data?

Read on for the code, most of which is in Python.
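As a rough illustration of the shape of such a pipeline (not Hristo’s actual code), here is a sketch using paho-mqtt and pyodbc; the broker host, topic, table, and connection string are all placeholders, and the payload is assumed to be JSON.

```python
import json

import paho.mqtt.client as mqtt  # pip install paho-mqtt
import pyodbc                    # pip install pyodbc

# Placeholder connection details.
SQL_CONN_STR = ("DRIVER={ODBC Driver 18 for SQL Server};"
                "SERVER=localhost;DATABASE=iot;Trusted_Connection=yes")
BROKER_HOST = "localhost"
TOPIC = "factory/sensors/#"

db = pyodbc.connect(SQL_CONN_STR, autocommit=False)

def on_message(client, userdata, msg):
    """Persist each MQTT message as one row in the target table."""
    reading = json.loads(msg.payload)
    cur = db.cursor()
    cur.execute(
        "INSERT INTO dbo.SensorReadings (Topic, DeviceId, Value) VALUES (?, ?, ?)",
        msg.topic, reading["device_id"], reading["value"],
    )
    db.commit()

client = mqtt.Client()  # paho-mqtt 2.x additionally requires a CallbackAPIVersion argument
client.on_message = on_message
client.connect(BROKER_HOST, 1883)
client.subscribe(TOPIC)
client.loop_forever()   # block and process messages as they arrive
```

A production version would batch inserts rather than committing per message, but the overall shape holds: a callback consumes from the broker and a database driver writes the rows.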

Leave a Comment

Building Storage Tiers with Pure Storage in PowerShell

Anthony Nocentino creates a medallion storage layout:

In modern IT environments, not all workloads require the same level of storage performance, protection, or cost. Some applications need high performance with aggressive data protection, while others are perfectly fine with lower performance in exchange for cost savings. This tiered approach to storage service delivery is fundamental to efficient infrastructure management.

In my previous post on Fusion, I took an application-centric approach, showing how to deploy SQL Servers using Fusion. Let’s switch gears now and learn how to define a storage service catalog. In this post, I’ll demonstrate how to build a complete storage service catalog using Pure Storage Fusion Presets, offering Bronze, Silver, and Gold tiers with optional replication. We’ll see how to leverage different array types (FlashArray //X and FlashArray //C) to optimize both performance and cost across your fleet.

Read on for a link to the code, as well as more information on how it works.
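Anthony’s post covers the actual Fusion preset definitions; purely to make the service-catalog idea concrete, here is a hypothetical Python sketch of what a Bronze/Silver/Gold catalog entry might carry. All attribute names and numbers are invented for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class StorageTier:
    """One entry in a storage service catalog (attributes are illustrative)."""
    name: str
    array_family: str       # which hardware class backs the tier
    iops_limit: int         # performance ceiling offered to the workload
    snapshots_per_day: int  # local protection policy
    replicated: bool        # whether an async replica is included

CATALOG = {
    "bronze": StorageTier("bronze", "FlashArray //C", 10_000, 4, False),
    "silver": StorageTier("silver", "FlashArray //X", 50_000, 12, False),
    "gold":   StorageTier("gold",   "FlashArray //X", 100_000, 24, True),
}

def tier_for(workload_priority: str) -> StorageTier:
    """Map a workload's priority onto a catalog entry."""
    return CATALOG["gold" if workload_priority == "critical" else "silver"]
```

The point of the catalog is that applications pick a named tier rather than negotiating individual storage settings, which is what makes the fleet manageable.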

Comments closed

A Primer on ACID Compliance

Erik Darling takes an academic concept and explains what it means in practice for SQL Server. Erik does a good job describing the concepts of atomicity, consistency, isolation, and durability. I do agree with Erik’s take on consistency, which tends to be the property that database platforms minimize in return for scalability. The descriptions of all four are good, though Erik has a lot more content that digs into consistency and isolation.
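For readers who want one of the four properties made concrete, here is atomicity in miniature, using Python’s built-in sqlite3 module (any transactional database behaves the same way): either every statement in the transaction takes effect, or none does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 0)")
conn.commit()

try:
    with conn:  # opens a transaction; commits on success, rolls back on error
        conn.execute("UPDATE accounts SET balance = balance - 50 WHERE name = 'alice'")
        raise RuntimeError("crash between the debit and the credit")
        conn.execute("UPDATE accounts SET balance = balance + 50 WHERE name = 'bob'")
except RuntimeError:
    pass

# Atomicity: the half-finished transfer was rolled back, so nothing changed.
print(conn.execute("SELECT name, balance FROM accounts ORDER BY name").fetchall())
# [('alice', 100), ('bob', 0)]
```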

Comments closed

Azure API Management in front of Databricks and OpenAI

Drew Furgiuele has a follow-up:

A few months ago, I wrote a blog post about using Azure API Management with Databricks Model Serving endpoints. It struck a chord with a lot of people using Databricks on Azure specifically, because more and more people and organizations are trying their damnedest to wrangle all the APIs they use and/or deploy themselves. Recently, I got an email from someone who read it and asked a really good question:

Click through for that question, as well as Drew’s answer.

Comments closed

Retry Resiliency in Apache Kafka Pipelines

Ravi Teja Thutari explains the value of idempotence in moving data between systems:

In modern flight booking systems, streaming fare updates and reservations through distributed microservices is common. These pipelines must be retry-resilient, ensuring that transient failures or replays don’t cause duplicate bookings or stale pricing. A core strategy is idempotency: each event (e.g., a fare-update or booking command) carries a unique identifier so processing it more than once has no adverse effect. 

Read on to learn more. For reference, idempotence is a property of an operation where you can run the operation as many times as you wish and always end up with the same result. In the data operations world, this ties to the final state of the database. If I run a process once and it adds three rows to the database, I should be able to run the process a second time and end up with those exact three rows: no more, no fewer, and none different.
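Here is a minimal sketch of that idea in Python, using a unique event identifier as the deduplication key (the table and event fields are invented for illustration): replaying the same event is harmless because the key rejects the duplicate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The event_id primary key is what makes reprocessing safe.
conn.execute("CREATE TABLE bookings (event_id TEXT PRIMARY KEY, fare REAL)")

def process(event: dict) -> None:
    """Apply a booking event; a replayed event is silently ignored."""
    conn.execute(
        "INSERT OR IGNORE INTO bookings (event_id, fare) VALUES (?, ?)",
        (event["event_id"], event["fare"]),
    )
    conn.commit()

event = {"event_id": "booking-42", "fare": 199.99}
for _ in range(3):  # a retry or replay delivers the same event three times
    process(event)

# Still exactly one row, matching the "same three rows" intuition above.
print(conn.execute("SELECT COUNT(*) FROM bookings").fetchone()[0])  # 1
```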

Comments closed

Tips for Highly Available PostgreSQL Systems

Semab Tariq provides some high-level guidance:

In today’s digital landscape, downtime isn’t just inconvenient; it’s costly. No matter what business you are running, whether an e-commerce site, a SaaS platform, or critical internal systems, your PostgreSQL database must be resilient, recoverable, and continuously available. So, in short:

High Availability (HA) is not a feature you enable; it’s a system you design.

In this blog, we will walk through the important things to consider when setting up a reliable, production-ready HA PostgreSQL system for your applications.

Click through for a variety of things to think about. Most of this will apply to other database systems as well, though specific tools will differ.

Comments closed

High Availability Architecture for PostgreSQL

Umair Shahid adds a 9:

Most teams building production applications understand that “uptime” matters. I am writing this blog to demonstrate how much difference an extra 0.09% makes.

At 99.9% availability, your system can be down for over 43 minutes every month. At 99.99%, that window drops to just over 4 minutes. If your product is critical to business operations, customer workflows, or revenue generation, those 39 extra minutes of downtime each month can be the difference between trust and churn.
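The arithmetic behind those numbers is straightforward to verify; assuming a 30-day month:

```python
def downtime_minutes_per_month(availability: float, days: int = 30) -> float:
    """Allowed downtime per month at a given availability level."""
    return (1 - availability) * days * 24 * 60

print(downtime_minutes_per_month(0.999))   # 43.2 minutes at "three nines"
print(downtime_minutes_per_month(0.9999))  # 4.32 minutes at "four nines"
```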

Click through for some of the tools and practices that can help get you there in PostgreSQL.

Comments closed

The Small Data Showdown in Microsoft Fabric

Miles Cole does a bit of testing:

First, let’s revisit the purpose of the benchmark: The objective is to explore data engineering engines available in Fabric to understand whether Spark with vectorized execution (the Native Execution Engine) should be considered in small data architectures.

Beyond refreshing the benchmark to see if any core findings have changed, I do want to expand in a few areas where I got great feedback from the community:

I really appreciate the approach behind this, both in sticking to more realistic data sizes for many operations and in re-running the test given all of the recent improvements in each engine.

Comments closed

Building Entity-Relationship Diagrams with DBeaver

Dave Stokes builds a diagram:

Even the most experienced database professionals are known to feel a little anxious when peering into an unfamiliar database. Hopefully, they inspect it to see how the data is normalized and how the various tables are combined to answer complex queries. Entity Relationship Maps (ERM) provide a visual overview of how tables are related and can document the structure of the data.

Read on to see how you can do this with the DBeaver database access client.

Comments closed