Erik Darling takes an academic concept and explains what it means in practice for SQL Server. Erik does a good job describing the concepts of atomicity, consistency, isolation, and durability. I do agree with Erik’s take on consistency, which tends to be the property that database platforms minimize in return for scalability. The descriptions of all four are good, though Erik has a lot more content that digs into consistency and isolation.
Drew Furgiuele has a follow-up:
A few months ago, I wrote a blog post about using Azure API Management with Databricks Model Serving endpoints. It struck a chord with a lot of people using Databricks on Azure specifically, because more and more people and organizations are trying their damndest to wrangle all the APIs they use and/or deploy themselves. Recently, I got an email from someone who read it and asked a really good question:
Click through for that question, as well as Drew’s answer.
Ravi Teja Thutari explains the value of idempotence in moving data between systems:
In modern flight booking systems, streaming fare updates and reservations through distributed microservices is common. These pipelines must be retry-resilient, ensuring that transient failures or replays don’t cause duplicate bookings or stale pricing. A core strategy is idempotency: each event (e.g., a fare-update or booking command) carries a unique identifier so processing it more than once has no adverse effect.
Read on to learn more. For reference, idempotence is the property of an operation that you can run as many times as you wish and always end up with the same result. In the data operations world, this ties to the final state in a database: if I run a process once and it adds three rows to the database, I should be able to run the process a second time and still end up with those exact three rows, no more, no fewer, and no different.
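The core pattern can be sketched in a few lines of Python. The event shape, ID field, and in-memory store below are illustrative assumptions; a real pipeline would persist the processed-ID set durably.

```python
# Sketch of idempotent event handling: each event carries a unique ID,
# and IDs already processed are recorded so that retries and replayed
# deliveries have no additional effect.

processed_ids = set()   # assumption: stand-in for a durable ID store
bookings = []

def handle_booking_event(event: dict) -> None:
    """Apply a booking event exactly once, however often it arrives."""
    event_id = event["event_id"]
    if event_id in processed_ids:
        return  # duplicate delivery: safe no-op
    processed_ids.add(event_id)
    bookings.append(event["booking"])

# Delivering the same event three times yields exactly one booking.
event = {"event_id": "evt-001", "booking": "fare-update #123"}
for _ in range(3):
    handle_booking_event(event)
```

Running the handler three times on the same event leaves a single booking in place, which is precisely the retry-resilience the excerpt describes.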
Semab Tariq provides some high-level guidance:
In today’s digital landscape, downtime isn’t just inconvenient, it’s costly. No matter what business you are running, an e-commerce site, a SaaS platform, or critical internal systems, your PostgreSQL database must be resilient, recoverable, and continuously available. So in short
High Availability (HA) is not a feature you enable; it’s a system you design.
In this blog, we will walk through the important things to consider when setting up a reliable, production-ready HA PostgreSQL system for your applications.
Click through for a variety of things to think about. Most of this will apply to other database systems as well, though specific tools will differ.
Most teams building production applications understand that “uptime” matters. I am writing this blog to demonstrate how much difference an extra 0.09% makes.
At 99.9% availability, your system can be down for over 43 minutes every month. At 99.99%, that window drops to just over 4 minutes. If your product is critical to business operations, customer workflows, or revenue generation, those 39 extra minutes of downtime each month can be the difference between trust and churn.
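The downtime arithmetic above is easy to verify; this minimal sketch assumes a 30-day month:

```python
# Monthly downtime permitted by an availability target,
# assuming a 30-day (43,200-minute) month.
MINUTES_PER_MONTH = 30 * 24 * 60  # 43,200

def downtime_minutes(availability: float) -> float:
    """Minutes of allowed downtime per month at a given availability."""
    return MINUTES_PER_MONTH * (1 - availability)

three_nines = downtime_minutes(0.999)    # ~43.2 minutes/month
four_nines = downtime_minutes(0.9999)    # ~4.32 minutes/month
extra = three_nines - four_nines         # ~38.9 "extra" minutes
```

That difference of roughly 39 minutes per month is what separates the two tiers quoted above.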
Click through for some of the tools and practices that can help get you there in PostgreSQL.
Miles Cole does a bit of testing:
First, let’s revisit the purpose of the benchmark: The objective is to explore data engineering engines available in Fabric to understand whether Spark with vectorized execution (the Native Execution Engine) should be considered in small data architectures.
Beyond refreshing the benchmark to see if any core findings have changed, I do want to expand in a few areas where I got great feedback from the community:
I really appreciate the approach behind this, both in terms of sticking to more realistic data sizes for many operations as well as performing this test given all of the recent improvements in each engine.
Even the most experienced database professionals are known to feel a little anxious when peering into an unfamiliar database. Hopefully, they inspect to see how the data is normalized and how the various tables are combined to answer complex queries. Entity Relationship Maps (ERM) provide a visual overview of how tables are related and can document the structure of the data.
Read on to see how you can do this with the DBeaver database access client.
Anant Kumar designs a data lake:
As companies collect massive amounts of data to fuel their artificial intelligence and machine learning initiatives, finding the right data architecture for storing, managing, and accessing such data is crucial. Traditional data storage practices are likely to fall short of meeting the scale, variety, and velocity required by modern AI/ML workflows. Apache Iceberg steps in as a strong open-source table format to build solid and efficient data lakes for AI and ML.
Click through for a primer on Iceberg, how to set up a fairly simple data lake, and some functionality that can help in model training.
Nikola Ilic performs a comparison:
Before we proceed, an important disclaimer: the guidance I’m providing here is based on both my experience with implementing Microsoft Fabric in real-world scenarios, and the recommended practices provided by Microsoft.
Please keep in mind that the guidance relies on general recommended practices (I intentionally avoid using the phrase best practices, because the best is very hard to determine and agree on). The word general means that the practice I recommend should be used in most of the respective scenarios, but there will always be edge cases when the recommended practice is simply not the best solution. Therefore, you should always evaluate whether the general recommended practice makes sense in your specific use case.
Click through for a comparison between three engines: the lakehouse, the warehouse, and the eventhouse. It would really simplify things if the lakehouse and warehouse combined into one coherent whole.
Jon Vöge does a bit of organization:
A topic which seems more relevant than ever is the question of how to organize the contents of your Microsoft Fabric Platform.
Through the contents of a few blogs, I will give you an overview of things to consider, as well as suggestions that you can choose from when designing your platform.
This first week, we’ll take a look at Domains in Microsoft Fabric.
Read on to understand why domains can be valuable and a solid way to structure them.