Cloud – Page 83 – Curated SQL

System-Versioned Ledger Tables

Published 2021-10-28 by Kevin Feasel

Randolph West has a series on ledger tables in SQL Server. First up is a primer on the topic:

System-versioned ledger tables leverage the same technology: there is a table with current data in it, and an underlying history table which keeps track of changes. However, it uses a cryptographic chain that provides digital forensic evidence of tampering. Yes, if you’ll pardon the use of this phrase, I’m talking about a blockchain.
This is not a cryptocurrency. No one is using expensive graphics cards to produce a fiat currency in someone’s basement. Instead, each transaction affecting the database in question is cryptographically hashed using a SHA-256 algorithm and then stored somewhere off-site.

Part two separates out the two types of ledger table:

This week we will look at the different types of ledger table: append-only and updatable.
Unlike temporal tables, a ledger table can be append-only which makes it immutable. You can only insert data and therefore it does not need a history table. In fact, you may be using append-only tables in your data warehouse already. While this is secure, it may not be practical.

Part three covers limitations:

Every choice we make is a trade-off. New features have limitations, and ledger tables are no exception.
Some of these limitations are perfectly sensible. For example, the whole point of ledger tables is to ensure that we can provide tamper evidence. This necessarily means you can’t turn it off once it’s enabled, unless you drop the database entirely — this is just one scenario where a full defence-in-depth strategy is required.

Part four is the one I’ve been waiting for—an explanation why you probably don’t need this:

After writing several posts about a neat feature in Azure SQL called system-versioned ledger tables, it reminded me about something I’ve wanted to say for a number of years now, outside of snarky tweets.
Here goes:
You don’t need a blockchain.
In the vast majority of use cases, you need a properly audited relational database system with ACID compliance and a good recovery strategy.

There are very specific use cases in which data hashes and ledger tables make sense.

Comments closed

Azure SQL Managed Instance and tempdb Quirks

Published 2021-10-28 by Kevin Feasel

Erik Darling airs some grievances well ahead of Festivus:

This post is admittedly short, but since we’re short of First Party Solutions™ to gripe to Microsoft with… I need to save this stuff somewhere.

Click through for Erik’s notes and commentary.

Comments closed

Serverless SQL Pool CI/CD

Published 2021-10-27 by Kevin Feasel

Kevin Chant doesn’t have time for manual deployments:

I want to cover one way you can do CI/CD for Azure Synapse Analytics serverless SQL pools using Azure DevOps in this post. Because I know it is a popular topic.
It’s related to my post about how you can create a dacpac for an Azure Synapse Analytics dedicated SQL pool using Azure DevOps. Since they are both based in the same service.
Plus, a while ago I wrote about the increase in demand for Data Platform automation. So, I really wanted to do a post about how you can do CI/CD for Azure Synapse Analytics serverless SQL pools.

Read on to learn how.

Comments closed

Storage Testing for Azure SQL Managed Instances

Published 2021-10-22 by Kevin Feasel

Joe Obbish busts out the slide rule:

Lately I’ve been doing some exploratory performance testing on Azure SQL Managed Instances in preparation for a migration to that platform. This blog post documents some storage testing results and may even have practical advice near the end. All testing was done on a gen5 general purpose instance with 8 vCores.

Read on for Joe’s findings. Spoiler alert: there is practical advice at the end.

Comments closed

Real-Time Change Detection via Cumulative Sums

Published 2021-10-22 by Kevin Feasel

Nithin Sankar tracks deviations with cumulative sums:

With the advent of Internet of Things (IOT) and the proliferation of connected devices, comes the challenge of monitoring parts for maintenance before they break down. A common approach revolves around getting data from connected devices and performing a statistical test to determine the likelihood of the device failing. While this common approach is robust, it typically involves a significant time investment in exploratory data analysis, feature engineering, training, and testing to build a predictive model. It, therefore, often lacks the agility required to keep up with the monitoring demands of increasingly time-sensitive initiatives.
In this context, the question becomes: how can we ensure a similar degree of rigor, but also improve the timeliness and responsiveness of being able to perform predictive maintenance?

Click through for the process, as well as an example using Azure Stream Analytics and Power BI.

Comments closed

Automatic Backups on a Data Lake or Lakehouse

Published 2021-10-22 by Kevin Feasel

Dave Ruijter backs that thing up:

Out of the box, Azure Data Lake Storage Gen2 provides redundant storage. Therefore, the data in your Data Lake(house) is resilient to transient hardware failures within a datacenter through automated replicas. This ensures durability and high availability. In this blog post, I provide a backup strategy on how to further protect your data from accidental deletions, data corruption, or any other data failures. This strategy works for Data Lake as well as Data Lakehouse implementations. It uses native Azure services, no additional tools, software, or licenses are required.

Read on for a detailed strategy.

Comments closed

Optimizing Blob Storage Query Performance

Published 2021-10-22 by Kevin Feasel

Dennes Torres compares several strategies for querying data stored in Azure Blob Storage:

In the third part of the series Querying Blob Storage with SQL, I will focus on the performance behaviour of queries: What makes them faster, slower, and some syntax beyond the basics.
The performance tests in this article are repeated, and the best time of the queries is recorded. This doesn’t mean you will always achieve the same timing. Many architectural details will affect the timing, such as cache, first execution, and so on. The timing exposed on each query is only a reference pointing to the differences of the query methods that can affect the time and the usual result for better or worse performance.

Click through to see which patterns perform well and which don’t.

Comments closed

Scaling an Azure SQL Managed Instance

Published 2021-10-21 by Kevin Feasel

Arun Sirpal wants more power:

No doubt there will be times where you need to scale up the actual instance in terms of vCores but also you may want to move across tiers (for example General Purpose to Business Critical). If you remember a few blog posts ago I said it was really important to plan for these activities during the build phase, more specifically get the subnet range right. If you done that then you will be fine.

Click through for the process, though do note the amount of time it takes. One of the early ideals of cloud processing would be that you could seamlessly scale up and down with no effects on the end user. In some services (especially things like function apps, web apps, and VMs in a Kubernetes pod), you get that experience. When it comes to almost anything data-related, though, immediate scaling is a hard no, to the point where I’d assume you can’t afford the downtime to do it until proven otherwise.

Comments closed

Contrasting Kafka with Azure Service Bus

Published 2021-10-18 by Kevin Feasel

Ritam Das explains the differences between Apache Kafka and Azure Service Bus:

It is important to note that Azure Service Bus is a traditional message broker and tailored to somewhat different use cases when compared to Kafka. Simply transferring between these two technologies is not an easy feat and would require overhauling your entire application. The comparison stops at both technologies being message brokers as under the hood they are fundamentally different.
At a high level, ASB has high processing overhead per message, stronger guarantees around delivery and processing, and typically a “process once” model. Kafka has low overhead processing per message, fewer guarantees around delivery and processing, and typically a “publish once, process multiple times” model. To provide an explicit comparison, it would be best to understand the intended use case and proceed from there.

Read on to understand the best uses for each technology, as well as sample calls using Python.

Comments closed

Creating Delta Lake Tables in Azure Databricks

Published 2021-10-18 by Kevin Feasel

Gauri Mahajan takes us through creating new tables in a Delta Lake using Azure Databricks:

Delta lake is an open-source data format that provides ACID transactions, data reliability, query performance, data caching and indexing, and many other benefits. Delta lake can be thought of as an extension of existing data lakes and can be configured per the data requirements. Azure Databricks has a delta engine as one of the core components that facilitates delta lake format for data engineering and performance. Delta lake format is used to create modern data lake or lakehouse architectures. It is also used to build a combined streaming and batch architecture popularly known as lambda architecture.

Click through for the process.

Comments closed

M	T	W	T	F	S	S
1	2	3	4	5	6	7
8	9	10	11	12	13	14
15	16	17	18	19	20	21
22	23	24	25	26	27	28
29	30	31

Category: Cloud