Press "Enter" to skip to content

Curated SQL Posts

Multi-Threading with dbatools

Andy Levy has some lessons learned:

Over the summer, I spent some (a lot of) time working on updates to a script at work which runs multiple processes in parallel. Everything seemed to work OK for a while, but then everything broke. It broke right around the time dbatools 1.1 dropped, so I started thinking that something must have changed there. Turns out, it was entirely my fault and I hope this post will help you avoid the same trap.

Don’t fall into the same traps Andy did; read the whole thing.

Comments closed

System-Versioned Ledger Tables

Randolph West has a series on ledger tables in SQL Server. First up is a primer on the topic:

System-versioned ledger tables leverage the same technology: there is a table with current data in it, and an underlying history table which keeps track of changes. However, it uses a cryptographic chain that provides digital forensic evidence of tampering. Yes, if you’ll pardon the use of this phrase, I’m talking about a blockchain.

This is not a cryptocurrency. No one is using expensive graphics cards to produce a fiat currency in someone’s basement. Instead, each transaction affecting the database in question is cryptographically hashed using a SHA-256 algorithm and then stored somewhere off-site.

Part two separates out the two types of ledger table:

This week we will look at the different types of ledger table: append-only and updatable.

Unlike temporal tables, a ledger table can be append-only which makes it immutable. You can only insert data and therefore it does not need a history table. In fact, you may be using append-only tables in your data warehouse already. While this is secure, it may not be practical.

Part three covers limitations:

Every choice we make is a trade-off. New features have limitations, and ledger tables are no exception.

Some of these limitations are perfectly sensible. For example, the whole point of ledger tables is to ensure that we can provide tamper evidence. This necessarily means you can’t turn it off once it’s enabled, unless you drop the database entirely — this is just one scenario where a full defence-in-depth strategy is required.

Part four is the one I’ve been waiting for—an explanation why you probably don’t need this:

After writing several posts about a neat feature in Azure SQL called system-versioned ledger tables, it reminded me about something I’ve wanted to say for a number of years now, outside of snarky tweets.

Here goes:

You don’t need a blockchain.

In the vast majority of use cases, you need a properly audited relational database system with ACID compliance and a good recovery strategy.

There are very specific use cases in which data hashes and ledger tables make sense.

Comments closed

Adding an Animated GIF to Power BI Reports

Ed Hansberry works around a Power BI limitation:

It is easy to add an animated GIF to your Power BI Reports. However, if you just add it as an image, it won’t animate. You’ll just get a static image.

Animated GIFs can be useful in reports to explain new features to users, or on hidden developer pages showing how certain things were done.

Click through for instructions on how to include an animated GIF on your Power BI report. Just make sure to pronounce it the right way.

Comments closed

More Efficient Pivoting

Dave Mason is on the hunt:

While working with some poorly performing code in T-SQL that used a PIVOT operator, I wondered if there was a more efficient way to get a result set of pivoted data. It may have been a fool’s errand, but I still wanted to try. It dawned on me that I could use the STRING_AGG() function to build a delimited list of pivoted column names and values. From there, I’d have to “shred” the delimited data to rows and columns. But how?

Read on to see how.

Comments closed

Compressing Multiple Files into an Archive with Powershell

Patrick Gruenauer zips all the files:

Compressing files is a common task. For saving time, you can use PowerShell to automate the compression process. In this blog post I will show you how to compress multiple folders at once with the PowerShell Cmdlet Compress-Archive. The compressed zip files will be stored separately in a specific file. Let’s dive in.

Click through for a script which compresses each folder in its own zip file.

Comments closed

Push-Based Shuffle in Apache Spark 3.2 via Project Magnet

Venkata Krishnan Sowrirajan and Min Shen announce that Project Magnet will be in Apache Spark 3.2:

Push-based shuffle is an implementation of shuffle where the shuffle blocks are pushed to the remote shuffle services from the mapper tasks in order to address shuffle scalability and reliability issues. In a nutshell, with push-based shuffle, a large number of small, random reads is converted into a small number of large, sequential reads, which significantly improves disk I/O efficiency and shuffle data locality.

This is explained in greater detail in an earlier blog post, Magnet: A scalable and performant shuffle architecture for Apache Spark, which you can read for more information about how we achieve push-based shuffle.

Read on to see when this matters and how you can make use of it once you’re in Spark 3.2 (whose first release was exactly two weeks ago, October 13th).

Comments closed

Caching Strategies with Redis

Camilo Reyes shares performance data from four Redis caching strategies:

Redis is a cache database that stores documents in memory. The data store has a key-value pair lookup with O(1) time complexity. This makes the cache fast and convenient because it does not have to deal with complex execution plans to get data. The cache service can be trusted to find a cache entry with a value in almost no time.

When datasets in cache begin to grow, it can be surprising to realize that any latency is not a Redis issue. In this take, I will show you several strategies to cache Redis data structures then show what you can do to pick the best approach.

Read on for the contenders and how they do. ProtoBuf’s results on small datasets surprised me.

Comments closed

Connection Leaks with MARS

Josh Darnell warns that, if you go to MARS, a doctor will warn you that you have a schizoid embolism and it will be up to you to determine whether the doctor is lying or not:

I recently looked at a SQL Server instance that had a large number of MARS connections under a single “parent” connection. Most of these “child” connections had been idle for quite a while, but they were still hanging around.

Read the whole thing. Because I’ve used MARS so little, I’ll instead add a follow-up point to my Total Recall reference above. In the commentary track for the movie, director Paul Verhoeven notes that Douglas Quaid actually did die and that it really was just a fantasy concocted in his mind and he really did die at the end. You can tell because, instead of a fade to black like normal movies, he fades to white, indicating that this wasn’t a proper ending. But then again, considering the follow-on media which happened (and was slated to happen but didn’t make it to the finish line), I don’t think the studios would have let Verhoeven keep his unhappy ending.

Comments closed