2021-10-27 – Curated SQL

Push-based shuffle is an implementation of shuffle where the shuffle blocks are pushed to the remote shuffle services from the mapper tasks in order to address shuffle scalability and reliability issues. In a nutshell, with push-based shuffle, a large number of small, random reads is converted into a small number of large, sequential reads, which significantly improves disk I/O efficiency and shuffle data locality.
This is explained in greater detail in an earlier blog post, Magnet: A scalable and performant shuffle architecture for Apache Spark, which you can read for more information about how we achieve push-based shuffle.

Read on to see when this matters and how you can make use of it once you’re on Spark 3.2 (whose first release was exactly two weeks ago, October 13th).
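To put rough numbers on the "many small reads become a few large reads" claim, here is a toy Python sketch. It is not Spark code: the function name and the mapper and reducer counts are made up for illustration, and it assumes the best case in which every shuffle block is merged successfully.

```python
def shuffle_read_counts(num_mappers: int, num_reducers: int) -> dict:
    """Toy count of block reads for a single shuffle stage (illustrative only)."""
    # Regular shuffle: every reducer fetches its block from every mapper's output,
    # so the number of small, scattered reads grows as mappers * reducers.
    pull_based = num_mappers * num_reducers
    # Push-based shuffle, best case: the blocks for each reducer partition have
    # already been merged into one file, so each reducer does one sequential read.
    push_based_best_case = num_reducers
    return {"pull_based": pull_based, "push_based_best_case": push_based_best_case}


counts = shuffle_read_counts(num_mappers=1000, num_reducers=500)
print(counts)  # {'pull_based': 500000, 'push_based_best_case': 500}
```

In a real cluster, some blocks may not get merged and are still fetched the usual way, so the actual figure falls somewhere between the two. On the Spark side, the client-side switch in 3.2 is `spark.shuffle.push.enabled`; the linked post and the Spark configuration documentation cover the remaining settings.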

Comments closed

Scripting Hive Tables

Published 2021-10-27 by Kevin Feasel

The Hadoop in Real World team shows how you can generate the DDL to create an existing Hive table:

If you have worked with tools like Toad, DB Visualizer or SQL Server Management Studio you know it is quite easy to select an existing table and create a DDL script for the table to get the create script. How do we do the same or get the DDL or create script of an existing Hive table?

The syntax is quite similar to MySQL.
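For reference, the statement in question is `SHOW CREATE TABLE`, which exists in both Hive and MySQL. Below is a minimal Python sketch that builds that statement for a given table. The helper name and the example database and table names are hypothetical, and the identifier check is deliberately conservative rather than a full implementation of Hive's naming rules.

```python
import re

# Conservative identifier pattern (an assumption for this sketch, not Hive's full rules).
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def show_create_table_statement(database: str, table: str) -> str:
    """Build a SHOW CREATE TABLE statement for database.table."""
    for name in (database, table):
        # Refuse anything unusual instead of trying to quote or escape it.
        if not _IDENTIFIER.fullmatch(name):
            raise ValueError(f"unsupported identifier: {name!r}")
    return f"SHOW CREATE TABLE {database}.{table}"


statement = show_create_table_statement("sales", "orders")
print(statement)  # SHOW CREATE TABLE sales.orders
```

You would then run the resulting statement through whatever Hive client you already use, such as Beeline or a JDBC connection.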


Caching Strategies with Redis

Published 2021-10-27 by Kevin Feasel

Camilo Reyes shares performance data from four Redis caching strategies:

Redis is a cache database that stores documents in memory. The data store has a key-value pair lookup with O(1) time complexity. This makes the cache fast and convenient because it does not have to deal with complex execution plans to get data. The cache service can be trusted to find a cache entry with a value in almost no time.
When datasets in cache begin to grow, it can be surprising to realize that any latency is not a Redis issue. In this take, I will show you several strategies to cache Redis data structures then show what you can do to pick the best approach.

Read on for the contenders and how they do. ProtoBuf’s results on small datasets surprised me.


Connection Leaks with MARS

Published 2021-10-27 by Kevin Feasel

Josh Darnell warns that, if you go to MARS, a doctor will warn you that you have a schizoid embolism and it will be up to you to determine whether the doctor is lying or not:

I recently looked at a SQL Server instance that had a large number of MARS connections under a single “parent” connection. Most of these “child” connections had been idle for quite a while, but they were still hanging around.

Read the whole thing. Because I’ve used MARS so little, I’ll instead add a follow-up point to my Total Recall reference above. In the commentary track for the movie, director Paul Verhoeven notes that it really was all a fantasy concocted in Douglas Quaid’s mind and that Quaid did die at the end. You can tell because, instead of a fade to black like normal movies, the film fades to white, indicating that this wasn’t a proper ending. But then again, considering the follow-on media which happened (and was slated to happen but didn’t make it to the finish line), I don’t think the studios would have let Verhoeven keep his unhappy ending.


Sorting a List with PowerShell

Published 2021-10-27 by Kevin Feasel

Kenneth Fisher lines ’em up and knocks ’em down:

In my last post I grabbed a file list but I really need it sorted alphabetically. Now, I’ve been a DBA my entire career and one of the things that gets hammered into you is that unless you specifically sort a list, there is no guarantee that that list will be sorted. So how do I sort my list?

To learn how and see a few examples of it in action, check out Kenneth’s post.


Serverless SQL Pool CI/CD

Published 2021-10-27 by Kevin Feasel

Kevin Chant doesn’t have time for manual deployments:

I want to cover one way you can do CI/CD for Azure Synapse Analytics serverless SQL pools using Azure DevOps in this post. Because I know it is a popular topic.
It’s related to my post about how you can create a dacpac for an Azure Synapse Analytics dedicated SQL pool using Azure DevOps. Since they are both based in the same service.
Plus, a while ago I wrote about the increase in demand for Data Platform automation. So, I really wanted to do a post about how you can do CI/CD for Azure Synapse Analytics serverless SQL pools.

Read on to learn how.


Lessons Learned in Migrating to .NET 5 or 6

Published 2021-10-27 by Kevin Feasel

Patrick Smacchia has a few tips for migrating code from .NET Framework to .NET 5 or even 6:

In January 2020 I wrote the post Not planning now to migrate your .NET 4.8 legacy, is certainly a mistake. Hopefully we followed our own advice and have been migrated most of our non-UI code. This way latest NDepend version 2021.2 can now run analysis, reporting, power tools and API against .NET 5 on Windows, Linux and MacOS.
We learn a few things during this migration journey. Let me expose those in five points:

My most positive experiences with this have come in migrating projects with relatively few third-party dependencies. The big problem there is that a fair percentage of older libraries never made the leap to .NET Standard, so you may be stuck with a rewrite (or just stuck in general) as a result.


Dynamic Parameter Code in PowerShell

Published 2021-10-27 by Kevin Feasel

Jeffrey Hicks shows off some PowerShell 7 functionality:

One of the topics we’ve discussed in the PowerShell Cmdlet Working Group is a request to make it easier to insert dynamic parameters. I am a bit torn on this. On one hand, I see the value in dynamic parameters. These are parameters that only exist if some condition is met, such as if the current location is in the Windows registry. The downside is that these parameters are difficult to discover and awkward to document. On top of that, the PowerShell code necessary to define a dynamic parameter is complicated and definitely not beginner-level. This is what I think the issue is really all about. So I decided to write my own tooling to make it easier to insert dynamic parameters.

Some of those examples go from “This looks reasonable” to “That’s a lot of code” pretty quickly. In fairness, though, this isn’t the type of thing you’ll write every day.



Day: October 27, 2021

Push-Based Shuffle in Apache Spark 3.2 via Project Magnet

Scripting Hive Tables

Caching Strategies with Redis

Connection Leaks with MARS

Sorting a List with PowerShell

Serverless SQL Pool CI/CD

Lessons Learned in Migrating to .NET 5 or 6

Dynamic Parameter Code in PowerShell