Day: October 7, 2022

Named Entity Encryption in Spark

Arshad Ali wants to secure some data being used in a Synapse Spark pool:

As data engineers, we often get requirements to encrypt, decrypt, mask, or anonymize certain columns of data in files sitting in the data lake when preparing and transforming data with Apache Spark. The extensibility feature of Spark allows us to leverage a library which is not native to Spark. One such library is Microsoft Presidio, which provides fast identification and anonymization modules for private entities in text such as credit card numbers, names, locations, social security numbers, bitcoin wallets, US phone numbers, financial data, and more. It facilitates both fully automated and semi-automated PII (Personally Identifiable Information) de-identification and anonymization flows on multiple platforms.

In this blog post, I am going to demonstrate step by step how to download and use this library to meet the above requirements with a Spark pool in Azure Synapse Analytics.

Read on to see how it works.

Cumulative Costs and Scale

Michael J. Swart talks about costs:

Think of this another way. The triangle above is a graph of the amount of data you have over time. If you pay for storage as an operational expense, such as when you’re renting storage in the cloud (as opposed to purchasing physical drives), then the cost of storage is the area of the graph. The monthly bills are ever-increasing, so half of the total cost of storage will be for the most recent 29% of the time period.

Put yet another way: If you started creating a large file in the cloud every day since March 2014, then the amount you paid to the cloud provider before the pandemic started is the same amount you paid after the pandemic started (as of August 2022).
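
For anyone who wants to check the arithmetic, here is a minimal sketch. It assumes perfectly linear data growth and a flat price per unit stored, so cumulative cost grows with the square of elapsed time. The function names are made up for illustration and do not come from Michael's post.

```python
import math
from datetime import date


def recent_fraction_for_half_cost() -> float:
    """Share of the timeline (counted back from today) that holds half the total cost.

    With linear growth, cost accrued by time t is proportional to t**2,
    so half the total is reached at t = 1/sqrt(2) of the full period.
    """
    return 1 - 1 / math.sqrt(2)


def months_between(start: date, end: date) -> int:
    """Whole calendar months from start to end (day of month is ignored)."""
    return (end.year - start.year) * 12 + (end.month - start.month)


def halfway_cost_date(start: date, end: date) -> date:
    """Approximate month by which half of the cumulative bill had been paid."""
    total_months = months_between(start, end)
    half_point = round(total_months / math.sqrt(2))
    # Convert a zero-based month index back into a calendar year and month.
    year, month_index = divmod(start.year * 12 + start.month - 1 + half_point, 12)
    return date(year, month_index + 1, 1)


if __name__ == "__main__":
    print(f"{recent_fraction_for_half_cost():.1%}")
    print(halfway_cost_date(date(2014, 3, 1), date(2022, 8, 1)))
```

Under those assumptions the halfway point for a March 2014 start lands in early 2020, which lines up with the pandemic comparison in the quote.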

I was told there would be no math here.

Thoughts on Publishing a PowerShell Module

Shane O’Neill reflects on pushing a PowerShell module into the Gallery:

I’ve worked with PowerShell for years but have never published a module before. I’ve helped write changes to a few, e.g. dbatools, dbachecks, and a few internal ones.

But the actual creating and publishing one needs adding to my experience list.

There was a lot of gnashing of the teeth, wailing of the cries, and reading of the documentation.

Read on for a few tips from Shane. You’ll probably still need to read the documentation…but maybe not as much?

Direct Permission is Just the Start

Kenneth Fisher has access to many permissions:

What you have access to is not just what you have direct permissions to. The other day I needed to copy some backups from one location to another. Unfortunately my network id doesn’t have access to either location. Guess what does though. The service account running the SQL Server instance where the backups were taken. Now, since I’m a sysadmin on that instance when I use xp_cmdshell it uses that service account. I don’t have to know the password or log in as the service account, xp_cmdshell will do it for me.

Click through to learn more.

SQL Server Non-Vulnerabilities

Sean Gallardy has an A+++ 10/10 would read again rant:

I get asked if I know anything about <newest SQL vulnerability as reported by random website>, quite often. Generally, my answer is that I don’t for two main reasons… the first being that none of them are actual vulnerabilities, and the second being that none of them are particularly new but merely items from the same bag of tricks everyone else uses and isn’t a buffer overrun/privilege escalation/etc. item. My normal response after taking a quick peek at whatever article is referenced is generally the same response as The Dude, “Yeah, well, you know, that’s just like uh, your opinion, man.”, as all of these items are purported to be vulnerabilities but yet none actually exploit any vulnerability.

Did you know that if you steal someone’s username and password from the sticky note on their monitor, you can use that to connect to a SQL Server? Amazing vulnerability there—it doesn’t even check that you’re the real person who should have those credentials!
