Press "Enter" to skip to content

Curated SQL Posts

Query Plans from Incomplete Executions

Erik Darling needs information:

One thing that comes up pretty often when you ask for an actual execution plan, is that the query “never finishes” and one can’t be captured.

Good news! There are ways for you to get execution plans in progress with some of the actual query plan elements inside.

You don’t have to let the query run to completion, but generally if you give it a few minutes you can capture where things are going wrong.

Read on to see how, as well as the limitations of each technique.

Comments closed

Reading Delta Tables from Power BI via Synapse Serverless

Dan English is up for a data lake change:

In this post I just wanted to show the ability to use the Delta Lake format that is very common now with Power BI. I will go over a quick example of creating the files to reference, building a view to use with Power BI, and then querying the data in Power BI.

In my Synapse Workspace I created a Linked service connection to an Azure SQL Database that has the AdventureWorksLT database loaded which is the sample database you can create when you first create a SQL instance in Azure and here is a walkthrough link and see the Additional settings section.

Dan shows how to create the lake files in delta format via Synapse pipeline and then how to query the data from there.

Comments closed

Named Entity Encryption in Spark

Arshad Ali wants to secure some data being used in a Synapse Spark pool:

As a data engineer, we often get requirements to encrypt, decrypt, mask, or anonymize certain columns of data in files sitting in the data lake when preparing and transforming data with Apache Spark. The extensibility feature of Spark allows us to leverage a library which is not native to Spark. One such library is Microsoft Presidio, which provides fast identification and anonymization modules for private entities in text such as credit card numbers, names, locations, social security numbers, bitcoin wallets, US phone numbers, financial data, and more. It facilitates both fully automated and semi-automated PII (Personal Identifiable Information) de-identification and anonymization flows on multiple platforms.

In this blog post, I am going to demonstrate step by step how to download and use this library to meet the above requirements with Spark pool of Azure Synapse Analytics.

Read on to see how it works.

Comments closed

Cumulative Costs and Scale

Michael J. Swart talks about costs:

Think of this another way. The triangle above is a graph of the amount of data you have over time. And if you pay for storage as an operational expense such as when you’re renting storage in the cloud (as opposed to purchasing physical drives). Then the cost of storage is the area of the graph. The monthly bills are ever-increasing, so half of the total cost of storage will be for the most recent 29%.

Put yet another way: If you started creating a large file in the cloud every day since March 2014, then the amount you paid to the cloud provider before the pandemic started is the same amount you paid after the pandemic started (as of August 2022).

I was told there would be no math here.

Comments closed

Thoughts on Publishing a Powershell Module

Shane O’Neill reflects on pushing a Powershell module into the Gallery:

I’ve worked with PowerShell for years but have never published a module before. I’ve helped write changes to a few, e.g. dbatoolsdbachecks, and a few internal ones.

But the actual creating and publishing one needs adding to my experience list.

There was a lot of gnashing of the teeth, wailing of the cries, and reading of the documentation.

Read on for a few tips from Shane. You’ll probably still need to read the documentation…but maybe not as much?

Comments closed

Direct Permission is Just the Start

Kenneth Fisher has access to many permissions:

What you have access to is not just what you have direct permissions to. The other day I needed to copy some backups from one location to another. Unfortunately my network id doesn’t have access to either location. Guess what does though. The service account running the SQL Server instance where the backups were taken. Now, since I’m a sysadmin on that instance when I use xp_cmdshell it uses that service account. I don’t have to know the password or log in as the service account, xp_cmdshell will do it for me.

Click through to learn more.

Comments closed

SQL Server Non-Vulnerabilities

Sean Gallardy has an A+++ 10/10 would read again rant:

I get asked if I know anything about <newest SQL vulnerability as reported by random website>, quite often. Generally, my answer is that I don’t for two main reasons… the first being that none of them are actual vulnerabilities, and the second being that none of them are particularly new but merely items from the same bag of tricks everyone else uses and isn’t a buffer overrun/privilege escalation/etc. item. My normal response after taking a quite peek at whatever article is referenced is generally the same response as The Dude, “Yeah, well, you know, that’s just like uh, your opinion, man.”, as all of these items are purported to be vulnerabilities but yet none actually exploit any vulnerability.

Did you know that if you steal someone’s username and password from the sticky note on their monitor, you can use that to connect to a SQL Server? Amazing vulnerability there—it doesn’t even check that you’re the real person who should have those credentials!

Comments closed

Variable Definition and Programmatic ggplot2

Sebastian Sauer takes us through an interesting scenario.

No lede here because it’s almost 100% code and headers. A quick description of this is that we can see ways to parse columns in an R DataFrame and plot visuals without hard-coding the column name in our plot definition, using a variable instead.

And I had to rewrite the synopsis above because I used the data science term “variable” until hitting a wall when describing the programming term “variable.”

Comments closed

Debugging Code in R

Cosima Meyer explains how debugging works in R with RStudio:

Three basic commands in RStudio let you do the debugging: debug(function_name)browser(), and undebug(function_name).

With debug(function_name) you start the debugging of your function – it’s basically like a mole that digs in. When you’re in debug mode, you can also call the objects in your function.

Read the whole thing to learn the power of debugging beyond the print() statement. H/T R-Bloggers.

Comments closed

Kernel SHAP in R and Python

Michael Mayer and Christian Lorentzen team up:

SHAP is one of the most used model interpretation technique in Machine Learning. It decomposes predictions into additive contributions of the features in a fair way. For tree-based methods, the fast TreeSHAP algorithm exists. For general models, one has to resort to computationally expensive Monte-Carlo sampling or the faster Kernel SHAP algorithm. Kernel SHAP uses a regression trick to get the SHAP values of an observation with a comparably small number of calls to the predict function of the model. Still, it is much slower than TreeSHAP.

Read on to see how to do this in both R and Python. With libraries the way they are, the code is very similar and the results are basically the same.

Comments closed