Press "Enter" to skip to content

Author: Kevin Feasel

Reading Delta Tables from Power BI via Synapse Serverless

Dan English is up for a data lake change:

In this post I just wanted to show the ability to use the Delta Lake format that is very common now with Power BI. I will go over a quick example of creating the files to reference, building a view to use with Power BI, and then querying the data in Power BI.

In my Synapse Workspace I created a Linked service connection to an Azure SQL Database that has the AdventureWorksLT database loaded which is the sample database you can create when you first create a SQL instance in Azure and here is a walkthrough link and see the Additional settings section.

Dan shows how to create the lake files in delta format via Synapse pipeline and then how to query the data from there.

Comments closed

Query Plans from Incomplete Executions

Erik Darling needs information:

One thing that comes up pretty often when you ask for an actual execution plan, is that the query “never finishes” and one can’t be captured.

Good news! There are ways for you to get execution plans in progress with some of the actual query plan elements inside.

You don’t have to let the query run to completion, but generally if you give it a few minutes you can capture where things are going wrong.

Read on to see how, as well as the limitations of each technique.

Comments closed

Named Entity Encryption in Spark

Arshad Ali wants to secure some data being used in a Synapse Spark pool:

As a data engineer, we often get requirements to encrypt, decrypt, mask, or anonymize certain columns of data in files sitting in the data lake when preparing and transforming data with Apache Spark. The extensibility feature of Spark allows us to leverage a library which is not native to Spark. One such library is Microsoft Presidio, which provides fast identification and anonymization modules for private entities in text such as credit card numbers, names, locations, social security numbers, bitcoin wallets, US phone numbers, financial data, and more. It facilitates both fully automated and semi-automated PII (Personal Identifiable Information) de-identification and anonymization flows on multiple platforms.

In this blog post, I am going to demonstrate step by step how to download and use this library to meet the above requirements with Spark pool of Azure Synapse Analytics.

Read on to see how it works.

Comments closed

Cumulative Costs and Scale

Michael J. Swart talks about costs:

Think of this another way. The triangle above is a graph of the amount of data you have over time. And if you pay for storage as an operational expense such as when you’re renting storage in the cloud (as opposed to purchasing physical drives). Then the cost of storage is the area of the graph. The monthly bills are ever-increasing, so half of the total cost of storage will be for the most recent 29%.

Put yet another way: If you started creating a large file in the cloud every day since March 2014, then the amount you paid to the cloud provider before the pandemic started is the same amount you paid after the pandemic started (as of August 2022).

I was told there would be no math here.

Comments closed

Thoughts on Publishing a Powershell Module

Shane O’Neill reflects on pushing a Powershell module into the Gallery:

I’ve worked with PowerShell for years but have never published a module before. I’ve helped write changes to a few, e.g. dbatoolsdbachecks, and a few internal ones.

But the actual creating and publishing one needs adding to my experience list.

There was a lot of gnashing of the teeth, wailing of the cries, and reading of the documentation.

Read on for a few tips from Shane. You’ll probably still need to read the documentation…but maybe not as much?

Comments closed

Direct Permission is Just the Start

Kenneth Fisher has access to many permissions:

What you have access to is not just what you have direct permissions to. The other day I needed to copy some backups from one location to another. Unfortunately my network id doesn’t have access to either location. Guess what does though. The service account running the SQL Server instance where the backups were taken. Now, since I’m a sysadmin on that instance when I use xp_cmdshell it uses that service account. I don’t have to know the password or log in as the service account, xp_cmdshell will do it for me.

Click through to learn more.

Comments closed

SQL Server Non-Vulnerabilities

Sean Gallardy has an A+++ 10/10 would read again rant:

I get asked if I know anything about <newest SQL vulnerability as reported by random website>, quite often. Generally, my answer is that I don’t for two main reasons… the first being that none of them are actual vulnerabilities, and the second being that none of them are particularly new but merely items from the same bag of tricks everyone else uses and isn’t a buffer overrun/privilege escalation/etc. item. My normal response after taking a quite peek at whatever article is referenced is generally the same response as The Dude, “Yeah, well, you know, that’s just like uh, your opinion, man.”, as all of these items are purported to be vulnerabilities but yet none actually exploit any vulnerability.

Did you know that if you steal someone’s username and password from the sticky note on their monitor, you can use that to connect to a SQL Server? Amazing vulnerability there—it doesn’t even check that you’re the real person who should have those credentials!

Comments closed

Debugging Code in R

Cosima Meyer explains how debugging works in R with RStudio:

Three basic commands in RStudio let you do the debugging: debug(function_name)browser(), and undebug(function_name).

With debug(function_name) you start the debugging of your function – it’s basically like a mole that digs in. When you’re in debug mode, you can also call the objects in your function.

Read the whole thing to learn the power of debugging beyond the print() statement. H/T R-Bloggers.

Comments closed

Variable Definition and Programmatic ggplot2

Sebastian Sauer takes us through an interesting scenario.

No lede here because it’s almost 100% code and headers. A quick description of this is that we can see ways to parse columns in an R DataFrame and plot visuals without hard-coding the column name in our plot definition, using a variable instead.

And I had to rewrite the synopsis above because I used the data science term “variable” until hitting a wall when describing the programming term “variable.”

Comments closed

Diagnosing High CPU on SQL Server when It’s SQL Server’s Fault

Ajay Dwivedi continues a series on high CPU utilization:

In the last blog post Live Troubleshooting High CPU On SQL Server – Part 1, we worked on a scenario where we saw a high CPU on SQL Server due to some external OS level task. In this blog, we are going to explore a scenario where a high CPU issue is present because of the workload running on SQL Server.

Just like in the last blog post scenario, when I have to troubleshoot a “slow” SQL Server, if my SQL Server is baselined with the SQLMonitor tool, then I first visit my Monitoring – Live – All Servers dashboard which displays all the metrics of specific SQL Servers that need DBA help.

As normal, we see a scenario with SQLMonitor, as well as other options including sp_BlitzFirst.

Comments closed