Press "Enter" to skip to content

Month: October 2022

Column Lookups and Dynamic SQL

Rob Farley does a double-check:

I’ve written before about what I consider the golden rule with SQL Injection. And that is, to understand that DATA should not be executed like CODE. A parameter is fine – that’s not part of what gets executed – but you do not embed user-entered values into queries. You never create a WHERE clause like “WHERE t.col1 = 3”, if that value 3 is being passed in. Instead, you use “WHERE t.col1 = @param”. Because you can’t assume that “3” is safe to run.

But there are things in queries that can’t be parameterised. Table names, linked server names, column names, for example.

Read on to learn what Rob does in those cases.
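Rob has his own way of handling this, so read the post for the details. As a rough illustration of the general pattern (the procedure and table names here are made up, not Rob's), you can validate an incoming column name against the catalog and quote it with QUOTENAME, while still passing the data value as a real parameter:

```sql
-- A minimal sketch, assuming a hypothetical dbo.SomeTable:
-- the identifier is validated and quoted; the value stays a parameter.
CREATE OR ALTER PROCEDURE dbo.SearchByColumn
    @ColumnName sysname,
    @SearchValue int
AS
BEGIN
    -- Only proceed if the column actually exists on the target table.
    IF NOT EXISTS
    (
        SELECT 1
        FROM sys.columns
        WHERE object_id = OBJECT_ID(N'dbo.SomeTable')
          AND name = @ColumnName
    )
    BEGIN
        RAISERROR(N'Unknown column name.', 16, 1);
        RETURN;
    END;

    DECLARE @sql nvarchar(max) =
        N'SELECT * FROM dbo.SomeTable WHERE '
        + QUOTENAME(@ColumnName) + N' = @SearchValue;';

    EXEC sys.sp_executesql
        @sql,
        N'@SearchValue int',
        @SearchValue = @SearchValue;
END;
```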


Auto-Aggregation on Dimension Tables: Gone

Matt Allington made the Power BI world a slightly better place:

Have you ever loaded a calendar table with lots of numeric columns (such as Year, Month Number, etc.), loaded it into Power BI, and then had to manually go and turn off the default aggregation on these columns? If you build Power BI reports, then I bet you have done this more than once. Jeffrey’s idea was that Power BI should be smart enough to detect a dimension table (one side of the relationship) and to disable these default aggregations. There’s nothing worse than adding a Year column to a table and having Power BI add the years together. If I add a year to a table, I want to slice and dice, not SUM the years.

Matt added a request; we’ll see where that gets him.


When Batching Breaks Down

Erik Darling has some thoughts on batching data modifications:

The whole point is to go a little bit easier on your server:

– Smaller transactions sent to the log file

– Shorter locking durations

– Less build up of the version store (optimistic isolation levels, triggers, etc.)

But there are things that make batching impossible, or downright tricky.

Erik provides more details after the jump. This is all “stuff you should think about” rather than “reasons why you shouldn’t batch.”
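For context, the basic batching pattern looks something like the sketch below (the table, column, and retention window are made up for illustration, not taken from Erik's post):

```sql
-- A generic batching sketch against a hypothetical dbo.AuditLog table:
-- delete old rows in small chunks so each transaction stays short.
DECLARE @BatchSize int = 5000,
        @RowsAffected int = 1;

WHILE @RowsAffected > 0
BEGIN
    DELETE TOP (@BatchSize)
    FROM dbo.AuditLog
    WHERE CreatedDate < DATEADD(YEAR, -2, GETDATE());

    SET @RowsAffected = @@ROWCOUNT;

    -- Optional breather between batches to ease pressure on the log and locks.
    IF @RowsAffected > 0
        WAITFOR DELAY '00:00:01';
END;
```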


Creating Power BI Scorecards from Excel

Imke Feldmann has a goal:

I like Power BI scorecards a lot, as they give a really nice interface and look to your and your team’s goals. However, entering all goals manually can become a bit tedious, especially if you have your goals already sitting in an Excel file. This is often the case, for example, with budget figures. In this article I will share a method using Power Automate to create a scorecard automatically from an Excel file using standard Power Automate connectors. Please note that these connectors are still a bit limited, so connections to Power BI reports, for example, cannot be created currently.

Click through to see how.


General Purpose Serverless Azure SQL DB Performance

Reitse Eskens doesn’t need a server:

In my previous blog I wrote about the premium tier, the one that can be compared with the business critical tier. Now we’re moving away from the DTU models and back to what we DBAers really understand: cores, memory, and disks. Before I dive into the limitations, there’s one thing you need to understand. The serverless tier is made for intermittent use. If you’re using the tier for more than 25% of the time (or about 183 hours per month), you’re better off going provisioned. This has nothing to do with performance but everything to do with cost. The tipping point at which provisioned becomes cheaper is around 25% of the time.

There’s some solid advice on how to get the service to go to sleep but the bulk of the article revolves around performance.
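If you want to watch your own serverless database's resource usage while testing, a reasonable starting point (a general Azure SQL Database DMV, not something specific to Reitse's write-up) is sys.dm_db_resource_stats:

```sql
-- Recent resource usage for the current Azure SQL Database,
-- sampled roughly every 15 seconds and kept for about an hour.
SELECT TOP (50)
    end_time,
    avg_cpu_percent,
    avg_data_io_percent,
    avg_log_write_percent,
    avg_memory_usage_percent
FROM sys.dm_db_resource_stats
ORDER BY end_time DESC;
```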


Live Connection vs DirectQuery in Power BI

Chris Webb disambiguates a couple of terms:

It’s very easy to get confused between a Live connection and DirectQuery mode in Power BI: the customers I work with frequently get them mixed up, partly because they are similar concepts, partly because other BI tools use the term “live connection” to refer to what Power BI calls DirectQuery mode. In this post I will explain the differences between these two terms.

Read on for that explanation.


Failed to Update Server Firewall Rules in Azure

Andy Leonard gets an error:

Recently, I attempted to update the Networking for an Azure SQL Server I use when delivering training. Specifically, I was attempting to set “Public network access” to “Selected networks,” add my client IP, and check the “Allow Azure services and resources to access this server” checkbox – found on the Networking blade for my new Azure server:

Fortunately, this was a case of “read the actual error message, understand the actual error.” It doesn’t always work out that way but it’s nice when it does.


Query Plans from Incomplete Executions

Erik Darling needs information:

One thing that comes up pretty often when you ask for an actual execution plan is that the query “never finishes” and one can’t be captured.

Good news! There are ways for you to get execution plans in progress with some of the actual query plan elements inside.

You don’t have to let the query run to completion, but generally if you give it a few minutes you can capture where things are going wrong.

Read on to see how, as well as the limitations of each technique.
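One commonly used option in this space (which may or may not be among the techniques Erik covers) is sys.dm_exec_query_statistics_xml, which returns the in-flight plan for a running session when lightweight query profiling is available (it is on by default in SQL Server 2019 and later):

```sql
-- Grab the in-progress execution plan for currently running requests.
SELECT
    r.session_id,
    r.status,
    r.start_time,
    qsx.query_plan
FROM sys.dm_exec_requests AS r
CROSS APPLY sys.dm_exec_query_statistics_xml(r.session_id) AS qsx
WHERE r.session_id <> @@SPID;
```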


Reading Delta Tables from Power BI via Synapse Serverless

Dan English is up for a data lake change:

In this post I just wanted to show the ability to use the Delta Lake format, which is very common now, with Power BI. I will go over a quick example of creating the files to reference, building a view to use with Power BI, and then querying the data in Power BI.

In my Synapse Workspace I created a Linked service connection to an Azure SQL Database with the AdventureWorksLT database loaded, which is the sample database you can create when you first create a SQL instance in Azure; here is a walkthrough link (see the Additional settings section).

Dan shows how to create the lake files in delta format via Synapse pipeline and then how to query the data from there.
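On the serverless SQL side, the view over the Delta folder generally looks something like the sketch below; the storage account, container, and path are placeholders rather than Dan's actual setup, and the view needs to live in a user database on the serverless endpoint:

```sql
-- A serverless SQL view over a Delta Lake folder in the data lake.
CREATE VIEW dbo.SalesOrderDetail
AS
SELECT *
FROM OPENROWSET(
        BULK 'https://mydatalake.dfs.core.windows.net/files/delta/SalesOrderDetail/',
        FORMAT = 'DELTA'
     ) AS rows;
```

Power BI then connects to the serverless SQL endpoint and queries the view like any other SQL source.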


Named Entity Encryption in Spark

Arshad Ali wants to secure some data being used in a Synapse Spark pool:

As a data engineer, we often get requirements to encrypt, decrypt, mask, or anonymize certain columns of data in files sitting in the data lake when preparing and transforming data with Apache Spark. The extensibility feature of Spark allows us to leverage a library which is not native to Spark. One such library is Microsoft Presidio, which provides fast identification and anonymization modules for private entities in text such as credit card numbers, names, locations, social security numbers, bitcoin wallets, US phone numbers, financial data, and more. It facilitates both fully automated and semi-automated PII (Personal Identifiable Information) de-identification and anonymization flows on multiple platforms.

In this blog post, I am going to demonstrate step by step how to download and use this library to meet the above requirements with a Spark pool in Azure Synapse Analytics.

Read on to see how it works.
