
Curated SQL Posts

The Performance Pain of User-Defined Functions

Tom Zika continues a series on why user-defined scalar functions are such a bad idea:

I’ll cover several test scenarios and analyze the performance using different monitoring tools.
The results will be for the second executions of the queries, so we have compiled and cached plans and all pages in the buffer pool.

Even if you already hate seeing scalar user-defined functions in code, the occasional reminder of how poorly they perform helps focus the mind.
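
To make the pattern concrete, here’s a minimal sketch of the kind of scalar UDF that causes the trouble, next to its set-based rewrite (the table, column, and function names are hypothetical, not Tom’s actual test code):

    -- A scalar UDF runs once per row and, prior to SQL Server 2019's
    -- scalar UDF inlining, also forces the whole query to run serially.
    CREATE OR ALTER FUNCTION dbo.LineTotal (@Quantity int, @Price money)
    RETURNS money
    AS
    BEGIN
        RETURN @Quantity * @Price;
    END;
    GO

    -- Row-by-row invocation of the UDF.
    SELECT SUM(dbo.LineTotal(s.Quantity, s.Price))
    FROM dbo.Sales AS s;

    -- The same logic inlined: a set-based expression the optimizer
    -- can cost, simplify, and parallelize normally.
    SELECT SUM(CONVERT(money, s.Quantity) * s.Price)
    FROM dbo.Sales AS s;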

Comments closed

Object-Level Security in Power BI and Analysis Services

Marco Russo and Alberto Ferrari take advantage of object-level security:

To be more technical, object-level security can be applied to tables and columns, but it cannot be applied to measures directly. If a measure – whether in a direct or indirect way – references a column or a table that is not accessible under the current security context, the measure becomes invisible as well. This way, it is guaranteed that if a column must be invisible to a group of users, its content cannot be inferred by looking at the result of measures based on said column.

Concretely, there may be cases where a measure should be hidden from a group of users without removing the visibility of existing data structures. Let us look at a simple example first. We define the Sales Amount measure as the product of Sales[Quantity] and Sales[Price]. You also have a Discounted Sales measure that applies a set discount to Sales Amount; now how can you hide Discounted Sales from a group of users without hiding the initial Sales Amount measure? By hiding either Sales[Quantity] or Sales[Price], you would hide both measures. Because the discount is set inside the Discounted Sales measure and not stored in the model, it looks as though you cannot hide just the measure. However, if we create a dependency in Discounted Sales on an empty hidden table specifically created to generate that dependency, we can hide Discounted Sales by hiding that table.

Read on to see how.

Comments closed

Measuring Power Query CPU Utilization

Chris Webb does a refresh and tracks the damage done:

Some time ago I wrote a post about how optimising for CPU Time is almost as important as optimising for Duration in Power BI, especially if you’re working with Power BI Premium Gen2. This is fairly straightforward if you’re optimising DAX queries or optimising Analysis Services engine-related activity for refreshes. But what about Power Query-related activity? You may have a small dataset but if you’re doing a lot of complex transformations in Power Query that could end up using a lot of CPU, even once the CPU smoothing for background activity that happens with Premium Gen2 has happened. How can you measure how expensive your Power Query queries are in terms of CPU? In this post I’ll show you how.

Read on to learn how to measure how much CPU that dataset refresh requires.

Comments closed

Building a “Solution Summary” Card for Developers

Ben Brown gets the new developers up to speed:

“Solution Summary cards” is the name I have given to the simple 1-page documents we provide to help developers get up to speed on a particular solution.

The primary use-case for these cards is to provide better on-boarding for new team members.

An additional benefit is that crafting these short summaries forces people to review how your teams are working, and you will often start seeing areas that would benefit from standardization or simplification.

Click through for an example. I like this idea a lot for a few reasons, which I’ll enumerate here because I’m feeling it:

  1. Developers often won’t read really long documents. They might scan through a document, Ctrl+F a few words, and refer back to it on occasion, but rarely will they read it in its entirety.
  2. Long documents get out of date too quickly, as developers are typically bad at keeping documentation up to date, especially as deadlines loom.
  3. An artificial constraint like “Your summary must fit on one printed page” (either front-only or front-and-back) helps focus the mind. It makes you think about what the most important things to include are.
  4. If you are a consultant or just work on a lot of different projects, it can be painful trying to remember the details. Which jump box do you need to go to? Which were the most important bits of code? How do you run the unit tests and deploy the app locally? Even if nobody else ever sees the card, it can help you in the future.

Comments closed

Summarizing Data & AI Summit Announcements

Zach Stagers hits the high notes:

One of the biggest cheers of the keynote was that Delta is being fully open sourced! Databricks continue to share their incredible work to help drive our industry forward. Delta already has wide adoption, but with the open sourced version now being levelled up to the same standard as the ‘proprietary’ one, this should help cement it as the default choice for lake-based storage.

There were some announcements of things to come with Delta too, such as optimised deletes and updates that touch single rows instead of having to completely rewrite the file. It’ll be really interesting to see how this works, and just how much it boosts performance.

Read on for more notes on several big announcements.

Comments closed

Databases, Applications, and Source Control Repos

Eitan Blumin asks and answers a question:

Following the rise in popularity of DevOps for Databases, many interesting questions are being asked on the topic.

One of these questions is: Should your SQL Database project be in the same source control repository and solution as the App code project? Or maybe they should be in the same repository but separate solutions? Or maybe they should be in completely separate repositories?

Pre-registering my answer here: for most organizations, databases should be in a separate repository. The deployment cadence is different, the deployment mechanism is different, and the people working on each likely differ. Read on for Eitan’s thoughts, which get into more of the nuance behind the answer.

Comments closed

Azure Data Explorer Web Updates

Michal Bar has a few updates to the Azure Data Explorer web tool:

We are focused on continuously improving the results exploration experience in ADX web UI, to make it easy and intuitive. Our goal is to provide an easy-to-use UI so that you will not be required to rewrite KQL queries in order to perform lightweight data exploration.

Click through to see how you can search and filter within the results pane (something I’d like to see in other Microsoft data platform tools like SSMS), create series panels on charts from KQL, and more.

Comments closed

Filling in GitHub Repo Details

Kevin Chant practices GitHub hygiene:

To clarify, GitHub hygiene is a term that I use to describe the practice of keeping GitHub repositories healthy.

Some of you have probably noticed I have been doing this more recently. With this in mind, I thought I would share what I have been doing in this post for a couple of reasons.

First of all, to help raise awareness of some of the best practices I have been following.

Secondly, because I am interested to get feedback from other members of the Microsoft Data Platform community about this. For example, do you also follow the same practices?

This is a reminder that there’s a lot you can include in a GitHub repo aside from the code itself.

Comments closed

More Notes on Parameter Sensitive Plan Optimization

Erik Darling continues a series on Parameter Sensitive Plan Optimization. First up is a missed opportunity:

If we re-run the procedure from the section up above to search for VoteTypeId 2 a second time, Memory Grant Feedback will fix the spill at the Hash Join, and bring the total execution time down to about 15 seconds.

That is an improvement, but… Look at the plan here. If VoteTypeId 2 uses a plan more suited to the number of rows it has to process, the overall time is around 4 seconds, with no need for a memory grant correction.
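
For context, Erik’s tests run against the Stack Overflow sample database, where the Votes table is heavily skewed by vote type. The procedure under test looks roughly like this (a sketch; the procedure name and column list are my assumptions, not his exact code):

    -- VoteTypeId 2 (upvotes) has orders of magnitude more rows than most
    -- other vote types, so one cached plan rarely suits every parameter.
    CREATE OR ALTER PROCEDURE dbo.CountVotesByType (@VoteTypeId int)
    AS
    BEGIN
        SELECT v.PostId, COUNT_BIG(*) AS VoteCount
        FROM dbo.Votes AS v
        WHERE v.VoteTypeId = @VoteTypeId
        GROUP BY v.PostId;
    END;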

Second, Erik asks the pressing questions:

Under more favorable circumstances, dynamic SQL gets run, executed, and plans cached and reused with the same frequency as stored procedures.

Now, dynamic SQL isn’t exactly the same as stored procedures. There’s a lot you can do with those that just looks a mess in dynamic SQL, especially longer bits of code.

In today’s post, we’re going to look at how the Parameter Sensitive Plan (PSP) optimization works with dynamic SQL.
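
For reference, “dynamic SQL” here means the parameterized sp_executesql pattern; because the parameters are strongly typed, the plan gets cached and reused across executions much like a stored procedure’s. A quick sketch (the table and parameter names are assumed):

    -- Parameterized dynamic SQL: sp_executesql with a typed parameter
    -- list caches and reuses one plan per query text, which is what
    -- puts it in the running for the PSP optimization.
    DECLARE @sql nvarchar(max) = N'
    SELECT v.PostId, COUNT_BIG(*) AS VoteCount
    FROM dbo.Votes AS v
    WHERE v.VoteTypeId = @VoteTypeId
    GROUP BY v.PostId;';

    EXEC sys.sp_executesql
        @sql,
        N'@VoteTypeId int',
        @VoteTypeId = 2;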

Read on to see if PSP works with dynamic SQL.

Comments closed

Tools for a Jump Box

Tracy Boggiano looks in the tool bag:

Having recently taken a new job and introduced a number of new tools to my new coworker, I thought I’d share how I set up my jump box and keep it updated, so others can benefit and I can find it later (I did put this in our internal Confluence pages, but I do have a box in Azure for presentations). In alphabetical order, because that’s the only way to make sense of things. I’d be curious if anybody has anything they use that I should add, so please leave comments.

Read on to see the list, including all of the Azure Data Studio and Visual Studio Code extensions Tracy likes to use.

Comments closed