Press "Enter" to skip to content

Month: July 2022

Join Removal Due to Foreign Key Constraint

David Alcock shows a performance benefit from having a foreign key constraint in place:

Foreign keys are used in database design to enforce referential integrity, but they also have some performance benefits that you might not necessarily notice unless you’re looking into your execution plans.

Let’s take the following query using the AdventureWorks2019 sample database, where I’m selecting the BusinessEntityID and JobTitle from the HumanResources.Employee table and, by using an inner join, only returning rows that have matching values (BusinessEntityID) in both tables.

There are specific rules for table elimination, but if you meet the criteria, it can save you a bit of CPU time and I/O.
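The query itself is in David’s post; as a rough sketch (assuming the sample database’s standard, trusted foreign key from HumanResources.Employee to Person.Person on BusinessEntityID), it looks something like this:

-- Because Employee.BusinessEntityID carries a trusted foreign key to Person.Person
-- and no columns from Person.Person appear in the output, the optimizer can
-- remove the join from the execution plan entirely.
SELECT e.BusinessEntityID,
       e.JobTitle
FROM HumanResources.Employee AS e
    INNER JOIN Person.Person AS p
        ON p.BusinessEntityID = e.BusinessEntityID;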


Gateways and the CPU Cost of Power BI Dataset Refresh

Chris Webb continues experimenting:

After last week’s post on measuring Power Query CPU usage during dataset refresh, someone asked an obvious question that I should have addressed: does using a gateway change anything? After all, if you’re using a gateway to connect to an on-premises data source then all the Power Query queries transforming the data from that source will be executed on the gateway machine and not in the Power BI Service.

Let’s do a quick test to find out. 

Read on to see what Chris found out.


Working with Objects in PowerShell

Jeffrey Hicks explains the value of working with objects in PowerShell:

I expect I will write several articles about PowerShell and its relationship with objects. I know that this is the biggest hurdle for PowerShell beginners to overcome. But once they grasp that PowerShell is about working with objects in the pipeline, they recognize the value and begin finding it easier to write PowerShell code and use it interactively at a console prompt.

Read the whole thing, and if you like what you see, there’s also a Substack where you can sign up for free or subscribe for additional content.


Searching Industry Templates for Lake Databases in Synapse

Lakshmi Murthy is just browsing:

With Azure Synapse Database Templates generally available, our customers constantly want to see and learn more about how to use these templates. Through these blogs, we want to share tips and tricks our customers can use to help them utilize these templates in an efficient way. We’ve recently received several questions around the different ways a user can navigate these templates to create their lake databases. In this blog, I’d like to walk through a few options that may come in handy as you give database templates a try.

Azure Synapse Analytics offers a no-code database designer which allows you to browse these database templates, select and customize the tables you want to use, to model your enterprise data. There are several ways to browse the tables provided by the comprehensive industry templates within the designer’s exploration experience. Though the user experience is super intuitive, there are a few tips and tricks that can make this process even easier. Let’s do a quick tour to learn about the different ways to browse these templates.

Click through for a few different ways to look at standard tables for different industries.


Uploading Multiple Reports to Power BI

Jon Fletcher doesn’t have time to upload reports one by one with the UI:

In this blog post, I will be sharing a PowerShell script that allows multiple Power BI reports to be uploaded at once. In a previous blog post, I shared a PowerShell script that allowed users to download multiple Power BI reports. Combined, you could move several reports from one workspace to another in a few seconds.

The script is downloadable at the bottom of the page as a txt file. To use the script, there are three steps to take.

Click through to see how it all works.


Filtered Statistics and Table Performance

Guy Glantser provides a use case for filtered statistics:

Let’s say you have a very large table on a SQL Server 2012 Standard Edition instance. This means: old cardinality estimator and no partitioning. The table has a DATETIME column, which is ever-increasing, and it contains 5 years of data (it has to, due to regulations). The auto-update statistics kicks in only every 4 days, more or less, even when trace flag 2371 is enabled. The problem is that users usually query the table for the last day, and only rarely need to access older data. Since auto-update statistics uses a very small sample rate for very large tables, the result is not very accurate. The bottom line of all this is that most of the time you get a poor execution plan, because the optimizer estimates very few rows, while in fact there are many rows. What can you do?

I’m not sure I’ve ever used filtered statistics, but it is good to know such a thing exists.
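As a minimal sketch of the idea (table and column names here are hypothetical, not Guy’s), a filtered statistics object over just the recent slice of the table gives the optimizer a far better estimate for those “last day” queries than a low-sample-rate statistic covering five years of data:

-- Statistics limited to recent rows; the filter boundary would need to be
-- moved forward periodically as new data arrives.
CREATE STATISTICS st_Orders_Recent
ON dbo.Orders (OrderDate)
WHERE OrderDate >= '20220701'
WITH FULLSCAN;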


Parameter Sensitive Plan Optimization with Branches and Local Variables

Erik Darling has some mixed news. First up, if you branch a lot:

I’ve spent a bit of time talking about how IF branches can break query performance really badly in SQL Server.

While the Parameter Sensitive Plan (PSP) optimization won’t fix every problem with this lazy coding habit, it can fix some of them in very specific circumstances, assuming:

– The parameter is eligible for PSP

– The parameter is present across IF branches
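As a rough illustration of that branching pattern (object names here are hypothetical), the shape Erik is describing looks something like this, with the same parameter feeding both branches:

-- One parameter drives different queries in different branches, so a plan cached
-- for one set of parameter values can be a poor fit when the other branch (or
-- other values) runs later. PSP optimization in SQL Server 2022 can help when
-- @OwnerUserId is PSP-eligible and appears in both branches.
CREATE OR ALTER PROCEDURE dbo.GetPosts
    @OwnerUserId int,
    @IncludeClosed bit
AS
BEGIN
    IF @IncludeClosed = 1
        SELECT p.Id, p.Title
        FROM dbo.Posts AS p
        WHERE p.OwnerUserId = @OwnerUserId;
    ELSE
        SELECT p.Id, p.Title
        FROM dbo.Posts AS p
        WHERE p.OwnerUserId = @OwnerUserId
              AND p.ClosedDate IS NULL;
END;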

Less sanguine news if you use local variables a lot:

One fix I’ve been wishing for, or wish I’ve been fixing for, is a cure for local variables. I’d even be cool if Forced Parameterization was that cure, but you know…

Time will tell.

Though I prefer to call local variables an “Optimize for mediocre” plan hint.
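The quip refers to the way a local variable hides the parameter’s value from the optimizer. A hypothetical sketch of the pattern:

-- Copying the parameter into a local variable means the optimizer cannot sniff
-- the actual value; it falls back to a density-based "average" estimate, which
-- tends to be neither great nor terrible, hence the "optimize for mediocre" label.
CREATE OR ALTER PROCEDURE dbo.GetPostsByUser
    @OwnerUserId int
AS
BEGIN
    DECLARE @UserId int = @OwnerUserId;

    SELECT p.Id, p.Title
    FROM dbo.Posts AS p
    WHERE p.OwnerUserId = @UserId;
END;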


The Seedy Underbelly of Machine Learning Fitting

John Mount is not impressed with a fair amount of machine learning:

For this to actually happen we need the actual system to be in our concept space, a lot of training data, and an abundance of caution.

In practice what we see more and more is the training procedure in fact attacks the evaluation procedure. It doesn’t just improve the quality of the fit artifact, but through mere optimization accidentally exploits weaknesses in the measurement system itself. When this happens, fitting does the following.

In ML training, we often accidentally “teach to the test” by comparing models via test data, which over time selects for models that fit the test data better. As John notes, this can come about in two separate ways, and if you don’t define your optimization strategy correctly, you can accidentally train models which optimize on unrealistic things. A classic example is the neural network which could distinguish malignant tumors from non-malignant tumors not because of any property of the tumor itself, but because the malignant tumor images all had rulers in them and the non-malignant images did not. Read the whole thing for a second pitfall you can hit when training models.


Recreating a Shiny App with Plumber and ReactJS

Liam Kalita starts a new series:

Being able to host static content on RStudio Connect means we can host ReactJS applications on the platform. React is a great framework for developing web applications, with a lot of power and flexibility when creating user interfaces. Separating {shiny} applications into a user interface and a data processing API has its advantages.

In this blog series, we will guide you through creating the application from the RStudio tutorial for creating a {shiny} app, except we’ll be attempting it using ReactJS and an R {plumber} API instead of {shiny}. In this blog, part 1, we will be introducing you to the technologies we will need for the tutorial.

Read on for the essentials of what plumber and ReactJS are and why you might use each of them.


Building Custom Widgets for Azure Data Studio

Esat Erkec builds a widget:

One of the most advantageous features of ADS is that it allows the creation of customized widgets. With the help of the widgets, we can easily visualize the result of the queries using different graph types. In this context, building the performance monitoring widgets can be a reasonable approach so that we can track the performance metrics readily. Now, let’s learn how to build a custom widget with a very straightforward example.

I haven’t tried this before in Azure Data Studio, but I can see the benefit, especially if you have a common set of queries you intend to run to observe the status of a given server.
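For a sense of what sits behind such a widget, an insight is ultimately driven by a query whose results get charted in the dashboard. As a hypothetical example (not from Esat’s post), a monitoring query along these lines could back one:

-- A hypothetical example of the sort of query a performance widget might chart:
-- the top waits on the instance by total wait time.
SELECT TOP (10)
       wait_type,
       wait_time_ms / 1000.0 AS wait_time_seconds,
       waiting_tasks_count
FROM sys.dm_os_wait_stats
ORDER BY wait_time_ms DESC;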
