Press "Enter" to skip to content

Curated SQL Posts

The Through-the-Door Problem in Credit Risk Modeling

Richard Vale takes us through a data challenge:

In credit risk modelling, you want to calculate the probability that a loan will default. Since different financial institutions gather different data and offer different products, there is no one-size-fits-all approach to doing this. Therefore, credit risk models are usually built using the institution’s own data. For example, if I’m building a credit risk model for XYZ Bank, I look at loans which XYZ bank has previously granted, and try to estimate the probability that a future loan will default based on principal, tenor, the borrower’s credit rating, and so on.

For those who haven’t heard of the through-the-door problem before, this is a good moment to pause and think about what is wrong with this. Why does this process contain a huge pitfall?

Click through for the answer, as well as an example of the problem and one way to get around this. H/T R-Bloggers.
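Before clicking through, it can help to see the pitfall concretely. Here is a quick Python simulation of my own (not from Richard's post; the score distribution, approval threshold, and default curve are all invented): train one model only on previously approved loans and another on every applicant, then ask both about a borrower the bank would historically have rejected.

```python
# Sketch of the through-the-door problem: we only observe outcomes for
# loans the bank approved, so a model trained on that data extrapolates
# badly for the rest of the applicants walking through the door.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)
n = 50_000

# Invented applicant credit scores; lower scores default more often.
score = rng.normal(650, 60, n)
p_default = 1 / (1 + np.exp((score - 600) / 25))
default = rng.random(n) < p_default

# Historically the bank only approved applicants scoring above 640,
# so defaults are observed only for that subset.
approved = score > 640

model_approved = LogisticRegression().fit(
    score[approved].reshape(-1, 1), default[approved])
model_all = LogisticRegression().fit(score.reshape(-1, 1), default)

# Predicted default risk for a low-scoring through-the-door applicant.
x = np.array([[560.0]])
print("trained on approved only:", model_approved.predict_proba(x)[0, 1])
print("trained on all applicants:", model_all.predict_proba(x)[0, 1])
```

The approved-only model has never seen a score below 640, so its estimate for that applicant is pure extrapolation.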


Troubleshooting Permissions in SQL Server

Jon Russell takes a look at several ways to gauge user permissions:

Understanding and troubleshooting SQL Server permissions can be challenging, especially when direct grants, role inheritance, ownership chains, and explicit denies all interact. The six scenarios that follow show how the engine decides who can do what, then demonstrate the diagnostic steps that reveal why it made that decision. Each section provides a setup script you can run in a dedicated test database, followed by a diagnostic query and a short explanation of the result.

Click through for six methods. I do wish that, instead of sys.fn_my_permissions(), there were some sys.fn_user_permissions(@UserName) option. I realize that you can execute as a specific user and then run the function, but I have dreamed for years about having a way to track effective user permission changes, and sys.fn_my_permissions() requires not only that you have the authority to execute as a specific user, but also that you know all of the relevant users.
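In the meantime, one way to approximate that missing function is to loop over the users you do know about and impersonate each one in turn. Here is a rough Python sketch of the idea (mine, not Jon's; the connection string, database, and user names are placeholders, and it assumes you hold IMPERSONATE rights on every user in the list):

```python
# Capture effective database-level permissions per user by switching
# security context with EXECUTE AS, querying sys.fn_my_permissions(),
# and then reverting before the next iteration.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};SERVER=localhost;"
    "DATABASE=PermissionsTest;Trusted_Connection=yes;TrustServerCertificate=yes"
)
cursor = conn.cursor()

users = ["TestUser1", "TestUser2"]  # trusted, hard-coded list; not user input

for user in users:
    cursor.execute(f"""
        SET NOCOUNT ON;
        EXECUTE AS USER = '{user}';
        SELECT USER_NAME() AS user_name, permission_name
        FROM sys.fn_my_permissions(NULL, 'DATABASE');
        REVERT;
    """)
    for row in cursor.fetchall():
        print(row.user_name, row.permission_name)

conn.close()
```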


Goodbye Default Contributor Role in Fabric Workspace Identities

Varun Jain makes a security announcement:

Fabric workspace identity is an automatically managed service principal that can be associated with a Fabric workspace. Fabric workspaces with a workspace identity can securely read or write to firewall-enabled Azure Data Lake Storage Gen2 accounts through trusted workspace access for OneLake shortcuts. Fabric items can use the identity when connecting to resources that support Microsoft Entra authentication. Fabric uses workspace identities to obtain Microsoft Entra tokens without the customer having to manage any credentials. 

Previously, a workspace identity was automatically assigned the workspace contributor role and had access to workspace items.  

Read on to see what’s changing, why, and what you can do instead.


Building a Pareto Chart in Power BI

Boniface Muchendu creates a Pareto chart:

Creating a Pareto chart in Power BI is a powerful way to visualize the 80/20 rule in action. This type of chart helps you quickly identify the top contributors to your business metrics—whether you’re analyzing sales, categories, or customer segments. In this guide, you’ll learn how to build a dynamic Pareto chart using DAX, customize it, and apply it across different data dimensions.

Read on for the instructions.
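The post builds the chart in DAX, but if the underlying Pareto arithmetic is new to you, here is a small Python sketch of the same idea with made-up sales figures: sort categories by value, take each one's running share of the total, and flag everything up to the 80% line.

```python
# Pareto logic in miniature: which categories make up the first 80%?
import pandas as pd

sales = pd.DataFrame({
    "category": ["A", "B", "C", "D", "E", "F"],
    "amount":   [500, 300, 90, 60, 30, 20],
})

pareto = sales.sort_values("amount", ascending=False).reset_index(drop=True)
pareto["cumulative_pct"] = pareto["amount"].cumsum() / pareto["amount"].sum()
pareto["top_80"] = pareto["cumulative_pct"] <= 0.80

print(pareto)
```

A DAX version typically does the same thing with a cumulative measure that filters to categories whose totals are greater than or equal to the current category's total.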


Selective Caching in SSIS

Andy Brownsword takes us through a pattern:

We’ve recently looked at how caching can improve performance and I wanted to show how we can eke even more performance out of caches by using a custom approach I’ll term Selective Caching.

I’ll note here that there’s a potential gotcha with this approach which we’ll get to before the end of the post!

Click through for a description of the pattern and when it starts to break down.
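Andy's exact SSIS implementation is behind the link, so take this as a generic Python sketch of how I read the pattern (the table, column, and key names are invented): rather than caching an entire reference table up front, cache only the rows the current batch will actually ask for.

```python
# Selective caching, loosely interpreted: build the lookup cache from
# just the keys present in the incoming batch instead of the whole table.
import sqlite3

def build_selective_cache(conn, incoming_keys):
    """Cache reference rows only for the keys this batch will look up."""
    keys = sorted(incoming_keys)
    placeholders = ",".join("?" * len(keys))
    rows = conn.execute(
        f"SELECT key, value FROM reference_table WHERE key IN ({placeholders})",
        keys,
    )
    return dict(rows.fetchall())

# In-memory database standing in for the reference source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reference_table (key INTEGER PRIMARY KEY, value TEXT)")
conn.executemany("INSERT INTO reference_table VALUES (?, ?)",
                 [(i, f"row-{i}") for i in range(100_000)])

batch = [5, 42, 99]                       # keys this batch needs
cache = build_selective_cache(conn, set(batch))
print([cache[k] for k in batch])          # every lookup hits the small cache
```

One classic gotcha with this style of cache, which may or may not be the one Andy has in mind: if the incoming key set is incomplete or stale, lookups can miss rows that really do exist in the reference table.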


Spatial Cross-Validation in R

Jakub Nowosad wraps up a series:

This document provides an overview of two R packages, sperrorest and blockCV, that can be used for spatial cross-validation, but are outside of standard machine learning frameworks like caret, tidymodels, or mlr3.

All of the examples below use the same dataset, which includes the temperature measurements in Spain, a set of covariates, and the spatial coordinates of the temperature measurements.

Click through for a pair of cross-validation packages, as well as a link to the rest of the series. H/T R-Bloggers.
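The packages in the post are R, but the core idea ports anywhere: hold out whole spatial blocks rather than random rows, so spatially autocorrelated neighbors cannot leak between training and test folds. Here is a minimal Python sketch with synthetic data (my own, using scikit-learn's GroupKFold with grid cells as the groups):

```python
# Spatial cross-validation via blocking: assign each point to a grid
# cell, then ensure no cell appears in both train and test folds.
import numpy as np
from sklearn.model_selection import GroupKFold

rng = np.random.default_rng(0)
n = 500
x, y = rng.uniform(0, 100, n), rng.uniform(0, 100, n)   # point coordinates
temperature = 20 - 0.05 * y + rng.normal(0, 1, n)        # fake response

# Assign each point to a 25x25 grid cell; each cell is one spatial block.
block = (x // 25).astype(int) * 4 + (y // 25).astype(int)

features = np.column_stack([x, y])
for fold, (train_idx, test_idx) in enumerate(
        GroupKFold(n_splits=5).split(features, temperature, groups=block)):
    # No block appears on both sides of the split, unlike random k-fold.
    assert not set(block[train_idx]) & set(block[test_idx])
    print(f"fold {fold}: {len(train_idx)} train, {len(test_idx)} test")
```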


Bad Request Error Running PowerShell in Azure DevOps

Koen Verbeeck wants good requests:

I needed to run a PowerShell cmdlet in an Azure DevOps pipeline. The cmdlet in question was New-AzRoleAssignment, but the cmdlet itself isn’t important. What is important is that I needed to pass the object ID of a service principal to the command. Even though I was pretty sure the syntax and everything was correct, I got an “Operation returned an invalid status code ‘BadRequest’” error when the PowerShell was run (inside an Azure PowerShell task).

Read on to see how Koen diagnosed and resolved the issue.


Random Page Cost and PostgreSQL Query Plans

Tomas Vondra takes us through a setting:

Last week I posted about how we often don’t pick the optimal plan. I got asked about difficulties when trying to reproduce my results, so I’ll address that first (I forgot to mention a couple of details). I also got questions about how best to spot this issue and ways to mitigate it. I’ll discuss that too; I don’t have any great solutions, but I’ll briefly discuss a couple of possible planner/executor improvements that might allow handling this better.

Tomas’s points around the random_page_cost setting sound a lot like an inverse of SQL Server’s cost threshold for parallelism setting: a default that makes sense in a world of spinning disks at 7200 RPM, but not really in a solid-state world.
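If you want to watch the PostgreSQL planner change its mind, you can flip the setting at the session level and compare plans. A rough Python sketch (the table, index, query, and connection details are all placeholders):

```python
# Compare query plans under the spinning-disk default (4.0) and a value
# commonly suggested for SSDs (around 1.1); assumes a table "t" with an
# index on column "a".
import psycopg2

conn = psycopg2.connect("dbname=test")  # assumed connection string
cur = conn.cursor()

for rpc in ("4.0", "1.1"):
    cur.execute(f"SET random_page_cost = {rpc};")
    cur.execute("EXPLAIN SELECT * FROM t WHERE a BETWEEN 100 AND 5000;")
    print(f"--- random_page_cost = {rpc} ---")
    for (line,) in cur.fetchall():
        print(line)

conn.close()
```

With a lower random_page_cost, index scans look cheaper relative to sequential scans, so plans often flip at exactly the kinds of boundaries Tomas describes.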


The Importance of Power BI Performance Load Testing

Gilbert Quevauvilliers runs some tests:

It is becoming increasingly important to understand how the Power BI reports/Semantic Models that are being used in your organization are performing.

When using Fabric Capacities, this can potentially be of critical importance, because a single report that is not well designed could cripple or bring down your capacity.

Completing Power BI performance load testing before a report goes into a production environment allows for scalable, dependable, repeatable testing to take place in lower environments.

Read on to see what this entails and the tool Gilbert will use throughout this series.
