Press "Enter" to skip to content

Curated SQL Posts

Building a Pareto Chart in Power BI

Boniface Muchendu creates a Pareto chart:

Creating a Pareto chart in Power BI is a powerful way to visualize the 80/20 rule in action. This type of chart helps you quickly identify the top contributors to your business metrics—whether you’re analyzing sales, categories, or customer segments. In this guide, you’ll learn how to build a dynamic Pareto chart using DAX, customize it, and apply it across different data dimensions.

Read on for the instructions.

Leave a Comment

Selective Caching in SSIS

Andy Brownsword takes us through a pattern:

We’ve recently looked at how caching can improve performance and I wanted to show how we can eek even more performance out of caches by using a custom approach I’ll term Selective Caching.

I’ll note here that there’s a potential gotcha with this approach which we’ll get to before the end of the post!

Click through for a description of the pattern and when it starts to break down.

Leave a Comment

Spatial Cross-Validation in R

Jakub Nowosad wraps up a series:

This document provides an overview of two R packages, sperrorest and blockCV, that can be used for spatial cross validation, but are outside of standard machine learning frameworks like carettidymodels, or mlr3.

All of the examples below use the same dataset, which includes the temperature measurements in Spain, a set of covariates, and the spatial coordinates of the temperature measurements.

Click through for a pair of cross-validation packages, as well as a link to the rest of the series. H/T R-Bloggers.

Leave a Comment

Bad Request Error Running Powershell in Azure DevOps

Koen Verbeeck wants good requests:

I needed to run a PowerShell cmdlet in an Azure Devops pipeline. The cmdlet in question was New-AzRoleAssignment, but the cmdlet itself isn’t important. What is important is that I needed to pass the object ID of a service principal to the command. Even though I was pretty sure the syntax and everything was correct, I got a “Operation returned an invalid status code ‘BadRequest’” error when the PowerShell was run (inside an Azure PowerShell task):

Read on to see how Koen diagnosed and resolved the issue.

Leave a Comment

Random Page Cost and PostgreSQL Query Plans

Tomas Vondra takes us through a setting:

Last week I posted about how we often don’t pick the optimal plan. I got asked about difficulties when trying to reproduce my results, so I’ll address that first (I forgot to mention a couple details). I also got questions about how to best spot this issue, and ways to mitigate this. I’ll discuss that too, although I don’t have any great solutions, but I’ll briefly discuss a couple possible planner/executor improvements that might allow handling this better.

Tomas’s points around the random_page_cost setting sound a lot like the cost threshold for parallelism setting in SQL Server in inverse: a setting whose default makes sense in a world of spinning disks at 7200 RPM, but not really in a solid state world.

Leave a Comment

The Importance of Power BI Performance Load Testing

Gilbert Quevauvilliers runs some tests:

It is becoming increasingly important to understand how the Power BI reports/Semantic Model that are being used in your organization are performing.

When using Fabric Capacities this can potentially be of critical importance, because a single report that is not well designed could cripple or bring down your capacity.

By completing Power BI Performance load testing before it goes into a production environment allows for scalable, dependable, repeatable testing to take place in lower environments.

Read on to see what this entails and the tool Gilbert will use throughout this series.

Leave a Comment

Writing Back to a Fabric Data Warehouse via UDF

Jon Vöge continues a series on write-back options into Microsoft Fabric:

In that article, we took advantage of some of the built-in sample code from the User Data Function editor, as well as some great code examples from Sujata: Example User data functions for Translytical task flows · GitHub

The problem? All of these samples use SQL Databases in Fabric as the backend item.

Jon switches this from a SQL database into a Fabric Data Warehouse, and notes some of the challenges along the way.

Leave a Comment

k Nearest Neighbors Search in Elasticsearch

Govind Singh Rawat looks for nearby documents:

Businesses are increasingly relying on intelligent search capabilities to enhance customer experience, automate insights, and unlock the potential of unstructured information. Elasticsearch, a leading distributed search and analytics engine, is at the heart of many such systems. One of its powerful and lesser-known capabilities is support for k-nearest neighbors (k-NN) search, a method particularly useful for similarity-based retrieval in domains such as semantic search, recommendation engines, and image recognition.

This article delves into what Elasticsearch and k-NN search are, how the two are integrated, and how to configure and optimize k-NN in Elasticsearch for real-world applications.

Click through for a high-level primer on the topic, as well as a few links to additional resources.

Leave a Comment

Event Streaming in Microsoft Fabric

Rayis Imayev streams some data:

In my post last week (https://datanrg.blogspot.com/2025/06/salesforce-cdc-data-integration.html), I talked about Salesforce Change Data Capture (CDC) event data streaming, where the initial event destination was file storage in Azure. But what if we anticipate a higher volume of incoming Salesforce source data or the addition of a new data feed? This could create the need for an alternative method of managing incoming events.

Read on to learn more.

Leave a Comment