Press "Enter" to skip to content

August 13, 2025

Portfolio Theory and Risk Reduction

John Mount continues a series on risk optimization:

I want to discuss how fragile optimization solutions to real-world problems can be, and how to solve that.

Small changes in modeling strategy, assumptions, data, estimates, constraints, or objective can lead to unstable and degenerate solutions. To warm up let’s discuss one of the most famous optimization examples: Stigler’s minimal subsistence diet problem.

There are some neat stories in the post as Mount walks through linear programming problems.
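
To make the warm-up concrete, here is a toy diet-style linear program in Python with scipy. The foods, costs, and nutrient numbers are made up for illustration (not Stigler's actual data), but the structure is the same: minimize cost subject to minimum nutrient requirements.

```python
# A toy diet-style linear program (illustrative numbers, not Stigler's data):
# minimize total food cost subject to minimum nutrient requirements.
from scipy.optimize import linprog

cost = [0.18, 0.05, 0.23]        # $ per unit of three hypothetical foods

# Nutrients per unit of each food; rows are calories and grams of protein.
nutrients = [[350, 110, 25],
             [10, 2, 1]]
minimums = [2000, 55]            # daily minimums: calories, protein

# linprog expresses constraints as A_ub @ x <= b_ub, so negate both sides
# to encode "at least the minimum" requirements.
result = linprog(
    c=cost,
    A_ub=[[-n for n in row] for row in nutrients],
    b_ub=[-m for m in minimums],
    bounds=[(0, None)] * 3,
    method="highs",
)
print(result.x, result.fun)      # quantities of each food and total cost

# Mount's fragility point: nudge a cost or a requirement slightly and the
# optimal basis can jump to a very different diet.
```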

Also, Nina Zumel has a post on overestimation bias:

Revenue optimization projects can be particularly valuable and exciting. They involve:

  • Estimating demand as a function of offered features, price, and match to market.
  • Picking a set of offerings and prices optimizing the above inferred demand.

The great opportunity of these projects is that one can derive value from improving the inference of the demand estimate function, improving the optimization, and even improving the synergy between these two steps.

However, there is a common situation that can lose client trust and sink revenue optimization projects.

Read on for that article.
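
To see why overestimation shows up so reliably, here is a small simulation of my own (not from Nina's post), assuming a linear demand curve: fit the curve on noisy data, pick the price that maximizes revenue under the fitted curve, and compare the revenue the plan promises against what the true curve delivers at that price.

```python
# Sketch of the overestimation effect in price optimization: the revenue
# predicted by the fitted model at its own optimal price exceeds, on
# average, the revenue the true demand curve yields at that price.
import numpy as np

rng = np.random.default_rng(0)
a_true, b_true = 100.0, 8.0          # true demand: d(p) = a - b*p

gaps = []
for _ in range(2000):
    prices = rng.uniform(2, 10, size=40)
    demand = a_true - b_true * prices + rng.normal(0, 8, size=40)

    slope, intercept = np.polyfit(prices, demand, 1)   # fitted demand curve
    p_star = -intercept / (2 * slope)                  # argmax of p * d_hat(p)

    predicted = p_star * (intercept + slope * p_star)  # what the plan promises
    actual = p_star * (a_true - b_true * p_star)       # what you actually get
    gaps.append(predicted - actual)

print(f"mean overestimate: {np.mean(gaps):.2f}")       # positive on average
```

Each individual fit is unbiased, but choosing the price that looks best under the fitted curve systematically selects in favor of estimation error, so the promised revenue overshoots.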


Feature Importance in XGBoost

Ivan Palomares Carrascosa takes a look at one of my favorite plots in XGBoost:

One of the most widespread machine learning techniques is XGBoost (Extreme Gradient Boosting). An XGBoost model — or an ensemble that combines multiple models into a single predictive task, to be more precise — builds several decision trees and sequentially combines them, so that the overall prediction is progressively improved by correcting the errors made by previous trees in the pipeline.

Just like standalone decision trees, XGBoost can accommodate both regression and classification tasks. While the combination of many trees into a single composite model may obscure its interpretability at first, there are still mechanisms to help you interpret an XGBoost model. In other words, you can understand why predictions are made and how input features contributed to them.

This article takes a practical dive into XGBoost model interpretability, with a particular focus on feature importance.

Read on to learn more about how feature importance works, as well as the three different views of the data you can get.
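
Those three views are presumably XGBoost's weight, gain, and cover importance types. A quick sketch of pulling all three from one model (my example, using scikit-learn's diabetes dataset, not code from the article):

```python
# Train a small model, then read feature importance three ways.
import xgboost as xgb
from sklearn.datasets import load_diabetes

X, y = load_diabetes(return_X_y=True, as_frame=True)
model = xgb.XGBRegressor(n_estimators=100, max_depth=3)
model.fit(X, y)

booster = model.get_booster()
for imp_type in ("weight", "gain", "cover"):
    # weight: how many splits use the feature
    # gain:   average loss reduction from splits on the feature
    # cover:  average number of rows passing through those splits
    print(imp_type, booster.get_score(importance_type=imp_type))

# The plot itself (requires matplotlib):
xgb.plot_importance(model, importance_type="gain")
```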


Equalizing Proxy vs Redirect Rates for OneLake Access

Elizabeth Oldag announces a pricing change:

We’re thrilled to share a major update and simplification to OneLake’s capacity utilization model that will make it even easier to manage Fabric capacity and scale your data workloads. We are reducing the consumption rate of OneLake transactions via proxy to match the rate for transactions via redirect. This means you no longer have to worry about where you are accessing your OneLake data from (via proxy or redirect): both paths consume your capacity at the same low rate.

Read on to see what this means in practice.


Registering Applications to Read Fabric Resources

Andy Leonard works in Microsoft Entra:

My older son, Stephen, and I have been vibe coding information-dense solutions for Fabric lately. The latest application is Fabric Navigator, which simplifies navigation between Fabric Data Factory pipelines and notebooks. While Fabric Navigator includes links to instructions about configuring Azure and Fabric security to allow read access to Fabric Data Factory pipelines and notebooks, I feel a walk-through of a minimally-viable security configuration is in order. Hence, this post.

Click through to see which settings you need to configure in Entra, as well as the settings you need to change in Microsoft Fabric, for this to work.
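
As a rough sketch of where this ends up (my example, not Andy's): once the app registration exists and the Fabric tenant settings allow service principal access, a client-credentials token is enough to read Fabric resources. The tenant ID, client ID, and secret below are placeholders you would take from your own registration.

```python
# Client-credentials flow against Entra, then a read-only call to the
# Fabric REST API. Placeholders come from your app registration.
import msal
import requests

TENANT_ID = "<tenant-id>"
CLIENT_ID = "<app-client-id>"
CLIENT_SECRET = "<app-secret>"

app = msal.ConfidentialClientApplication(
    CLIENT_ID,
    authority=f"https://login.microsoftonline.com/{TENANT_ID}",
    client_credential=CLIENT_SECRET,
)
token = app.acquire_token_for_client(
    scopes=["https://api.fabric.microsoft.com/.default"]
)

# List the workspaces the service principal can see.
resp = requests.get(
    "https://api.fabric.microsoft.com/v1/workspaces",
    headers={"Authorization": f"Bearer {token['access_token']}"},
)
print(resp.status_code, resp.json())
```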


Installing SQL Server Instances via dbatools

David Seis digs into another dbatools cmdlet:

As DBAs, we regularly install SQL Server for various reasons. If you could save time on each installation and spend it on more critical tasks, would you?

In this blog post, we will audit the dbatools command Install-DbaInstance. I will test, review, and evaluate the script based on a series of identical steps. Our goal is to provide insights, warnings, and recommendations to help you use this script effectively and safely. Install-DbaInstance is a powerful tool to automate the install and configuration of a new SQL Server instance. It works well in scenarios that require frequent deployments of SQL Server instances.

You can definitely automate installation of SQL Server without the cmdlet, but the dbatools team does a good job of laying out what’s possible that you might not necessarily get just from the config script that the SQL Server installer spits out (and uses when you next-next-next your way to success).


Recommendations around SUMMARIZECOLUMNS

Marco Russo and Alberto Ferrari share some thoughts:

SUMMARIZECOLUMNS is the most widely used function in Power BI queries. An important and unique feature of SUMMARIZECOLUMNS is that it determines automatically how to scan the model to produce its result. Indeed, when using SUMMARIZE, GROUPBY, ADDCOLUMNS, or any of the more basic querying functions, developers must declare the source table to perform the grouping, as well as the group-by columns and the measures to add to the result. On the other hand, SUMMARIZECOLUMNS requires only the group-by columns; there is no need to provide the source table, which is the primary ingredient of any query. SUMMARIZECOLUMNS figures out the structure of the result by itself, using a sophisticated algorithm that requires some understanding.

The pair do have a whitepaper available on their premium (paid) service but even the free post contains a lot of detail you’ll want to check out if you use DAX.


Using JSON Arrays instead of JSON Objects for Serialization

Lukas Eder makes a recommendation:

Why, yes of course! jOOQ is in full control of your SQL statement and knows exactly what column (and data type) is at which position, because you helped jOOQ construct not only the query object model, but also the result structure. So, a much faster index access is possible, compared to the much slower column name access.

The same is true for ordinary result sets, by the way, where jOOQ always calls JDBC’s ResultSet.getString(int), for example, over ResultSet.getString(String). Not only is it faster, but also more reliable. Think about duplicate column names, e.g. when joining two tables that both contain an ID column. While JSON is not opinionated about duplicate object keys, not all JSON parsers support this, let alone Java Map types.

Read on for some insight into when you might want to choose either of the two approaches, and why Lukas went with JSON arrays instead of JSON objects for object serialization in jOOQ.
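
The duplicate-column problem is easy to demonstrate outside of Java, too. A quick Python illustration (not jOOQ code) of why the array representation holds up where objects quietly lose data:

```python
# After a join, two columns may share the name "id". An object (map)
# representation silently drops one of them, while an array
# representation keeps both and allows fast positional access.
import json

duplicate_keys = '{"id": 1, "name": "alice", "id": 7}'
print(json.loads(duplicate_keys))   # {'id': 7, 'name': 'alice'} -- data lost

# Array form: column names once, then compact positional rows.
payload = {
    "columns": ["id", "name", "id"],          # both ID columns survive
    "rows": [[1, "alice", 7], [2, "bob", 9]],
}
encoded = json.dumps(payload)
first_row = json.loads(encoded)["rows"][0]
print(first_row[0], first_row[2])             # index access, no name lookup
```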
