Press "Enter" to skip to content

October 15, 2025

Generalized Additive Models for Customer Lifetime Value Estimation

Nicholas Clark builds a GAM:

I typically work in quantitative ecology and molecular epidemiology, where we use statistical models to predict species distributions or disease transmission patterns. Recently though, I had an interesting conversation with a data science PhD student who mentioned they were applying GAMs to predict Customer Lifetime Value at a SaaS startup. This caught my attention because CLV prediction, as it turns out, faces remarkably similar statistical challenges to ecological forecasting: nonlinear relationships that saturate at biological or business limits, hierarchical structures where groups behave differently, and the need to balance model flexibility with interpretability for stakeholders who need to understand why the model makes certain predictions.

This is an interesting article and I had not thought of using a GAM for calculating Customer Lifetime Value. I used a much simpler technique the one time I calculated CLV in earnest. H/T R-Bloggers.
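
For anyone who has not run into GAMs before, the general shape of the model (generic notation, not anything specific from the linked post) is:

g(\mathbb{E}[Y_i]) = \beta_0 + f_1(x_{i1}) + f_2(x_{i2}) + \dots + f_p(x_{ip})

where g is a link function and each f_j is a smooth function estimated from the data. Those smooths are what let the model capture relationships that flatten out at business limits while each term stays interpretable on its own.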

Generating Gaussian Numbers in SQL Server

Sebastiao Pereira generates random numbers:

Generating random numbers from a normal distribution is essential for accuracy and realistic modeling, simulation, inference, and algorithm design. This is useful for scientific, engineering, statistical, and AI domains. Let’s see how we can create Box Muller transform functionality in SQL Server without the use of external tools.

I’ve used this technique before in SQL Server and it works really well. Performance is fine and the results approximate a normal distribution like you’d want.
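
If you want the flavor of the technique without clicking through, here is a minimal sketch of my own (not Sebastiao's implementation): the Box-Muller transform turns two independent Uniform(0,1) draws into two independent standard normal draws, and everything it needs already exists in T-SQL.

-- Box-Muller sketch: two uniform draws in, two independent N(0,1) draws out.
-- 1.0 - RAND(...) keeps u1 strictly greater than zero so LOG(u1) is defined.
SELECT
    SQRT(-2.0 * LOG(u.u1)) * COS(2.0 * PI() * u.u2) AS z0,
    SQRT(-2.0 * LOG(u.u1)) * SIN(2.0 * PI() * u.u2) AS z1
FROM (
    SELECT 1.0 - RAND(CHECKSUM(NEWID())) AS u1,
           RAND(CHECKSUM(NEWID())) AS u2
) AS u;

To generate a whole column of values, run the same expression once per row of a numbers table via CROSS APPLY; NEWID() reseeds RAND() on every row, so each row gets its own pair of draws.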

Tuning Window Functions in SQL Server

I have a new video:

In this video, I show you various techniques you can use to make window functions faster, including proper indexing and usage of batch mode. I also demonstrate the performance difference between RANGE and ROWS.

This wraps up my series on window functions, and although I pack a lot of content into the video, I highly recommend checking out the links for deeper dives into performance.
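
If you just want the RANGE-versus-ROWS point in code form, here is a minimal sketch (my own example against a hypothetical Sales.Orders table, not one from the video). With an ORDER BY and no explicit frame, a window aggregate defaults to RANGE, which can force an on-disk worktable; spelling out ROWS returns the same running total when the ordering is unique per partition and is usually noticeably faster.

-- Implicit frame: defaults to RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.
SELECT o.CustomerID,
       o.OrderDate,
       SUM(o.OrderTotal) OVER (PARTITION BY o.CustomerID
                               ORDER BY o.OrderDate) AS running_total_range
FROM Sales.Orders AS o;  -- hypothetical table

-- Explicit ROWS frame: same result when OrderDate is unique per customer,
-- and it avoids the on-disk window spool the RANGE frame can require.
SELECT o.CustomerID,
       o.OrderDate,
       SUM(o.OrderTotal) OVER (PARTITION BY o.CustomerID
                               ORDER BY o.OrderDate
                               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total_rows
FROM Sales.Orders AS o;

An index keyed on (CustomerID, OrderDate) that covers OrderTotal supports both queries without an explicit sort.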

Business Continuity in PostgreSQL

Warda Bibi lays out the basics of high availability and disaster recovery in PostgreSQL:

System outages, hardware failures, or accidental data loss can strike without warning. What determines whether operations resume smoothly or grind to a halt is the strength of the disaster recovery setup. PostgreSQL is built with powerful features that make reliable recovery possible.

This post takes a closer look at how these components work together behind the scenes to protect data integrity, enable consistent restores, and ensure your database can recover from any failure scenario.

This is mostly an architecture-level view, but then again, a lot of HA/DR is about making good architectural decisions.
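
If you want to check your own instance while reading, a few standard queries (my suggestions, not from the post) show quickly whether WAL archiving and standby recovery are actually in place:

-- Is WAL archiving enabled, and where do archived segments go?
SHOW archive_mode;
SHOW archive_command;

-- How many WAL segments have been archived successfully vs. failed?
SELECT archived_count, failed_count, last_archived_wal, last_failed_wal
FROM pg_stat_archiver;

-- Is this the primary (false) or a standby in recovery (true)?
SELECT pg_is_in_recovery();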

Corporate Confusion and SQL Server Backups

Mike Walsh spins a thread:

With the recent data center fire and missing government data in South Korea, it felt like a good time to continue with the SQL Server Regrets series of blog posts. We’ll talk about the confusion of thinking you are covered for recoverability – when you just aren’t. This “Regret” comes up far too often when a new client comes in with ongoing backup issues.

Mike gives some real-world and some hypothetical examples of why different teams in a company might be confused about what they're getting from their SQL Server backups, and he promotes Jeff Iannucci's sp_CheckBackup stored procedure as a starting point.
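
Even before reaching for sp_CheckBackup, a quick look at msdb's backup history (my own go-to sanity check, not something from Mike's post) tells you whether the backups everyone assumes are happening actually are:

-- Most recent backup finish time per database and backup type.
-- type: D = full, I = differential, L = transaction log.
SELECT d.name AS database_name,
       bs.type AS backup_type,
       MAX(bs.backup_finish_date) AS last_backup_finish
FROM sys.databases AS d
LEFT JOIN msdb.dbo.backupset AS bs
    ON bs.database_name = d.name
GROUP BY d.name, bs.type
ORDER BY d.name, bs.type;

A NULL in that last column for a database you care about is exactly the kind of surprise this post is warning against.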

OneLake Diagnostics Now GA

Tom Peplow makes an announcement:

Alongside Workspace monitoring and user activity tracking accessible through Microsoft Purview, these capabilities make federated data governance a reality at enterprise scale.

Enable diagnostics at the workspace level, and OneLake streams diagnostic events as JSON into a Lakehouse you choose—within the same capacity. You can use these events to unlock usage insights, provide operational visibility, and support compliance reporting.

It does seem a bit odd that this data goes into a Lakehouse rather than into an Eventhouse. But click through to see how things work, what sorts of events this captures, and what you can do with it.

Automating Semantic Model Security via Semantic Link

Marc Lelijveld writes a script:

You may be using standardized solutions like Fabric Unified Admin Monitoring (FUAM) or any other templated solution that comes with a semantic model. As part of transparency within your organization, you decided to share the insights gathered with others in the organization by adjusting the solution to apply your own security setup on top.

However, after running an update of the template, you’ve overwritten your custom security configuration and reapplying costs a lot of time, again and again after each update. Why don’t we just script this security? In this blog I will share how you can deploy security configurations to semantic models and assign users to these roles.

Click through for an example script and details on how it works.
