
Author: Kevin Feasel

Generalized Additive Models for Customer Lifetime Value Estimation

Nicholas Clark builds a GAM:

I typically work in quantitative ecology and molecular epidemiology, where we use statistical models to predict species distributions or disease transmission patterns. Recently though, I had an interesting conversation with a data science PhD student who mentioned they were applying GAMs to predict Customer Lifetime Value at a SaaS startup. This caught my attention because CLV prediction, as it turns out, faces remarkably similar statistical challenges to ecological forecasting: nonlinear relationships that saturate at biological or business limits, hierarchical structures where groups behave differently, and the need to balance model flexibility with interpretability for stakeholders who need to understand why the model makes certain predictions.

This is an interesting article and I had not thought of using a GAM for calculating Customer Lifetime Value. I used a much simpler technique the one time I calculated CLV in earnest. H/T R-Bloggers.


Tuning Window Functions in SQL Server

I have a new video:

In this video, I show you various techniques you can use to make window functions faster, including proper indexing and usage of batch mode. I also demonstrate the performance difference between RANGE and ROWS.

This wraps up my series on window functions, and although I pack a lot of content into the video, I highly recommend checking out the links for deeper dives into performance.
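As a rough illustration of the ROWS-versus-RANGE point, here is a sketch against a hypothetical `dbo.Orders` table (the table and column names are mine, for illustration only):

```sql
-- With ORDER BY alone, the default frame is RANGE UNBOUNDED PRECEDING, which
-- in row mode uses an on-disk worktable in tempdb. An explicit ROWS frame
-- uses an in-memory spool instead and is typically much faster.
SELECT
    CustomerID,
    OrderDate,
    Amount,
    SUM(Amount) OVER (PARTITION BY CustomerID
                      ORDER BY OrderDate
                      ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS RunningTotal
FROM dbo.Orders;
-- Swapping ROWS back to RANGE also changes tie handling: RANGE aggregates all
-- peer rows (equal OrderDate values) together, while ROWS counts rows one at a time.
```

A supporting index keyed on the `PARTITION BY` and `ORDER BY` columns (and covering the aggregated column) lets the engine avoid a sort, and batch mode processing largely erases the worktable penalty.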


Generating Gaussian Numbers in SQL Server

Sebastiao Pereira generates random numbers:

Generating random numbers from a normal distribution is essential for accuracy and realistic modeling, simulation, inference, and algorithm design. This is useful for scientific, engineering, statistical, and AI domains. Let’s see how we can create Box Muller transform functionality in SQL Server without the use of external tools.

I’ve used this technique before in SQL Server and it works really well. Performance is fine and the results approximate a normal distribution like you’d want.
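The core of the transform is small enough to sketch inline; this is a minimal illustration of the idea rather than Sebastiao's exact implementation:

```sql
-- Box-Muller sketch: each row turns two independent uniform draws into one
-- standard-normal draw. RAND(CHECKSUM(NEWID())) yields a fresh uniform per row;
-- using 1.0 - u keeps the input to LOG strictly above zero.
SELECT SQRT(-2.0 * LOG(u.u1)) * COS(2.0 * PI() * u.u2) AS z
FROM
(
    SELECT TOP (1000)
           1.0 - RAND(CHECKSUM(NEWID())) AS u1,
           RAND(CHECKSUM(NEWID())) AS u2
    FROM sys.all_objects   -- any sufficiently large row source works here
) AS u;
```

The SIN of the same angle gives a second independent normal draw for free, which is why full implementations usually emit values in pairs.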


Business Continuity in PostgreSQL

Warda Bibi lays out the basics of high availability and disaster recovery in PostgreSQL:

System outages, hardware failures, or accidental data loss can strike without warning. What determines whether operations resume smoothly or grind to a halt is the strength of the disaster recovery setup. PostgreSQL is built with powerful features that make reliable recovery possible.

This post takes a closer look at how these components work together behind the scenes to protect data integrity, enable consistent restores, and ensure your database can recover from any failure scenario.

This is mostly an architecture-level view, but then again, a lot of HA/DR is about making good architectural decisions.


Corporate Confusion and SQL Server Backups

Mike Walsh spins a thread:

With the recent data center fire and missing government data in South Korea, it felt like a good time to continue with the SQL Server Regrets series of blog posts. We’ll talk about the confusion of thinking you are covered for recoverability – when you just aren’t. This “Regret” comes up far too often when a new client comes in with ongoing backup issues.

Mike gives some real-world and some hypothetical examples of why different teams in a company might be confused about what they're getting from their SQL Server backups, and he promotes Jeff Iannucci's sp_CheckBackup stored procedure as a starting point.


Automating Semantic Model Security via Semantic Link

Marc Lelijveld writes a script:

You may be using standardized solutions like Fabric Unified Admin Monitoring (FUAM) or any other templated solution that comes with a semantic model. As part of transparency within your organization, you decided to share the insights gathered with others in the organization by adjusting the solution to apply your own security setup on top.

However, after running an update of the template, you’ve overwritten your custom security configuration and reapplying costs a lot of time, again and again after each update. Why don’t we just script this security? In this blog I will share how you can deploy security configurations to semantic models and assign users to these roles.

Click through for an example script and details on how it works.


OneLake Diagnostics Now GA

Tom Peplow makes an announcement:

Alongside Workspace monitoring and user activity tracking accessible through Microsoft Purview, these capabilities make federated data governance a reality at enterprise scale.

Enable diagnostics at the workspace level, and OneLake streams diagnostic events as JSON into a Lakehouse you choose—within the same capacity. You can use these events to unlock usage insights, provide operational visibility, and support compliance reporting.

It does seem a bit odd that this data goes into a Lakehouse rather than into an Eventhouse. But click through to see how things work, what sorts of events this captures, and what you can do with it.


Inferential Statistics in Excel using R

Adam Gladstone does a bit of inference testing:

In the previous posts in this series (Using R in Excel) I have demonstrated some basic use-cases where using R in Excel is useful. Specifically, we have looked at descriptive statistics, linear regression, forecasting, and calling Python. In this post, I am going to look at inferential statistics and how R can be used (in Excel) to perform some typical statistical tests. Excel provides many excellent facilities for data wrangling and analysis. However, for certain types of statistical data analysis the built-in functions and the Analysis ToolPak are not sufficient, and R provides superior facilities.

Read on for a few examples of tests, though there are a huge number available in R itself as well as its ecosystem of packages.


Cross-Cloud Data Replication with Confluent

Ahmed Saef Zamzam and Hannah Miao move some data:

Cross-cloud replication over private networks is powered by Cluster Linking, Confluent’s fully managed, offset-preserving replication service that mirrors topics across clusters. Cluster Linking already makes it simple to connect environments across regions, clouds, and hybrid deployments with near-zero data loss. Now, with private cross-cloud replication, the possibilities expand even further—enabling secure multicloud data sharing, disaster recovery, and compliance use cases that many organizations, particularly those in regulated industries, have struggled to solve for years.

Click through to see how it works and how it can beat mechanisms that existed prior to it.


Table Switching in SQL Server

Rebecca Lewis swaps in a bag of sand while snatching the golden idol:

Use ALTER TABLE … SWITCH to move very large tables instantly.  Yes.  I said instantly.  ALTER TABLE .. SWITCH doesn’t copy the data or physically move it.  It just reassigns the page ownership.  This means that only the metadata with the data pointer changes, and that’s why it completes in milliseconds and barely touches the transaction log.

There are some strict conditions around when you can use this technique, but it’s as powerful as Rebecca mentions.
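As a minimal sketch (with hypothetical table names), the swap is a single statement, provided the source and target have identical schemas and indexes, sit on the same filegroup, and the target is empty:

```sql
-- Metadata-only swap: move all rows from a loaded staging table into an
-- empty archive table with a matching definition.
ALTER TABLE dbo.Sales_Staging SWITCH TO dbo.Sales_Archive;

-- Partitioned variant: switch partition 3 of a partitioned table out to an
-- empty, schema-matching staging table for cheap purge or archive.
ALTER TABLE dbo.Sales SWITCH PARTITION 3 TO dbo.Sales_Staging;
```

Because only page ownership metadata changes, both statements complete in milliseconds regardless of row count, though they do require a brief schema modification lock on both tables.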
