Author: Kevin Feasel

Bluebox: An Evolving Sample Database for PostgreSQL

Ryan Booz has a sample database:

Sure, there are datasets everywhere. Kaggle currently lists over 600,000 public datasets, but most of them are static CSV files that you load once and never touch again. Great for a one-time analysis, not so great for learning how a real database behaves over time. The Postgres Wiki lists a few dozen sample databases, too. And shoot, your shiny new AI coding buddy can help you create one if you want to put the time in.

The problem with most of these datasets is that they’re primarily static. If you’re lucky, some of the datasets might produce new data dumps once a month to keep things “current”. But the problem is that you can’t really practice query tuning if your data never changes. You can’t explore vacuum behavior when there are no updates. You can’t test monitoring tools when nothing is happening.

Click through for more information on Bluebox, as well as a Docker container with several helpful tools and processes that make this data evolve over time.

Two Options for Content Layout in Power BI

Valerie Junk covers a pair of options:

In this tutorial, I want to show a small but very practical formatting setting in Power BI.

When we create a table or matrix visual, we sometimes end up with white space on the right side. For example, if you show data by month and you only have 6 months of data so far, but you design the visual to fit 12 months, the table/matrix is already sized for the full year, which leads to a lot of empty space.

In Power BI we have two column header formatting options:

Click through for the two options, where to find them, and some important information about both.

Transaction ID Locking

Hugo Kornelis disentangles two new features in SQL Server 2025:

One of these two features is Transaction ID (TID) Locking. Slated to end the memory waste of thousands of individual row locks, and the concurrency killer of lock escalation. What it is, how does it work, what are the limitations, and do we really get a free lunch?

Click through for the video, though I am firmly wedded to the idea that TANSTAAFL (there ain't no such thing as a free lunch). I say this without spoiling any part of the video.

Choosing between PCA and t-SNE

Shittu Olumide visualizes some data:

For data scientists, working with high-dimensional data is part of daily life. From customer features in analytics to pixel values in images and word vectors in NLP, datasets often contain hundreds and thousands of variables. Visualizing such complex data is difficult.

That’s where dimensionality reduction techniques come in. Two of the most widely used methods are Principal Component Analysis (PCA) and t-Distributed Stochastic Neighbor Embedding (t-SNE). While both reduce dimensions, they serve very different goals.

The thing that ultimately soured me on t-SNE is its stochastic nature. You can run the same set of operations on the same data multiple times and get significantly different results. It’s really easy to use and the output graphs are really pretty, but if you can’t trust the outputs to be at least somewhat stable, there’s a hard limit to its value.

The Downsides of Python

Andy Brown writes a companion piece:

Four years ago I wrote a blog on this site explaining why Python is better than C# and, arguably, most other programming languages. To redress the balance, here are 10 reasons why you might want to avoid getting caught up in Python’s oh-so-tempting coils – particularly when building large, long-lived systems.

If this sounds like an attempt to have my cake and eat it, my defense is that I follow in my work what I preach here: I use Python for ad-hoc jobs, at which it is unsurpassed. For larger systems – such as our MV website – I use C#, due to its strengths in maintainability, tooling as well as the practical consideration that my personal preference for Visual Basic is not shared by the wider team.

Some of it is opinion, some of it is annoying. I’ve grown to appreciate the spacing, though it can be really painful when copying code from somewhere and the spacing gets all messed up. My short version of Python is that it requires you to have more discipline as a developer to prevent messes from occurring, and I think that’s a negative on net. But that same aspect simultaneously makes it so much easier to prototype and rapidly solve problems, so there’s a natural trade-off here.

Adaptive Time Series Visualization in Microsoft Fabric

Devang Shah and Slava Trofimov show off a design pattern:

This design pattern provides intuitive, interactive Fabric-native experiences for any user:

  • Intelligent time binning: Handle billions of data points by automatically grouping them into optimal intervals.
  • Time brushing: Zoom in any period with drag-and-select interactions.
  • Multi-metric comparison: View multiple time series side by side across different assets.
  • Flexible aggregation: Switch between average, min, max, and sum with a single selection.
  • Anomaly detection: KQL queries detect unusual patterns in your time series with no ML expertise required.
  • Statistical insights: View descriptive statistics and correlations.
  • Contextualization: Bring asset hierarchies, tag metadata, and definitions directly into the report for richer interpretation.
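
To make the "intelligent time binning" and "flexible aggregation" bullets a little more concrete, here is a minimal Python sketch of the general idea. It is not the authors' implementation (theirs runs as KQL inside Fabric); the candidate bin widths, the `max_bins` budget, and the function names `choose_bin` and `bin_aggregate` are all assumptions made up for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean

# Assumed ladder of bin widths, smallest to largest (hypothetical, not from the source).
CANDIDATE_BINS = [
    timedelta(seconds=1),
    timedelta(minutes=1),
    timedelta(minutes=5),
    timedelta(hours=1),
    timedelta(days=1),
]


def choose_bin(start, end, max_bins=500):
    """Pick the smallest candidate width that keeps the bin count within max_bins."""
    span = end - start
    for width in CANDIDATE_BINS:
        if span / width <= max_bins:
            return width
    # Fall back to the coarsest width if even that exceeds the budget.
    return CANDIDATE_BINS[-1]


def bin_aggregate(points, start, width, agg=mean):
    """Group (timestamp, value) pairs into fixed-width bins and aggregate each bin.

    agg can be any function over a list of numbers, e.g. mean, min, max, or sum.
    """
    buckets = defaultdict(list)
    for ts, value in points:
        index = (ts - start) // width  # integer bin number for this timestamp
        buckets[index].append(value)
    return {start + i * width: agg(values) for i, values in sorted(buckets.items())}


if __name__ == "__main__":
    start = datetime(2024, 1, 1)
    width = choose_bin(start, datetime(2024, 1, 2))
    points = [
        (start + timedelta(minutes=1), 1.0),
        (start + timedelta(minutes=4), 3.0),
        (start + timedelta(minutes=6), 10.0),
    ]
    for bin_start, value in bin_aggregate(points, start, width).items():
        print(bin_start, value)
```

Zooming in with a time brush would amount to calling `choose_bin` again on the narrower window, which naturally yields a finer bin width.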

Read on to learn more about the pattern and how it works. There are a lot of moving parts to get right, but the end result looks impressive.

Tips on a Post-NTLM Future

Rebecca Lewis wants to save you later heartache:

‘Informational message’ and ‘Further action is only required if Kerberos authentication is required’… For years, we could ignore this, but not anymore. Microsoft published a three-phase roadmap on January 28, 2026 to disable NTLM by default in upcoming Windows releases — and phase one is already live. That ‘informational message’ is about to become an action-item.

In most environments I audit, somewhere between 40 and 60 percent of Windows Authentication connections to SQL Server are running on NTLM — and nobody knew until we looked. If you have not checked yours, now is the time.

Read on to learn how you can check this, the kinds of problems that can prevent Kerberos authentication, and how you can fix them.
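
For a rough idea of what such a check can look like, here is a small Python sketch. The `auth_scheme` column of `sys.dm_exec_connections` is a real SQL Server DMV column that reports values such as SQL, NTLM, and KERBEROS, but this is not necessarily the query the linked post uses. The helper name `ntlm_share` is hypothetical, and actually running the query (for example through a driver like pyodbc) is left out so the sketch stays self-contained.

```python
from collections import Counter

# Query against a real DMV; run it with whatever SQL Server client you already use.
AUTH_SCHEME_QUERY = """
SELECT auth_scheme, COUNT(*) AS connection_count
FROM sys.dm_exec_connections
GROUP BY auth_scheme;
"""


def ntlm_share(rows):
    """Return the percentage of Windows Authentication connections using NTLM.

    rows is an iterable of (auth_scheme, connection_count) pairs, i.e. the
    result set of AUTH_SCHEME_QUERY. SQL logins are ignored because they use
    neither NTLM nor Kerberos.
    """
    counts = Counter()
    for scheme, connection_count in rows:
        counts[scheme.strip().upper()] += connection_count
    windows_total = counts["NTLM"] + counts["KERBEROS"]
    if windows_total == 0:
        return 0.0
    return 100.0 * counts["NTLM"] / windows_total


if __name__ == "__main__":
    # Made-up sample rows standing in for a real result set.
    sample = [("NTLM", 45), ("KERBEROS", 55), ("SQL", 200)]
    print(f"{ntlm_share(sample):.1f}% of Windows connections are on NTLM")
```

Note that this only reflects connections open at the moment the query runs, so sampling it over time gives a more honest picture than a single snapshot.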

Using a Microsoft Fabric Variable Library in a Dataflow

Laura Graham-Brown shows another way to use variable libraries:

One of the popular low-code tools within Microsoft Fabric is the Gen2 Dataflow. Power BI report builders already know some Power Query. So armed with this knowledge is a popular starting point to load data into Microsoft Fabric. Adding values from the Variable Library in a Dataflow is an obvious plan to make it more future proof and to work better with Deployment pipelines.

I will confess the first time I tried these I could not get them to work till I read the instructions correctly. So they do work just understand the limitations!

To be fair, following instructions is one of the most challenging things to do, it seems.

SQL Server Performance Monitor

Erik Darling announces a new open-source product:

That is, if you survive the installation process. Active Directory. Domain accounts. Firewalls. Security policies. Additional database servers. Annoying.

Or just send everything to the cloud, somewhere.

I got real tired of dealing with that, so I built a free, open-source SQL Server performance monitoring tool that isn’t any of those things, and doesn’t suck.

This does sound pretty neat, and the Lite version in particular is very intriguing for environments where you can’t (or don’t want to) touch the servers.