Press "Enter" to skip to content

Day: February 13, 2026

Choosing between PCA and t-SNE

Shittu Olumide visualizes some data:

For data scientists, working with high-dimensional data is part of daily life. From customer features in analytics to pixel values in images and word vectors in NLP, datasets often contain hundreds or even thousands of variables. Visualizing such complex data is difficult.

That’s where dimensionality reduction techniques come in. Two of the most widely used methods are Principal Component Analysis (PCA) and t-Distributed Stochastic Neighbor Embedding (t-SNE). While both reduce dimensions, they serve very different goals.

The thing that ultimately soured me on t-SNE is the stochastic nature. You can run the same set of operations multiple times and get significantly different results. It’s really easy to use and the output graphs are really pretty, but if you can’t trust the outputs to be at least somewhat stable, there’s a hard limit to its value.
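To make that concrete, here is a minimal scikit-learn sketch (on the bundled digits dataset, not data from the article): PCA is a deterministic linear projection, while t-SNE only produces repeatable output if you pin its random seed.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

# Small sketch on the bundled digits dataset (64 dimensions -> 2).
# PCA is deterministic, so repeated runs agree exactly; t-SNE is
# stochastic, so runs only agree if you pin random_state.
X, y = load_digits(return_X_y=True)

pca_coords = PCA(n_components=2).fit_transform(X)

tsne_coords = TSNE(
    n_components=2,
    perplexity=30,      # neighborhood size; the layout is sensitive to this
    init="pca",         # PCA initialization makes runs more stable
    random_state=42,    # without this, each run can look quite different
).fit_transform(X)

print(pca_coords.shape, tsne_coords.shape)  # (1797, 2) (1797, 2)
```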


The Downsides of Python

Andy Brown writes a companion piece:

Four years ago I wrote a blog on this site explaining why Python is better than C# and, arguably, most other programming languages. To redress the balance, here are 10 reasons why you might want to avoid getting caught up in Python’s oh-so-tempting coils – particularly when building large, long-lived systems.

If this sounds like an attempt to have my cake and eat it, my defense is that I practice in my work what I preach here: I use Python for ad-hoc jobs, at which it is unsurpassed. For larger systems – such as our MV website – I use C#, due to its strengths in maintainability and tooling, as well as the practical consideration that my personal preference for Visual Basic is not shared by the wider team.

Some of it is opinion, some of it is annoying. I’ve grown to appreciate the spacing, though it can be really painful when copying code from somewhere and the spacing gets all messed up. My short version of Python is that it requires you to have more discipline as a developer to prevent messes from occurring, and I think that’s a negative on net. But that same aspect simultaneously makes it so much easier to prototype and rapidly solve problems, so there’s a natural trade-off here.
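As a tiny illustration of the spacing complaint (my example, not Andy's): Python's block structure lives in the whitespace, so code pasted with mangled indentation fails to compile rather than merely looking ugly.

```python
# Simulate a paste where one line lost part of its indentation. Because
# indentation is syntax in Python, this is a compile-time error, not a
# cosmetic issue.
pasted = (
    "def total(items):\n"
    "    result = 0\n"
    "  for item in items:\n"      # indent level slipped in the paste
    "        result += item\n"
    "    return result\n"
)

try:
    compile(pasted, "<pasted>", "exec")
except IndentationError as err:
    print(f"IndentationError: {err}")
```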


Tips on a Post-NTLM Future

Rebecca Lewis wants to save you later heartache:

‘Informational message’ and ‘Further action is only required if Kerberos authentication is required’… For years, we could ignore this, but not anymore. Microsoft published a three-phase roadmap on January 28, 2026, to disable NTLM by default in upcoming Windows releases — and phase one is already live. That ‘informational message’ is about to become an action item.

In most environments I audit, somewhere between 40 and 60 percent of Windows Authentication connections to SQL Server are running on NTLM — and nobody knew until we looked. If you have not checked yours, now is the time.

Read on to learn how you can check this, the kinds of problems that can prevent Kerberos authentication, and how you can fix them.
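For the impatient, one common way to get a first look at the split (not necessarily the exact queries from Rebecca's post) is to group current connections by sys.dm_exec_connections.auth_scheme. Here is a rough sketch from Python with pyodbc; the server name is a placeholder.

```python
import pyodbc

# A hedged sketch, not the linked post's method: sys.dm_exec_connections
# reports SQL, NTLM, or KERBEROS in its auth_scheme column, so grouping by
# it shows how many current connections rely on NTLM.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=yourserver;DATABASE=master;"          # placeholder server name
    "Trusted_Connection=yes;Encrypt=yes;TrustServerCertificate=yes;"
)

query = """
SELECT auth_scheme, COUNT(*) AS connection_count
FROM sys.dm_exec_connections
GROUP BY auth_scheme;
"""

for auth_scheme, connection_count in conn.cursor().execute(query):
    print(f"{auth_scheme}: {connection_count}")
```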


Adaptive Time Series Visualization in Microsoft Fabric

Devang Shah and Slava Trofimov show off a design pattern:

This design pattern provides intuitive, interactive Fabric-native experiences for any user:

  • Intelligent time binning: Handle billions of data points by automatically grouping them into optimal intervals.
  • Time brushing: Zoom in on any period with drag-and-select interactions.
  • Multi-metric comparison: View multiple time series side by side across different assets.
  • Flexible aggregation: Switch between average, min, max, and sum with a single selection.
  • Anomaly detection: KQL queries detect unusual patterns in your time series with no ML expertise required.
  • Statistical insights: View descriptive statistics and correlations.
  • Contextualization: Bring asset hierarchies, tag metadata, and definitions directly into the report for richer interpretation.

Read on to learn more about the pattern and how it works. There are a lot of moving parts to get right, but the end result looks impressive.
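As a rough illustration of the "intelligent time binning" idea in the first bullet (a generic pandas sketch, not the Fabric/KQL implementation from the post), the trick is to derive the bin width from the requested time span so any zoom level collapses to roughly the same number of plotted points.

```python
import numpy as np
import pandas as pd

# Generic sketch: pick a bin width so the requested time range aggregates
# down to roughly target_points buckets, whatever the raw resolution.
def adaptive_bins(series: pd.Series, target_points: int = 500) -> pd.Series:
    """Resample a datetime-indexed series into ~target_points averaged buckets."""
    span = series.index.max() - series.index.min()
    bin_width = max(span / target_points, pd.Timedelta(seconds=1))
    return series.resample(bin_width).mean()

# Example: a year of per-minute readings collapses to roughly 500 points.
idx = pd.date_range("2025-01-01", periods=525_600, freq="min")
readings = pd.Series(np.random.default_rng(0).normal(size=len(idx)), index=idx)
print(adaptive_bins(readings).shape)
```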


Using a Microsoft Fabric Variable Library in a Dataflow

Laura Graham-Brown shows another way to use variable libraries:

One of the popular low-code tools within Microsoft Fabric is the Gen2 Dataflow. Power BI report builders already know some Power Query, so it is a popular starting point for loading data into Microsoft Fabric. Adding values from the Variable Library to a Dataflow is an obvious way to make it more future-proof and to work better with Deployment pipelines.

I will confess that the first time I tried these, I could not get them to work until I read the instructions correctly. So they do work; just understand the limitations!

To be fair, following instructions is one of the most challenging things to do, it seems.


SQL Server Performance Monitor

Erik Darling announces a new open-source product:

That is, if you survive the installation process. Active Directory. Domain accounts. Firewalls. Security policies. Additional database servers. Annoying.

Or just send everything to the cloud, somewhere.

I got real tired of dealing with that, so I built a free, open-source SQL Server performance monitoring tool that isn’t any of those things, and doesn’t suck.

This does sound pretty neat, and the Lite version in particular is very intriguing in places where you can’t (or don’t want to) touch the servers.


Farewell, Purview Access Policies

Andreas Wolter gives us the behind-the-scenes view of things:

SQL Server 2025 was released at the end of last year. While most attention has gone to new features, this release also discontinued several existing ones—a notable first in recent SQL Server versions.

The full list is available here: Discontinued services and deprecated features

One item deserves particular attention: Purview access policies.

Read on to learn the intent behind this, the struggle that led to using Purview instead of Azure RBAC, and why there isn’t a solid alternative currently available.


Diving into DISTINCT

Louis Davidson is one of a kind:

If there is one SQL keyword that causes more fear than any other, it’s DISTINCT. When I see it in a query, I immediately start to worry about just how much work I am in for to ensure the correctness of that query. I start scanning for comments to describe why it is there, and if none are found, I know the query is probably going to be wrong.

I have seen DISTINCT used to hide bad joins, missing grouping, and even missing WHERE clauses. I have seen developers use it as a “fix-all” for data problems.

In this blog, I will look at the proper use and distinctly dangerous uses of DISTINCT and also show how you might test your query that uses DISTINCT to see what it is actually covering up.

Louis also includes one of my “favorite” coding errors: the accidental self-join. Done that one too many times to be proud of.
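For a concrete feel of the "DISTINCT hides a bad join" pattern Louis describes, here is a small pandas sketch (my analogy, not Louis's T-SQL): a join on the wrong key fans out the rows, and deduplicating the output makes the counts look right while the join stays wrong.

```python
import pandas as pd

# Two orders and two payments for the same customer.
orders = pd.DataFrame({"order_id": [1, 2], "customer_id": [10, 10], "amount": [100.0, 250.0]})
payments = pd.DataFrame({"payment_id": [7, 8], "customer_id": [10, 10], "order_id": [1, 2]})

# Bug: joining on customer_id alone pairs every order with every payment
# for that customer, doubling the row count.
bad_join = orders.merge(payments, on="customer_id")
print(len(bad_join))  # 4 rows instead of 2

# drop_duplicates() plays the role of DISTINCT here: the row count looks
# right again, but the underlying join is still wrong.
patched = bad_join[["customer_id", "amount"]].drop_duplicates()
print(patched)

# The actual fix is to join on the key that defines the relationship.
good_join = orders.merge(payments, on=["customer_id", "order_id"])
print(len(good_join))  # 2 rows
```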
