
Category: Testing

Snapshot Testing in R

Jakub Sobolewski drills into a particular form of testing:

Snapshot testing is not about screenshots.

Most people meet it through UI regression tests: render a component, save a picture, fail the build when the picture changes. So the technique gets filed away as “the thing that compares images.” That is one use. But not the only one.

The mechanic underneath is general. Capture some output, save it to a file, and on every later run compare fresh output against the saved copy. The output can be a plot. It can also be console text, a log, a data frame, an error message, or a deeply nested list. Anything you can serialize, you can snapshot.

Read on to see how you can perform snapshot testing, using examples in R to demonstrate. H/T R-Bloggers.
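The capture, save, and compare loop described above is small enough to sketch in a few lines. The linked post works in R; what follows is a Python illustration of the same mechanic, not Jakub's code. The `check_snapshot` helper and the `summarize` function under test are hypothetical names made up for this example, and JSON stands in for whatever serialization suits your output.

```python
import json
from pathlib import Path


def check_snapshot(name, value, snapshot_dir, update=False):
    """Compare `value` against the saved snapshot called `name`.

    Returns "created", "updated", or "matched".
    Raises AssertionError when fresh output differs from the saved copy.
    """
    # Serialize deterministically so equal data always yields equal text.
    fresh = json.dumps(value, indent=2, sort_keys=True)
    path = Path(snapshot_dir) / f"{name}.json"

    if not path.exists():
        # First run: there is nothing to compare against, so record the output.
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(fresh, encoding="utf-8")
        return "created"
    if update:
        # The change was intentional: accept the fresh output as the new baseline.
        path.write_text(fresh, encoding="utf-8")
        return "updated"

    saved = path.read_text(encoding="utf-8")
    assert fresh == saved, (
        f"snapshot {name!r} changed\n--- saved ---\n{saved}\n--- fresh ---\n{fresh}"
    )
    return "matched"


def summarize(orders):
    """Hypothetical function under test: order count and total per customer."""
    summary = {}
    for customer, amount in orders:
        entry = summary.setdefault(customer, {"count": 0, "total": 0})
        entry["count"] += 1
        entry["total"] += amount
    return summary
```

The first call writes the snapshot file, later calls compare against it, and a failing comparison means either a regression or an intentional change that you accept by re-running with `update=True`. Nothing here involves an image, which is the point of the excerpt.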


Trusting Outputs from Fabric Data Agents

Jens Vestergaard says don’t trust, do verify:

In two previous posts I went down the path of getting a semantic model ready for AI: descriptions on every measure, an instructions file, the schema tidied up enough that a Fabric data agent has something real to read. That work has a satisfying endpoint. The model looks ready.

Ready is not the same as right.

The kind of evaluation Jens is talking about is fundamental to good business intelligence practices, regardless of whether you throw language models into the mix. Language models do add complexity in three ways: the arbitrary scope of questions, the ambiguity in how people phrase them, and the stochastic nature of answers. All of that makes the problem harder, though at least it isn’t an entirely different class of problem to solve.
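One way to make the stochastic part measurable is to ask the same question several times and score every answer against a value you have verified by hand. The sketch below is my own illustration of that idea, not Jens's approach. It assumes the agent can be wrapped in a plain callable that returns a number; `make_stub_agent` is a hypothetical stand-in with canned replies so the example runs without any Fabric dependency.

```python
import itertools


def evaluate_question(ask, question, expected, runs=5, rel_tol=0.001):
    """Ask `question` `runs` times and score the answers against `expected`.

    `ask` is any callable that takes a question string and returns a number.
    """
    answers = [ask(question) for _ in range(runs)]
    # An answer passes when it is within a relative tolerance of the known value.
    passed = sum(1 for a in answers if abs(a - expected) <= rel_tol * abs(expected))
    return {
        "question": question,
        "pass_rate": passed / runs,
        "distinct_answers": len(set(answers)),
    }


def make_stub_agent(replies):
    """Hypothetical stand-in for a real agent: cycles through canned replies."""
    canned = itertools.cycle(replies)
    return lambda question: next(canned)
```

A pass rate below 1.0 or more than one distinct answer for a question with a single correct value is a signal to revisit the model's descriptions and instructions before anyone relies on the agent.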


User-Defined Functions and Power BI Testing

John Kerski is excited:

User Defined Functions (UDFs) are, in my opinion, the biggest update to Power BI Desktop since PBIP.

That may sound dramatic, but if you care about DataOps, semantic model quality, and reusable development patterns, UDFs fundamentally change what is possible with DAX.

Reuse is one of the core principles of DataOps. For years we have been able to build reusable patterns in Power Query, PowerShell, Python, YAML, and infrastructure automation. But DAX was always missing a key capability: reusable logic that could live inside the semantic model itself.

Until now.

Read on to learn more, as well as to get a link to John’s PQL.Assert DAX unit testing library.


Testing SQL Code in Python

Jamal Hansen writes some tests:

I once had a query that ran fine for months. Then someone added a column to the source table and a SELECT * downstream started returning unexpected data. The query didn’t error. It just silently gave wrong results. A test would have caught it immediately.

Schema changes break queries silently. Refactoring a CTE can shift results in ways you don’t notice. New data patterns expose assumptions you didn’t know you made. SQL deserves the same testing discipline as the rest of your code, and Python makes it straightforward.

Jamal uses pytest here, and it is one of my favorites for this kind of work. You can build up tests without a lot of ceremony, and it’s pretty easy to deal with for most use cases.
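To make the `SELECT *` failure mode from the excerpt concrete, here is a minimal sketch of a test that pins down the shape of a query's result. It is an illustration under my own assumptions, not Jamal's code: the `orders` table, its rows, and the helper names are hypothetical, and it uses the standard library's `sqlite3` with an in-memory database so it runs anywhere. The `test_` functions are plain functions with bare `assert` statements, which is the style pytest collects.

```python
import sqlite3

# Query under test: the kind of SELECT * that silently changes shape.
QUERY = "SELECT * FROM orders ORDER BY id"


def make_db():
    """Build a small in-memory database with a hypothetical orders table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO orders (id, customer, amount) VALUES (?, ?, ?)",
        [(1, "ann", 10.0), (2, "bob", 5.0), (3, "ann", 7.5)],
    )
    return conn


def column_names(conn, sql):
    """Return the column names a query produces, in order."""
    cursor = conn.execute(sql)
    return [col[0] for col in cursor.description]


def test_query_returns_expected_columns():
    # Fails as soon as someone adds a column to the source table.
    conn = make_db()
    assert column_names(conn, QUERY) == ["id", "customer", "amount"]


def test_totals_per_customer():
    # Pins down the aggregate result for known input rows.
    conn = make_db()
    rows = conn.execute(
        "SELECT customer, SUM(amount) FROM orders GROUP BY customer ORDER BY customer"
    ).fetchall()
    assert rows == [("ann", 17.5), ("bob", 5.0)]
```

If the source table gains a column, for example through `ALTER TABLE orders ADD COLUMN region TEXT`, the column test fails right away instead of letting the extra column flow downstream unnoticed.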


Building Automated Tests in Power BI

John Kerski has a use for user-defined functions:

Reuse is a very important term in DataOps. It is defined as the practice of leveraging existing components, code, or processes across multiple projects to reduce redundancy and improve consistency.

However, when it comes to Power BI, reusing DAX measures across projects was a difficult ‘copy and paste’ job. For my teams, we used DAX measures to help with testing our semantic models, but ensuring consistent testing conventions (and standard schemas of the tests) required lots of manual review.

Thankfully, that changed in late 2025 when Microsoft introduced User Defined Functions (UDFs) for Power BI. In this article, I’ll demonstrate how to use UDFs for testing, plus how to standardize the way teams test their models.

Click through to see how.


Dimensional Testing in Kafka

Jack Vanlightly announces a new tool:

Most of my career in distributed systems has been as a tester, performance engineer and formal verification specialist. I’ve written performance benchmarking tools in the past, for RabbitMQ and Apache Pulsar but in recent years I’ve used OpenMessagingBenchmark (OMB) to run benchmarks against Apache Kafka and other messaging systems. But OMB is hard to deploy and has several limitations compared to more sophisticated benchmarking systems I’ve developed in the past. With Claude becoming so much better since Christmas I decided to write a Kafka-centric performance benchmarking tool, with a lot of inspiration from OMB. I took the bits I like about OMB and the things I like about the tooling I’ve built in the past, to make a performance testing tool for testing Apache Kafka.

Click through for an overview of the tool and how it works.


Migrating from testthat to testit

Yihui Xie explains how to switch test frameworks in R:

Back in 2013, I wrote about testing R packages when I first released testit. Thirteen years later, I still believe that unit testing should be nothing more than “tell me if something unexpected happened.” Recently I converted a large testthat test suite to testit, and I thought I’d share a practical guide for anyone considering the same move.

Click through for that guide.


Storage Load Testing SQL Server

Anthony Nocentino designs a test:

I’ve been doing storage load tests for SQL Server for a long time, both as a consultant and now in my work at Everpure, and I see the same patterns over and over. Someone spins up a VM with two vCPUs, points it at a storage subsystem (cloud or on-prem), runs a thousand threads at it, and then concludes that the storage stinks. Or the opposite, where they buy a 64 gigabit HBA, plug it into the wrong PCIe slot, and wonder why they’re leaving half of the capacity on the table.

Designing a good load test isn’t about pushing the biggest number you can find. It’s about knowing what your gear is supposed to do, driving load until you find the point where the system breaks down, and measuring at the right places along the way. In this post, I’m going to walk you through how I think about designing a storage load test, the tools I reach for, and the pitfalls that will invalidate your results if you’re not paying attention.

Click through for Anthony’s thought process.


Testing Your SQL Server Backups

Vlad Drumea stresses the need to test:

Folks think that if they just take backups regularly, and their backup intervals match their recovery point objectives, then they can rest assured knowing (more like falsely thinking) they have backups from which to restore in case something bad happens to a database or to the entire instance.

Yeah, there are several things that can go wrong, even beyond the solid answers Vlad provides. Think about developers taking one-off backups without COPY_ONLY that reset the differential base or break the log chain, overly aggressive purge processes that delete backups before they’re fully moved to long-term storage, and so on.

Definitely check out Vlad’s post. If you’re interested, I also have a video series on YouTube that covers SQL Server backup and restore.


Unit Testing DAX via Semantic Link

Jens Vestergaard writes a test:

Every BI developer has felt it. You change a measure, update a relationship, or rename a column in a semantic model, and then you spend the next hour clicking through report pages to check if something broke. Manual spot-checking is how most teams validate DAX today. It works until it does not.

I have been building and maintaining semantic models for years. The further I get into Fabric-based development, the more my models start to feel like production code. They power dashboards that drive decisions. They feed downstream pipelines. When something breaks, the blast radius is real. And yet, the testing story has always been: deploy, open the report, squint at the numbers.

That gap bothered me enough to do something about it.

Click through to see what Jens has done.
