Press "Enter" to skip to content

Why Unit Testing in the Database Is Tough

Rob Farley talks about a couple of reasons why database unit testing can be difficult to do:

Hamish wants to develop a conversation about unit testing within the database because he recognises that the lack of unit testing is a significant problem. It’s quite commonplace in the world of iterative code, of C#, Java, and those kinds of languages, but a lot less commonplace in the world of data. I’m going to look at two of the reasons why I think this is.

Read Rob’s thoughts in their entirety. I fully agree that we need to test, but I get wishy-washy on the topic of automated testing. The reason is that the tooling is quite limited, and many of those limitations are inherent to the database platform itself. For the types of things you most need to test (like hefty stored procedures), the number of test cases spirals out of control quickly. And unlike in functional or structured programming languages, T-SQL performance gets markedly worse as you modularize, which makes it difficult to get down to an easily testable block of code.

Replaying Workloads to a Different Database with WorkloadTools

Gianluca Sartori takes us through workload replay ability in WorkloadTools:

One of the features I was asked to implement for WorkloadTools is the ability to replay commands to a database name different from the one recorded in the source workload.

This is something that I had been planning to implement for a while, and it totally makes sense. Usually, you have two identical environments for the workload capture and replay, both with the same databases. Sometimes it makes sense to have two different databases as the source and target for the workload, for particular reasons: resource constraints, ease of testing, and so on.

WorkloadTools now supports replaying commands to a different database, using the DatabaseMap property of the ReplayConsumer.

Setting this up is pretty simple, though Gianluca does lay out a caveat.
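
For reference, the mapping lives on the ReplayConsumer in the WorkloadTools JSON configuration. The snippet below is only a sketch rather than Gianluca's exact file: the server names, credentials, and database names are placeholders, and only the pieces relevant to the replay side are shown.

```json
{
    "Controller": {
        "Listener": {
            "__type": "ExtendedEventsWorkloadListener",
            "ConnectionInfo": { "ServerName": "SOURCE_SERVER" }
        },
        "Consumers": [
            {
                "__type": "ReplayConsumer",
                "ConnectionInfo": {
                    "ServerName": "TARGET_SERVER",
                    "UserName": "replay_user",
                    "Password": "********"
                },
                "DatabaseMap": {
                    "SourceDb": "TargetDb"
                }
            }
        ]
    }
}
```

With a map like this in place, commands captured against SourceDb get replayed against TargetDb instead.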

Preparing for Demos with Pester

Jess Pomfret has some advice for you if you ever give a presentation:

If you don’t know what Pester is, it’s a test framework for PowerShell. In the simplest explanation, using their Domain-Specific Language (DSL), you describe how things should look. If all looks good, it returns output in green, and if it doesn’t, you get red output. There are a lot of great use cases for Pester, like using it to ensure your code does what it’s supposed to, using it to validate your SQL Server environment (dbachecks), or, in this example, using it to make sure your demos are set up and ready to go.

When I’m preparing for a presentation, I go through the demos over and over again, so it’s easy to accidentally leave things in a state that will cause issues when I go to do my demos in the presentation. If, for example, you create a table during the demo, and you already created it while practicing and then forgot to drop it, the demo gods will strike and it’ll fail when it matters most! A simple Pester test to check whether the table exists will solve this issue.

Even if you aren’t giving talks in public (or inside your company), Pester is a useful tool for ensuring that the thing you expect to be the case actually is the case.
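
As a concrete illustration, a pre-demo readiness check can be as small as the sketch below. This is not Jess's code: it assumes Pester 5 plus dbatools, and the instance, database, and table names are placeholders.

```powershell
# Demo-Readiness.Tests.ps1 - run Invoke-Pester against this file before presenting.
Describe 'Demo environment' {
    It 'does not already contain dbo.DemoTable' {
        # If a practice run left the table behind, this returns it and the test goes red.
        Get-DbaDbTable -SqlInstance 'localhost' -Database 'DemoDB' -Table 'DemoTable' |
            Should -BeNullOrEmpty
    }
}
```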

Using Pester with .NET PowerShell Notebooks

Rob Sewell has PowerShell in notebooks, so of course Rob is going to write tests:

Using Pester to validate that an environment is as you expect it to be is a good resource for incident resolution, potentially enabling you to quickly establish an area to concentrate on for the issue. However, if you try to run Pester in a .NET Notebook, you will receive an error.

Click through for the reason why this error appears and a workaround until it’s fixed for real.

Unit Testing in Apache Flink

Kartik Khare has a guide to help us write unit tests for our Apache Flink code:

Writing tests for process functions that work with time is quite similar to writing tests for stateful functions because you can also use the test harness. However, you need to take care of another aspect, which is providing timestamps for events and controlling the current time of the application. By setting the current (processing or event) time, you can trigger registered timers, which will call the onTimer method of the function.

Click through for demos and more details on the test harness.
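
To make the test harness idea concrete, here is a hedged sketch along the lines of the examples in the Flink documentation. The DelayedEmit function and the specific values are illustrative, not Kartik's; the harness classes come from the flink-streaming-java test jar.

```java
import static org.junit.Assert.assertEquals;

import java.util.Collections;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.streaming.util.KeyedOneInputStreamOperatorTestHarness;
import org.apache.flink.streaming.util.ProcessFunctionTestHarnesses;
import org.apache.flink.util.Collector;
import org.junit.Test;

public class DelayedEmitTest {

    // Registers an event-time timer 10ms after each element and emits the timer's timestamp when it fires.
    static class DelayedEmit extends KeyedProcessFunction<String, Long, Long> {
        @Override
        public void processElement(Long value, Context ctx, Collector<Long> out) {
            ctx.timerService().registerEventTimeTimer(ctx.timestamp() + 10);
        }

        @Override
        public void onTimer(long timestamp, OnTimerContext ctx, Collector<Long> out) {
            out.collect(timestamp);
        }
    }

    @Test
    public void timerFiresOnceWatermarkPassesIt() throws Exception {
        KeyedOneInputStreamOperatorTestHarness<String, Long, Long> harness =
            ProcessFunctionTestHarnesses.forKeyedProcessFunction(
                new DelayedEmit(), value -> "constant-key", Types.STRING);

        harness.processElement(42L, 100L);  // element with event-time timestamp 100
        harness.processWatermark(200L);     // advancing event time past 110 triggers onTimer

        assertEquals(Collections.singletonList(110L), harness.extractOutputValues());
    }
}
```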

High-Value Software Testing

RJ Zaworski has some principles around high-value testing:

Whatever the tools and approach, software testing proves that software does what it claims to do. Tests help developers eliminate defects, build confidence, practice good design, and ideally all three. They also take time to write, run, and update–time that’s no longer available for other development tasks.

High-value testing seeks to maximize the return on that investment. Like much of software development, it’s as much art as science. But a few practical principles can help keep things pointed in the right direction.

There’s a lot of good advice in this post.

Mocking Objects with R

The R-hub blog has an interesting post on creating mocks in R for unit testing:

In some of these cases, the programming concept you’re after is mocking, i.e. making a function act as if something were a certain way! In this blog post we shall offer a round-up of resources around mocking, or not mocking, when unit testing an R package.

It’s interesting watching data scientists work through the same sorts of problems which traditional developers have hit, whether that be testing, deployment, or source control management. H/T R-bloggers
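
For a flavour of what mocking looks like in practice, here is a small sketch using testthat together with the mockery package, one of the tools in this space. The functions are made up purely for illustration.

```r
library(testthat)
library(mockery)

# Dependency we do not want to call for real during tests.
call_api <- function(url) stop("no network access in tests")

# Function under test: counts the users returned by the API.
get_user_count <- function() {
  resp <- call_api("https://example.com/users")
  length(resp$users)
}

test_that("get_user_count counts users in the API response", {
  # Swap call_api, as seen from inside get_user_count, for a canned response.
  stub(get_user_count, "call_api", list(users = c("a", "b", "c")))
  expect_equal(get_user_count(), 3)
})
```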

Scaling Out Continuous Integration

Chris Adkin shows off parallelism in Azure DevOps continuous integration pipelines:

A SQL Server Data Tools project is checked out of GitHub and built into a DacPac, then four containerized SQL Server instances are spun up using clones of the ‘Seed’ docker volume. The DacPac is applied to a database running inside each container, a tSQLt test is then executed against each database, and finally, at the very end, the tSQLt results are aggregated and published.

This is an interesting approach to the problem of lengthy tests: run them on several separate machines concurrently.
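
If you want a rough picture of what the fan-out looks like in Azure Pipelines YAML, a matrix strategy is one way to do it. This is only a sketch under my own naming, not Chris's pipeline; the real work of starting a container from the seed volume, deploying the DacPac, and invoking tSQLt is reduced to a placeholder step.

```yaml
jobs:
- job: tsqlt_slice
  pool:
    vmImage: 'ubuntu-latest'
  strategy:
    matrix:            # four copies of the job, one tSQLt test class each
      finance:   { testClass: 'FinanceTests' }
      sales:     { testClass: 'SalesTests' }
      inventory: { testClass: 'InventoryTests' }
      audit:     { testClass: 'AuditTests' }
  steps:
  - script: echo "start container from seed volume, deploy DacPac, run tSQLt class $(testClass)"
    displayName: 'Run one tSQLt slice'
```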

Proving ETL Correctness

Ed Elliott shares a few techniques for testing ETL processes:

Reconciliation is the process of going to your source system, getting a number and validating that number on the target. This ranges from being easy to impossible, so you need to decide what to reconcile on a case by case basis.

In its simplest form, we can go to a source system and find out things like how many records are to be copied, sum up totals and run other aggregations that we can then validate as correct (or not!) on the target system.

Ed has put together a thoughtful approach to validating data loads regardless of the source.
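
A reconciliation check can be as simple as comparing a count and a sum on both sides of the load. The sketch below uses made-up table and column names, assumes both databases are reachable from one instance, and is only meant to show the shape of the technique.

```sql
DECLARE @SourceRows bigint, @SourceAmount decimal(19, 4);
DECLARE @TargetRows bigint, @TargetAmount decimal(19, 4);

-- Aggregate the slice of data that today's load was supposed to copy.
SELECT @SourceRows = COUNT_BIG(*), @SourceAmount = SUM(Amount)
FROM SourceDb.dbo.Orders
WHERE OrderDate = CAST(GETDATE() AS date);

SELECT @TargetRows = COUNT_BIG(*), @TargetAmount = SUM(Amount)
FROM WarehouseDb.dbo.FactOrders
WHERE OrderDate = CAST(GETDATE() AS date);

-- Fail loudly if the target does not line up with the source.
IF @SourceRows <> @TargetRows OR @SourceAmount <> @TargetAmount
    THROW 50000, 'Reconciliation failed: source and target totals do not match.', 1;
```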

Testing ETL Pipelines

Ed Elliott has started a new series on testing ETL pipelines:

We test in production; this means we have monitoring and do things like phased roll-outs using feature flags, or we roll out to select customers first, prove it, then roll it out to everyone else. Testing in production doesn’t mean hacking around getting some process to work. We don’t test “on production” (hacking), we test “in production” – while we are in production, we are continually testing, and if anything goes wrong, we have alerts and can deal with it.

Testing pipelines feels difficult because there are so many moving pieces, but if you design for testability (e.g., being able to tee off samples of data, send test records through, etc.), things get easier.
