Press "Enter" to skip to content

Category: Testing

Unit Testing in Apache Flink

Kartik Khare has a guide to help us write unit tests for our Apache Flink code:

Writing tests for process functions, that work with time, is quite similar to writing tests for stateful functions because you can also use test harness. However, you need to take care of another aspect, which is providing timestamps for events and controlling the current time of the application. By setting the current (processing or event) time, you can trigger registered timers, which will call the onTimer method of the function

Click through for demos and more details on the test harness.

Comments closed

High-Value Software Testing

RJ Zaworski has some principles around high-value testing:

Whatever the tools and approach, software testing proves that software does what it claims to do. Tests help developers eliminate defects, build confidence, practice good design, and ideally all three. They also take time to write, run, and update–time that’s no longer available for other development tasks.

High-value testing seeks to maximize the return on that investment. Like much of software development, it’s as much art as science. But a few practical principles can help keep things pointed in the right direction.

There’s a lot of good advice in this post.

Comments closed

Mocking Objects with R

The R-hub blog has an interesting post on creating mocks in R for unit testing:

In some of these cases, the programming concept you’re after is mocking, i.e. making a function act as if something were a certain way! In this blog post we shall offer a round-up of resources around mocking, or not mocking, when unit testing an R package.

It’s interesting watching data scientists work through the same sorts of problems which traditional developers have hit, whether that be testing, deployment, or source control management. H/T R-bloggers

Comments closed

Scaling Out Continuous Integration

Chris Adkin shows off parallelism in Azure DevOps continuous integration pipelines:

A SQL Server data tools project is checked out of GitHub, built into a DacPac, four containerized SQL Server instances are spun up using clones of the ‘Seed’ docker volume. The DacPac is applied to a database running inside each container, which a tSQLt test is then executed against, finally, at the end very end the tSQLt results are aggregate and published.

This is an interesting approach to the problem of lengthy tests: run them on several separate machines concurrently.

Comments closed

Proving ETL Correctness

Ed Elliott shares a few techniques for testing ETL processes:

Reconciliation is the process of going to your source system, getting a number and validating that number on the target. This ranges from being easy to impossible, so you need to decide what to reconcile on a case by case basis.

In its simplest form, we can go to a source system and find out things like how many records are to be copied, sum up totals and run other aggregations that we can then validate as correct (or not!) on the target system.

Ed has put together a thoughtful approach to validating data loads regardless of the source.

Comments closed

Testing ETL Pipelines

Ed Elliott has started a new series on testing ETL pipelines:

We test in production, this means we have monitoring and do things like have phased roll-outs using feature flags, or we roll-out to select customers first, prove it then roll it out to everyone else. Testing in production doesn’t mean hacking around getting some process to work. We don’t test “on production” (hacking), we test “in production” – while we are in production we are continually testing, and if anything goes wrong, we have alerts and can deal with it.

Testing pipelines feels difficult because there are so many moving pieces, but if you design for testability (e.g., being able to tee off samples of data, send test records through, etc.), things get easier.

Comments closed

Finding Broken Code in SQL Server

Pamela Mooney shows us how we can find broken code on our SQL Server instances:

Before we approached our last major SQL Server upgrade, I was curious about what might break.  Yes, I had used the DEA to check our code against deprecated or discontinued code.  But I am talking about code that might not have been used in some time and would break because objects no longer existed, or other things like that.  So I wrote these scripts to refresh the sprocs, views and functions in our (non-production) environment.

Pamela’s motivation was to handle code which breaks during an update. You can also use this to see what you can probably deprecate—if the view doesn’t work, it can’t be in use. That means either someone should fix it or drop it.

Comments closed

Pester and Testable Powershell Scripts

Shane O’Neill has a public service announcement:

Now scripts are notoriously hard to test, I’ve written about how I’ve done that before but, honestly, if you really want to know then you need to check out Jakub Jares ( blog | twitter ).

Knowing how difficult testing scripts are, the first thing I decided to do was take the functions in the script and split them out. This way they can be abstracted away and tested safely.

I also didn’t want to take on too much at one time so I choose a random function, GetPreviousTag, and only actioned that one first.

Shane also found a bug in the first script, underscoring the importance of good tests.

Comments closed

Replaying Workloads with WorkloadTools

Gianluca Sartori shows an example of using the WorkloadTools application to replay a workload, including where the analytics server cannot directly access the production database:

Regardless of the method that you decided to use, at the end of the replays, you will have two distinct sets of tables containing the workload analysis data, sitting in different schemas in the same database or in completely different databases.

WorkloadViewer will let you visualize performance over time, as we have seen for a single workload analysis, but this time it will be able to show you data from both workloads, so that you can compare them.

This sort of production load testing is both important and difficult; WorkloadTools makes it easier.

Comments closed

Load Testing Databases

Ed Elliott shares some tips on load testing a database:

Testing database performance is hard and takes a great deal of work to probably not even do particularly well. Instead of thinking about how you can load test a database think about how you can drive the application.

For instance, if you have a web app then use JMeter to simulate load. If you have a “fat app”, then you might need to write some code to call specific workflows through the application.

Click through for good advice, particularly around what you should not do.

Comments closed