Press "Enter" to skip to content

Category: Testing

Testing Kafka Messages with RecordCaptor

Anton Belyaev shows off an open-source utility:

Let’s take a Telegram bot that forwards requests to the OpenAI API and returns the result to the user as an example. If the request to OpenAI violates the system’s security rules, the client will be notified. Additionally, a message will be sent to Kafka for the behavioral control system so that the manager can contact the user, explain that their request was too sensitive even for our bot, and ask them to review their preferences.

The interaction contracts with services are described in a simplified manner to emphasize the core logic. Below is a sequence diagram demonstrating the application’s architecture. I understand that the design may raise questions from a system architecture perspective, but please approach it with understanding — the main goal here is to demonstrate the approach to writing tests.

Read on to see how it all works, as well as links to Anton’s GitHub repo for testing in Kafka.

Leave a Comment

Building a Test Data Generator for PostgreSQL

Mika Sutinen builds some data:

I recently had a project where I needed quickly to generate some realistic looking test data to PostgreSQL database. While I often like to go for ready-made solutions, this felt like a good opportunity to stretch my coding muscles and develop it myself. Moreover, this seemed like a fun puzzle to solve, and I could probably use the same solution later on elsewhere.

Click through for a description of the generator, as well as a link to Mika’s GitHub repo. Taking a quick peek at it, it does appear that you could probably use this for other data platforms like SQL Server with very limited modification.

Leave a Comment

Testing Python Code with pytest

Aida Gjoka builds some tests:

Testing code using automated tools is common throughout the software development industry. This technique can improve the quality of the code you write as a data scientist. Testing helps refine your code, supports redesign, prevents errors, and makes it harder to write single-use code.

Here, we introduce the pytest framework and show how it can be used to test Python functions. If you don’t use a testing framework as part of your daily workflow, try experimenting with the techniques presented here the next time you write or extend a function.

I am a big fan of pytest because it strikes what I consider to be a great balance between convention and customization. There’s very little administrative overhead to creating test classes and test cases, so tests are easy to build and can it’s trivial to run a test suite or a specific part of one.

Leave a Comment

Creating Test Classes and Unit Tests with tSQLt

Olivier Van Steenlandt continues a series on database testing:

We have set up our tSQLt Database Project in the previous data recipe, Create a SSDT Project Template based on your Database Project. Now it’s time to dive into the wonderful world of tSQLt Unit Testing. In the meantime, I have added my data warehouse to my SSDT Solution and added this project as a Database Reference to my Unit Testing Database Project. If you are unsure how to do this, you can find all the information you need in my previous data cookbook which you can access via the following link: Getting Started With Database Projects & Azure DevOps.

Read on for a walkthrough of how to do this.

Comments closed

Adding tSQLt to a Database Project

Olivier Van Steenlandt provides an overview of adding tSQLt to a Visual Studio database project:

As a first step in the process, we’re going to create a new Database Project, in my case, I will be calling my Database Project AdventureWorksDW_UnitTesting and my solution AdventureWorks.

If you are not sure how to set up a Database Project in Visual Studio from scratch, don’t worry, you can follow the step-by-step data recipe I released a while ago, Getting Started with Database Projects and Version Control

Read on to learn more about how to add the tSQLt objects and eliminate cross-database reference issues.

Comments closed

The Importance of Exploratory Testing

Thuy covers why exploratory testing is important:

Exploratory Testing is a software testing method that testers use to explore, find, and test features, bugs, or issues in an application freely and without the need for a prior testing plan. In exploratory testing, the tester will focus on freely working with the app as a real user and trying to find bugs and issues without following a specific test scenario.

Exploratory testing is a type of testing in which test cases are not created beforehand, but testers can test the system quickly. They can jot down ideas about what needs to be checked before performing the test. The focus of exploratory testing focuses more on testing as a “thinking” activity that explores new cases that do not follow the mainstream activity.

It’s amazing (and dismaying) how many bugs you can find simply by clicking around. The tricky part about exploratory testing is not actually finding bugs, but keeping track of your actions so that a developer knows how to fix the bugs you’ll inevitably find.

Comments closed

Unit Testing a Database

Olivier Van Steenlandt builds some tests:

In the past few years, I learned much about collaborative data warehouse development and deployment automatization by using Database Projects (SSDT) and Azure DevOps (and other tools).

I had my fair share of learning curves, making mistakes, and having great learning opportunities. Lately, I started my next journey to learn about Unit Testing for data warehousing/database development.

In this data cookbook (blog post series), we will discover the wonderful world and different flavors of unit testing from a data perspective. In the coming weeks/months, new data recipes (blog posts) will be released bi-weekly.

This first post provides an overview of the topic and includes links to three tools, though SQL Test is an implementation of tSQLt. Of the three, Visual Studio tests are the best of the bunch, though they’re more integration tests than unit tests.

Comments closed

Test Isolation with Kafka

Anton Belyaev builds some tests:

The experience of running Kafka in test scenarios has reached a high level of convenience thanks to the use of Test containers and enhanced support in Spring Boot 3.1 with the @ServiceConnection annotation. However, writing and maintaining integration tests with Kafka remains a challenge. This article describes an approach that significantly simplifies the testing process by ensuring test isolation and providing a set of tools to achieve this goal. With the successful implementation of isolation, Kafka tests can be organized in such a way that at the stage of result verification, there is full access to all messages that have arisen during the test, thereby avoiding the need for forced waiting methods such as Thread.sleep().

This method is suitable for use with Test containers, Embedded Kafka, or other methods of running the Kafka service (e.g., a local instance).

Click through for that approach.

Comments closed

Test Data Generation Concepts and Challenges

Phil Factor talks about data generation:

Data generation is the science and art of providing data for database development work that is as realistic and controllable as possible. The skills of generating realistic data are an essential part of being a database developer. It is important: you need plenty of data of exactly the right type, size and verisimilitude in order to develop and test a database. It has got to look right. If you are launching a corporate, data-driven application, you’ll need it for training as well as testing and development.

Where things get really tough is when you also want to get the distribution of data right. Is a uniform distribution (e.g., outputs from a random number generator) okay? Or do you have a few large accounts that take up the majority of your data? That difference could affect query plans and if you have nice and uniform data, you might never see some of the cardinality estimation problems until they pop up in production.

Comments closed

Database Subetting and Data Generation

Phil Factor tells us about two possibilities for loading a lower environment:

When dealing with the development, testing and releasing of new versions of an existing production database, developers like to use their existing production data. In doing so, the development team will be hit with the difficulties of managing and accommodating the large amount of storage used by a typical production database. It’s not a new problem because the practical storage capacity has grown over the years in line with our ingenuity in finding ways of using it.

To deal with using production data for testing, we generally want to reduce its size by extracting a subset of the entities from a ‘production’ database, anonymized and with referential integrity intact. We then deliver this subset to the various development environments.

Phil gets into some detail on the process behind subsetting and then covers data generation as an alternative.

Comments closed