Press "Enter" to skip to content

Category: Testing

Testing Python Code with pytest

Aida Gjoka builds some tests:

Testing code using automated tools is common throughout the software development industry. This technique can improve the quality of the code you write as a data scientist. Testing helps refine your code, supports redesign, prevents errors, and makes it harder to write single-use code.

Here, we introduce the pytest framework and show how it can be used to test Python functions. If you don’t use a testing framework as part of your daily workflow, try experimenting with the techniques presented here the next time you write or extend a function.

I am a big fan of pytest because it strikes what I consider to be a great balance between convention and customization. There's very little administrative overhead to creating test classes and test cases, so tests are easy to build, and it's trivial to run a test suite or a specific part of one.
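
To give a sense of how little ceremony pytest needs, here's a minimal sketch; the apply_discount function and its expected behavior are invented for illustration:

    # test_pricing.py: a minimal pytest sketch; the function under
    # test is hypothetical, made up for this example.
    import pytest

    def apply_discount(price: float, pct: float) -> float:
        """Return price reduced by pct percent, rejecting nonsense inputs."""
        if not 0 <= pct <= 100:
            raise ValueError("pct must be between 0 and 100")
        return round(price * (1 - pct / 100), 2)

    def test_apply_discount_basic():
        assert apply_discount(100.0, 25) == 75.0

    def test_apply_discount_rejects_bad_percentage():
        with pytest.raises(ValueError):
            apply_discount(100.0, 150)

Running pytest against this file picks up both tests by naming convention alone, and pytest test_pricing.py::test_apply_discount_basic runs just the one: that's the low administrative overhead in action.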

Creating Test Classes and Unit Tests with tSQLt

Olivier Van Steenlandt continues a series on database testing:

We have set up our tSQLt Database Project in the previous data recipe, Create an SSDT Project Template based on your Database Project. Now it's time to dive into the wonderful world of tSQLt Unit Testing. In the meantime, I have added my data warehouse to my SSDT Solution and added this project as a Database Reference to my Unit Testing Database Project. If you are unsure how to do this, you can find all the information you need in my previous data cookbook, which you can access via the following link: Getting Started With Database Projects & Azure DevOps.

Read on for a walkthrough of how to do this.

Adding tSQLt to a Database Project

Olivier Van Steenlandt provides an overview of adding tSQLt to a Visual Studio database project:

As a first step in the process, we're going to create a new Database Project. In my case, I will be calling my Database Project AdventureWorksDW_UnitTesting and my solution AdventureWorks.

If you are not sure how to set up a Database Project in Visual Studio from scratch, don't worry: you can follow the step-by-step data recipe I released a while ago, Getting Started with Database Projects and Version Control.

Read on to learn more about how to add the tSQLt objects and eliminate cross-database reference issues.

The Importance of Exploratory Testing

Thuy covers why exploratory testing is important:

Exploratory Testing is a software testing method in which testers freely explore, find, and test features, bugs, or issues in an application, without the need for a prior testing plan. In exploratory testing, the tester works with the app freely, as a real user would, trying to find bugs and issues without following a specific test scenario.

Exploratory testing is a type of testing in which test cases are not created beforehand, so testers can test the system quickly. They can jot down ideas about what needs to be checked before performing the test. Exploratory testing treats testing as a "thinking" activity, one that explores new cases falling outside the mainstream flow.

It’s amazing (and dismaying) how many bugs you can find simply by clicking around. The tricky part about exploratory testing is not actually finding bugs, but keeping track of your actions so that a developer knows how to fix the bugs you’ll inevitably find.

Unit Testing a Database

Olivier Van Steenlandt builds some tests:

In the past few years, I learned much about collaborative data warehouse development and deployment automation by using Database Projects (SSDT) and Azure DevOps (and other tools).

I had my fair share of learning curves, making mistakes, and having great learning opportunities. Lately, I started my next journey to learn about Unit Testing for data warehousing/database development.

In this data cookbook (blog post series), we will discover the wonderful world and different flavors of unit testing from a data perspective. In the coming weeks/months, new data recipes (blog posts) will be released bi-weekly.

This first post provides an overview of the topic and includes links to three tools (though SQL Test is itself an implementation of tSQLt). Of the three, Visual Studio tests are the best of the bunch, though they're more integration tests than unit tests.

Test Isolation with Kafka

Anton Belyaev builds some tests:

The experience of running Kafka in test scenarios has reached a high level of convenience thanks to the use of Testcontainers and enhanced support in Spring Boot 3.1 with the @ServiceConnection annotation. However, writing and maintaining integration tests with Kafka remains a challenge. This article describes an approach that significantly simplifies the testing process by ensuring test isolation and providing a set of tools to achieve this goal. With isolation in place, Kafka tests can be organized so that, at the verification stage, the test has full access to all messages produced during its run, avoiding the need for forced waits such as Thread.sleep().

This method is suitable for use with Testcontainers, Embedded Kafka, or other ways of running the Kafka service (e.g., a local instance).

Click through for that approach.
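
The article itself is Java and Spring, but the isolation idea translates. Here's a rough Python sketch of the same trick using testcontainers-python and kafka-python, with every test getting its own topic so verification can simply read everything on it; the topic and message names are invented:

    import uuid

    from kafka import KafkaConsumer, KafkaProducer
    from testcontainers.kafka import KafkaContainer

    def test_orders_are_published():
        with KafkaContainer() as kafka:
            bootstrap = kafka.get_bootstrap_server()
            # A unique topic per test is the isolation trick: no other test
            # can pollute it, so verification sees exactly this test's output.
            topic = f"orders-{uuid.uuid4()}"

            producer = KafkaProducer(bootstrap_servers=bootstrap)
            producer.send(topic, value=b"order-1")
            producer.send(topic, value=b"order-2")
            producer.flush()

            # consumer_timeout_ms bounds the read; no blind Thread.sleep needed.
            consumer = KafkaConsumer(
                topic,
                bootstrap_servers=bootstrap,
                auto_offset_reset="earliest",
                consumer_timeout_ms=5000,
            )
            assert [m.value for m in consumer] == [b"order-1", b"order-2"]

Spinning up a container per test is heavy, so in practice the container would live in a session-scoped fixture and only the topic would be per-test.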

Test Data Generation Concepts and Challenges

Phil Factor talks about data generation:

Data generation is the science and art of providing data for database development work that is as realistic and controllable as possible. The skills of generating realistic data are an essential part of being a database developer. It is important: you need plenty of data of exactly the right type, size and verisimilitude in order to develop and test a database. It has got to look right. If you are launching a corporate, data-driven application, you’ll need it for training as well as testing and development.

Where things get really tough is when you also want to get the distribution of data right. Is a uniform distribution (e.g., outputs from a random number generator) okay? Or do you have a few large accounts that take up the majority of your data? That difference could affect query plans, and if your data is nice and uniform, you might never see some of the cardinality estimation problems until they pop up in production.
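
To make the distribution point concrete, here's a small pure-Python sketch (the account names and row counts are arbitrary) contrasting a uniform spread with a skewed one:

    # Contrast a uniform spread of rows across accounts with a skewed one,
    # where a handful of large accounts dominate. That skewed shape is what
    # trips up cardinality estimation.
    import random
    from collections import Counter

    random.seed(42)
    accounts = [f"ACC{i:03d}" for i in range(100)]

    # Uniform: every account is equally likely.
    uniform_rows = [random.choice(accounts) for _ in range(100_000)]

    # Skewed: weight a few accounts heavily (roughly Zipf-like).
    weights = [1 / (rank + 1) for rank in range(len(accounts))]
    skewed_rows = random.choices(accounts, weights=weights, k=100_000)

    for name, rows in [("uniform", uniform_rows), ("skewed", skewed_rows)]:
        top5 = Counter(rows).most_common(5)
        share = sum(count for _, count in top5) / len(rows)
        print(f"{name}: top 5 accounts hold {share:.0%} of rows")
    # Prints roughly 5% for uniform and around 40-45% for skewed.

Load both shapes into a test table and the same query can produce very different plans.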

Database Subsetting and Data Generation

Phil Factor tells us about two possibilities for loading a lower environment:

When dealing with the development, testing and releasing of new versions of an existing production database, developers like to use their existing production data. In doing so, the development team will be hit with the difficulties of managing and accommodating the large amount of storage used by a typical production database. It’s not a new problem because the practical storage capacity has grown over the years in line with our ingenuity in finding ways of using it.

To deal with using production data for testing, we generally want to reduce its size by extracting a subset of the entities from a ‘production’ database, anonymized and with referential integrity intact. We then deliver this subset to the various development environments.

Phil gets into some detail on the process behind subsetting and then covers data generation as an alternative.
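
The referential-integrity part of subsetting boils down to sampling the parent entities and then keeping only child rows whose foreign keys point at sampled parents. Here's a toy in-memory sketch (tables and sizes invented); real tools walk the entire foreign-key graph and anonymize as they go:

    import random

    random.seed(7)
    customers = [{"customer_id": i, "name": f"Customer {i}"} for i in range(1, 1001)]
    orders = [
        {"order_id": i, "customer_id": random.randint(1, 1000)}
        for i in range(1, 5001)
    ]

    # Sample the parent table, then keep only children that reference it,
    # so no order in the subset points at a missing customer.
    sampled = random.sample(customers, k=100)
    kept_ids = {c["customer_id"] for c in sampled}
    kept_orders = [o for o in orders if o["customer_id"] in kept_ids]

    assert all(o["customer_id"] in kept_ids for o in kept_orders)
    print(f"kept {len(sampled)} customers and {len(kept_orders)} orders")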

Testing with Databricks

Anh Nguyen Viet shares some thoughts on testing in Databricks:

With diverse support and a focus on workspace uniformity, Databricks can bring many benefits to the testing process, such as the following:

  • Centralized: Databricks provides an integrated environment for many teams (including the testing team), allowing them to work in a focused and productive way. Integrating tools and services in a single platform reduces fragmentation and increases efficiency during testing.
  • Consistency: Databricks offers integrated tools and services, allowing testers to work consistently across the entire testing process in a uniform and efficient working environment.
  • Enhanced Productivity and Cost Reduction: With the flexibility and efficiency in data processing that Databricks supports, testers can save time and effort, thereby increasing work productivity and reducing project costs. Using these utilities properly helps automate the testing process and delivers better results.

Read on for a few tips around building tests using Databricks.
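
One concrete flavor of that automation: if transformation logic lives in plain functions rather than inline notebook cells, it can be unit tested with pytest against a local SparkSession. A sketch, with the function and column names invented for the example:

    import pytest
    from pyspark.sql import DataFrame, SparkSession
    from pyspark.sql import functions as F

    def add_revenue(df: DataFrame) -> DataFrame:
        """Hypothetical transformation under test."""
        return df.withColumn("revenue", F.col("quantity") * F.col("unit_price"))

    @pytest.fixture(scope="session")
    def spark():
        # A small local SparkSession stands in for the Databricks runtime.
        return SparkSession.builder.master("local[2]").appName("tests").getOrCreate()

    def test_add_revenue(spark):
        df = spark.createDataFrame(
            [(2, 10.0), (3, 1.5)], schema=["quantity", "unit_price"]
        )
        result = add_revenue(df).select("revenue").collect()
        assert [row.revenue for row in result] == [20.0, 4.5]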

Mock Tests for R Packages

Maelle Salmon does a bit of mocking:

This blog featured a post on mocking, the art of replacing a function with whatever fake we need for testing, years ago. Since then, we’ve entered a new decade, the second edition of Hadley Wickham’s and Jenny Bryan’s R packages book was published, and mocking returned to testthat, so it’s time for a new take/resources roundup!

Click through to see how you can create mocks in R as well as some practical examples of mocks in action.
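
For readers who live on the Python side, the same idea ships in the standard library as unittest.mock; a quick sketch of the concept (the weather example is invented, and the post itself covers the R and testthat equivalents):

    # The mocking idea in Python terms: replace a real dependency with a
    # fake for the duration of a test. The weather API example is invented.
    from unittest import mock

    def fetch_temperature(city: str) -> float:
        """Pretend this calls a slow, flaky external weather API."""
        raise RuntimeError("no network in tests")

    def clothing_advice(city: str) -> str:
        return "coat" if fetch_temperature(city) < 10 else "t-shirt"

    def test_clothing_advice_cold():
        # Patch the dependency where it is looked up, not where it is defined.
        with mock.patch(f"{__name__}.fetch_temperature", return_value=3.0):
            assert clothing_advice("Tromsø") == "coat"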
