Press "Enter" to skip to content

Category: Testing

Data Quality Management with Great Expectations and Databricks

Sairamakrishna BuchiReddy Karri and Srinivasarao Rayankula show off Great Expectations:

Data quality checks are critical for any production pipeline. While there are many ways to implement them, the Great Expectations library is a popular one. 

Great Expectations is a powerful tool for maintaining data quality by defining, managing, and validating expectations for your data. In this article, we will discuss how you can use it to ensure data quality in your data pipelines.

Click through to see how it all works.

Leave a Comment

An Overview of DataDiluvium

Adron Hall has a new tool and a new blog series. The first post is a product overview:

DataDiluvium is a web-based tool available at datadiluvium.com that helps developers, database administrators, and data engineers generate realistic test data from SQL schema definitions. Whether you’re setting up a development environment, creating test scenarios, or preparing data for demonstrations, DataDiluvium streamlines the process of data generation.

The second covers some of the development precepts Adron used:

DataDiluvium is a web-based tool I’ve built designed to help developers, database administrators, and data engineers generate realistic test data based on SQL schema definitions. The tool takes SQL table definitions as input and produces sample data in various formats, making it easier to populate development and testing environments with meaningful data.

The tool is free, so if you’re looking for a sample data generator, check it out.

Leave a Comment

Testing a SQL Server Operation with a Container

Jess Pomfret performs a test:

Today, my colleague wanted to quickly test out some dbatools commands to install the Ola Hallengren maintenance solution. They had a local instance of SQL installed, but it already had the maintenance jobs running, so it wasn’t a fresh, out of the box instance.

So let’s spin a SQL Server instance in seconds to test against! (Ok it’s seconds if you have the pre-requisites installed, but I’ll get you setup in a few minutes if not)!

Click through for a primer on using SQL Server in a container.

Comments closed

Streaming Data to Azure Event Hub via Mockaroo and Kafka API

Jasleen Kaur Wahi generates some data:

In a recent project, I faced the need to generate randomized data for transmission to the Azure Event Hub. This hub is a key component of Microsoft Azure, used for real-time data ingestion and processing.

First, let’s take look at how I created this random data. I wanted to come up with a way to make data that looks like what we see in the real world, but without using any real information from users. This made-up data was really important for a bunch of things, like checking if our software works well.

Read on to see how Mockaroo works and the end result. Creating tests for streaming services like Event Hubs is a challenge, so this is an interesting approach to the task.

Comments closed

Automating tSQLt Test Execution via Azure DevOps Pipeline

Olivier Van Steenlandt automates some tests:

Before we dive into “how to automate”, I would like to explain why bothering about Unit Test Automation. Unit Tests in general are a good practice to identify potential issues with new source code upfront. Therefore, executing Unit Tests is a good idea. By automating Unit Tests, developers can be sure that Unit Tests will be executed for every single deployment that happens.

In the upcoming step-by-step guide, Unit Test automation will be implemented in a YAML pipeline. The pipelines I’m using are very simplified in comparison with many other pipelines I have seen in the past.

Read on for the process.

Comments closed

API Testing with pytest

Xuan Nguyen Truong writes some tests:

API testing is an essential aspect of software development, ensuring that your application’s endpoints are functioning correctly and reliably. In this guide, we’ll introduce you to implement API testing in Python with Pytest and the Requests library.

I’m a big fan of pytest, as it makes testing in Python so much easier. There’s not a lot of ceremony involved in writing tests and it’s easy to see what’s failing during tests.

Comments closed

The Cost of Free Testing Tools

Lauren Harold reminds us that time has value and opportunity cost exists:

Open source frameworks like Selenium, Cypress, and Playwright are the first stop for most startups when they automate their end-to-end tests. This process generally happens early, according to our research, with 69% of software teams with just 6-10 developers automating their testing.

But make no mistake. We recently surveyed developers and engineering leaders from around the globe to understand how current practices impact software velocity. One thing that immediately became apparent was that open source testing frameworks are “free,” but their costs snowball once you start adding testing infrastructure, plugins, and test management software.

The proposed solution in the article is no-code end-to-end testing, though that often ends up being brittle like other end-to-end testing solutions, and a huge challenge for complicated flows.

Comments closed

Finding a Good Cost Threshold for Parallelism

Jared Westover goes on a quest:

Given modern hardware, you might hear that the default setting of 5 for the Cost Threshold for Parallelism (CTFP) is far too low. However, people are left with a decision: Should they change it or leave it alone? If I change it and the performance gets worse, I’ll be left with egg on my face. What exactly is the benefit of increasing it, especially for smaller-cost queries?

Read on to learn more about what Cost Threshold for Parallelism is, how you can set it, and a simple example of how the setting can affect you. Jared also has some links to great resources that I highly recommend you check out.

Comments closed