Press "Enter" to skip to content

Category: Testing

Getting CSV Row Counts

Dave Mason shares a few techniques for getting row counts of CSV files:

I was reminded of this recently as I was working with R, trying to read a nearly 2 GB data file. I wanted to read in 5% of the data and output it to a smaller file that would make the test code run faster. The particular function I was working with needed a row count as one of its parameters. For me, that meant I had to determine the number of rows in the source file and multiply by 0.05. I tied the code for all of those tasks into one script block.
Now, not to my surprise, it was slow. In my short experience, I’ve found R isn’t particularly snappy, even when the data can fit comfortably in memory. I was pretty sure I could beat R’s record count performance handily with C#. And I did. I found some related questions on StackOverflow. A small handful of answers discussed the efficiency of various approaches. I only tried two C# variations: my original attempt, and a second version that was supposed to be faster (the improvement was nominal).

To fact-check Dave (because this blog is about nothing other than making sure Dave is right), I checked the source code to the wc command. That command also streams through the entire file, so Dave’s premise looks good.
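
For what it’s worth, the same question has a T-SQL-flavored answer if the file lives on a machine running SQL Server. This is a hedged sketch (not anything from Dave’s post, which compares R and C#): bulk-load each physical line into a single-column temp table using a field terminator that shouldn’t appear in the data, then count the rows. The file path is a placeholder, and like wc, this reads the entire file.

CREATE TABLE #Lines (LineText varchar(max));

-- 0x0B (vertical tab) is unlikely to show up in a CSV, so each whole
-- line lands in the single column; 0x0A is the line feed terminator.
BULK INSERT #Lines
FROM 'C:\Data\BigFile.csv'
WITH (FIELDTERMINATOR = '0x0B', ROWTERMINATOR = '0x0A');

-- Physical line count; subtract one if the file has a header row.
SELECT COUNT(*) AS NumberOfLines FROM #Lines;

DROP TABLE #Lines;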


Testing Kafka Streams Applications

Yeva Byzek continues her series on testing Kafka-based streaming applications:

When you create a stream processing application with Kafka’s Streams API, you create a Topology either using the StreamsBuilder DSL or the low-level Processor API. Normally, the topology runs with the KafkaStreams class, which connects to a Kafka cluster and begins processing when you call start(). For testing, though, connecting to a running Kafka cluster and making sure to clean up state between tests adds a lot of complexity and time.

Instead, developers can unit test their Kafka Streams applications with utilities provided by kafka-streams-test-utils. Introduced in KIP-247, this artifact was specifically created to help developers test their code, and it can be added to your continuous integration and continuous delivery (CI/CD) pipeline.

Streaming applications need to be tested just like any other application.


The Difficulties Of Database Load Testing

Brent Ozar shares some of the trouble you might run into when database load testing:

Managers think that to simulate more load, they can just take the production queries and replay them multiple times, simultaneously, from the replay tool. We’ve already talked about how you can’t reliably replay deletes, but even inserts and updates cause a problem.
Say we’re load testing Stack Overflow queries, and our app does this:

UPDATE dbo.Users
  SET Reputation = Reputation + 1
  WHERE Id = 22656;

If we try to simulate more load by running that exact same query from 100 different sessions simultaneously, we’re just going to end up with lock contention on that particular user’s row. We’ll be troubleshooting a blocking problem, not the problem we really have when 100 different users run that same query.

Click through for several issues you can run into and Brent’s advice.
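
Purely to illustrate the collision, a sketch (my example, not Brent’s code): if each simulated session picks its own pseudo-random user rather than replaying Id = 22656 verbatim, the hundred sessions stop queueing on a single row lock. This assumes Ids are reasonably dense; with sparse Ids, some updates will simply touch zero rows, which is fine for a rough harness.

DECLARE @MaxId int = (SELECT MAX(Id) FROM dbo.Users);
-- Taking the modulus before ABS avoids integer overflow on the
-- rare minimum CHECKSUM value.
DECLARE @Id int = ABS(CHECKSUM(NEWID()) % @MaxId) + 1;

UPDATE dbo.Users
  SET Reputation = Reputation + 1
  WHERE Id = @Id;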


Simulating Load With PowerShell

Grant Fritchey builds a quick script to simulate load using PowerShell:

The idea is to be able to easily do one of several different things. By commenting out different sections of the code, I can change the general behavior. Most of the work is done in the # Run forever section of the code.
First, I’ll randomly pick a modulus comparison. When that hits and the remainder is 0, I randomly wait between 3 and 13 seconds. Clearly, any of these can be adjusted.
The query gets executed. Then, I have two options for dealing with the query in cache. I can clear the cache on every execution; I’ve found this very useful when dealing with bad parameter sniffing (testing or generation). Or, I can use another random set of code to occasionally remove the procedure from cache.

Click through for the script and some more notes from Grant.
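
One piece worth making concrete: “remove the procedure from cache” means evicting a single plan rather than flushing the whole plan cache. The T-SQL underneath looks something like this sketch, where dbo.SomeProc is a placeholder name rather than anything from Grant’s script:

DECLARE @PlanHandle varbinary(64);

-- Find the cached plan for just this procedure in the current database.
SELECT @PlanHandle = ps.plan_handle
FROM sys.dm_exec_procedure_stats AS ps
WHERE ps.object_id = OBJECT_ID('dbo.SomeProc')
  AND ps.database_id = DB_ID();

-- Evict only that plan, leaving the rest of the plan cache intact.
IF @PlanHandle IS NOT NULL
  DBCC FREEPROCCACHE(@PlanHandle);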


Building Test Data Following A Normal Distribution In T-SQL

I (finally) have a technical blog post:

In order to show you the solution, I want to build up a reasonably sized sample. Any solution looks great when reading five records, but let’s kick that up a notch. Or, more specifically, a million notches: I’m going to use a CTE tally table and load 5 million rows.
I want some realistic looking data, so I’ve adapted Dallas Snider’s strategy to build a data set which approximates a normal distribution.
Because this is a little complicated, I wanted to take the time and explain the data load process in detail in its own post, and then apply it in the follow-up post. We’ll start with a relatively small number of records for this demonstration: 50,000. The reason is that you can generate 50K records almost instantaneously, but once you start getting a couple of orders of magnitude larger, things slow down some.

If you do custom data generation for lower environments, I’d recommend checking this out. Your production data probably doesn’t follow a normal distribution exactly, but a normal distribution is probably closer to reality than the uniform distribution you get with functions like RAND().
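
To give a rough sense of the shape of the technique, here’s a minimal sketch (not the exact code from the post) pairing a small tally-style CTE with the Box-Muller transform, which turns two independent uniform random values into one approximately normal draw. The mean and standard deviation are arbitrary placeholders:

DECLARE @Mean float = 500.0, @StDev float = 75.0;

WITH n AS (SELECT v FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) AS x(v)),
tally AS
(
  -- Cross-joining the ten-row set five times yields 100,000 rows.
  SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS rn
  FROM n AS n1 CROSS JOIN n AS n2 CROSS JOIN n AS n3
    CROSS JOIN n AS n4 CROSS JOIN n AS n5
)
SELECT
  t.rn,
  -- Box-Muller; 1.0 - RAND(...) keeps the LOG input strictly positive.
  @Mean + @StDev * SQRT(-2.0 * LOG(1.0 - RAND(CHECKSUM(NEWID()))))
    * COS(2.0 * PI() * RAND(CHECKSUM(NEWID()))) AS ApproxNormalValue
FROM tally AS t;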


Reporting On Unit Tests In R With covrpage

Maelle Salmon recaps Locke Data’s involvement with the covrpage package:

To read more about getting started with covrpage in your own package in only a few lines of code, we recommend checking out the “get started” vignette. It explains in more detail how to set up the Travis deploy, mentions which functions power the covrpage report, and gives more motivation for using covrpage.
And to learn how the information provided by covrpage should be read, read the “How to read the covrpage report” vignette.

Check it out.


Test Data Generation In SQL Server

Ahmad Yaseen walks through a couple techniques for creating test data in SQL Server:

Generating test data to fill the development database tables can also be performed easily and without wasting time writing scripts for each data type or using third-party tools. You can find various tools on the market that can be used to generate testing data. One of these wonderful tools is dbForge Data Generator for SQL Server. It is a powerful GUI tool for fast generation of meaningful test data for development databases. The dbForge data generation tool includes 200+ predefined data generators with sensible configuration options that allow you to emulate column-intelligent random data. The tool also allows generating demo data for SQL Server databases already filled with data and creating your own custom test data generators. dbForge Data Generator for SQL Server can save you time and effort on demo data generation by populating SQL Server tables with millions of rows of sample data that look just like real data. It helps to populate tables with the most frequently used data types, such as Basic, Business, Health, IT, Location, Payment, and Person.

I have a love-hate relationship with test data generation tools, as they tend not to create reasonable data, where reasonable is a combination of domain (hi, birth date in the early 1800s!) and distribution.


Testing Package Properties With ssisUnit

Bartosz Ratajczyk shows how you can test certain properties on an Integration Services package using ssisUnit:

The command is simple. You can get or set the property using the value for a given property path. As usual, when you get the value, you leave the value blank. The path, well, is the path to the element in the package or the project. You use backslashes to separate elements in the package tree, and at the end, you use .Properties[PropertyName] to read the property. If you use an elements collection, like connection managers, you can pick a single element using square brackets and the name of that element.

Read on for more, including limitations and useful testing scenarios.
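
To make the path syntax concrete, here are a few hypothetical examples following that pattern (the container, task, and connection manager names are invented, not taken from Bartosz’s post):

\Package.Properties[Description]
\Package\Load Staging\Truncate Tables.Properties[Disabled]
\Project\ConnectionManagers[Source.conn].Properties[ConnectionString]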


Automated Testing With Power Query

Fred Kaffenberger walks us through query failure with Power Query:

I loved Nar’s post on Automated Testing using DAX. I especially like the rule of always including controls so that business readers can share responsibility for data quality. For my part, I sometimes use hidden pages in Power BI reports to assure myself of data quality. I also set alerts on testing dashboards in the Power BI Service to notify me if something is not right. Sometimes, however, a more proactive approach is needed. So, we’ll be doing automated testing with Power Query.

If the query can’t connect to the data source, it will fail. When this happens, the report in Power BI Service is stale, but accurate. I’m fine with this. It can also happen that the query succeeds but is incomplete. In this case, the result is that the report is wrong. Why does this happen? It can happen because of an overtaxed transactional data source. The ERP or CRM or work order system just can’t deliver the amount of data. Maybe it’s linked SQL tables using ODBC. For whatever reason, the query succeeds, but data is missing. I’m NOT fine with this. The long-term solution is to move to a more reliable data source (data warehouse, anybody?). In the short run, refreshes must be stopped. Stale data is better than bad data.

Also check out the comments.


Packages For Testing R Packages

Maelle Salmon shows us how to test our R packages within R:

If you’re brand-new to unit testing your R package, I’d recommend reading this chapter from Hadley Wickham’s book about R packages.

There’s an R package called RUnit for unit testing, but in the whole post we’ll mention resources around the testthat package, since it’s the one we use in our packages and arguably the most popular one. testthat is great! Don’t hesitate to read its docs again if you started using it a while ago, since the latest major release added the setup() and teardown() functions to run code before and after all tests, which is very handy.

To set up testing in an existing package, i.e. creating the test folder and adding testthat as a dependency, run usethis::use_testthat(). In our WIP pRojects package, we set up the tests directory for you so you don’t forget. Then, in any case, add new tests for a function using usethis::use_test().

The testthis package might help make your testing workflow even smoother. In particular, test_this() “reloads the package and runs tests associated with the currently open R script file,” and there’s also a function for opening the test file associated with the current R script.

This is an area where I know I need to get better, and Maelle gives us a plethora of tooling for tests.
