Category: Testing

Simulating Load With PowerShell

Grant Fritchey builds a quick script to simulate load using PowerShell:

The idea is to be able to easily do one of several different things. By commenting out different sections of the code, I can change the general behavior. Most of the work is done in the # Run forever section of the code.
First, I’ll randomly pick a modulus comparison. When that hits and the remainder is 0, then I randomly wait between 3 and 13 seconds. Clearly, any of these can be adjusted.
The query gets executed. Then, I have two options for dealing with the query in cache. I can clear cache on every execution. I’ve found this very useful when dealing with bad parameter sniffing (testing or generation). Or, I can use another random set of code to occasionally remove the procedure from cache.

Click through for the script and some more notes from Grant.
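
The control flow Grant describes can be sketched in a few lines of Python. This is not his PowerShell script: `run_query`, `clear_cache`, the modulus choices, and the 1-in-10 odds of clearing the cache are all hypothetical stand-ins, and the loop is bounded by `iterations` rather than running forever so that it is easy to test.

```python
import random
import time


def simulate_load(run_query, clear_cache, iterations, rng=None, sleep=time.sleep):
    """Run `run_query` repeatedly with random pauses and occasional cache clears.

    run_query and clear_cache are caller-supplied callables (hypothetical
    stand-ins for executing the procedure and removing its plan from cache).
    Returns the number of pauses taken.
    """
    rng = rng or random.Random()
    pauses = 0
    for i in range(1, iterations + 1):
        # Randomly pick a modulus; when the remainder is 0, wait 3 to 13 seconds.
        modulus = rng.choice([3, 4, 5])  # assumed values, adjust freely
        if i % modulus == 0:
            sleep(rng.randint(3, 13))
            pauses += 1

        run_query(i)

        # Occasionally remove the plan from cache (assumed 1-in-10 odds).
        if rng.randint(1, 10) == 1:
            clear_cache()
    return pauses
```

Passing `sleep` and `rng` as parameters is a design choice for the sketch: it lets you dry-run the loop with a fake sleep and a seeded generator before pointing it at a real server.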

Building Test Data Following A Normal Distribution In T-SQL

I (finally) have a technical blog post:

In order to show you the solution, I want to build up a reasonably sized sample. Any solution looks great when reading five records, but let’s kick that up a notch. Or, more specifically, a million notches: I’m going to use a CTE tally table and load 5 million rows.
I want some realistic looking data, so I’ve adapted Dallas Snider’s strategy to build a data set which approximates a normal distribution.
Because this is a little complicated, I wanted to take the time and explain the data load process in detail in its own post, and then apply it in the follow-up post.  We’ll start with a relatively small number of records for this demonstration:  50,000.  The reason is that you can generate 50K records almost instantaneously but once you start getting a couple orders of magnitude larger, things slow down some.

If you do custom data generation for lower environments, I’d recommend checking this out. Your production data probably doesn’t follow a normal distribution exactly, but a normal distribution is probably closer to reality than the uniform distribution you get with functions like RAND().
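
To make the normal-versus-uniform point concrete, here is a small Python sketch. It is not the T-SQL from the post; the function names, the mean of 100, and the standard deviation of 15 are assumptions chosen only for illustration.

```python
import random
import statistics


def normal_test_values(count, mean, stddev, seed=None):
    """Return `count` values drawn from a normal distribution."""
    rng = random.Random(seed)
    return [rng.gauss(mean, stddev) for _ in range(count)]


def uniform_test_values(count, low, high, seed=None):
    """Return `count` values drawn uniformly from [low, high]."""
    rng = random.Random(seed)
    return [rng.uniform(low, high) for _ in range(count)]


if __name__ == "__main__":
    normal = normal_test_values(50_000, mean=100.0, stddev=15.0, seed=42)
    uniform = uniform_test_values(50_000, 55.0, 145.0, seed=42)
    # Share of values within 15 (one assumed standard deviation) of 100.
    for name, values in (("normal", normal), ("uniform", uniform)):
        share = sum(85.0 <= v <= 115.0 for v in values) / len(values)
        print(f"{name}: mean={statistics.mean(values):.1f} within_15={share:.2f}")
```

Both sets center on 100, but roughly two-thirds of the normal values should land within 15 of the center, against about one-third of the uniform values spread over the same range. That clustering is what makes the generated data look more like real data.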

Reporting On Unit Tests In R With covrpage

Maelle Salmon recaps Locke Data’s involvement with the covrpage package:

To read more about getting started with covrpage in your own package in only a few lines of code, we recommend checking out the “get started” vignette. It explains more about how to set up the Travis deploy, mentions which functions power the covrpage report, and gives more motivation for using covrpage.
And to learn how the information provided by covrpage should be read, read the “How to read the covrpage report” vignette.

Check it out.

Test Data Generation In SQL Server

Ahmad Yaseen walks through a couple techniques for creating test data in SQL Server:

Generating test data to fill the development database tables can also be performed easily and without wasting time writing scripts for each data type or using third party tools. You can find various tools in the market that can be used to generate testing data. One of these wonderful tools is the dbForge Data Generator for SQL Server. It is a powerful GUI tool for fast generation of meaningful test data for the development databases. The dbForge data generation tool includes 200+ predefined data generators with sensible configuration options that allow you to emulate column-intelligent random data. The tool also allows generating demo data for SQL Server databases already filled with data and creating your own custom test data generators. dbForge Data Generator for SQL Server can save the time and effort you spend on demo data generation by populating SQL Server tables with millions of rows of sample data that look just like real data. dbForge Data Generator for SQL Server helps to populate tables with the most frequently used data types such as Basic, Business, Health, IT, Location, Payment and Person data types.

I have a love-hate relationship with test data generation tools, as they tend not to create reasonable data, where reasonable is a combination of domain (hi, birth date in the early 1800s!) and distribution.

Testing Package Properties With ssisUnit

Bartosz Ratajczyk shows how you can test certain properties on an Integration Services package using ssisUnit:

The command is simple. You can get or set the property using the value for given property path. As usual – when you get the value, you leave the value blank. The path – well – is the path to the element in the package or the project. You use backslashes to separate elements in the package tree, and at the end, you use .Properties[PropertyName] to read the property. If you use the elements collection – like connection managers – you can pick a single element using square brackets and the name of this element.

Read on for more, including limitations and useful testing scenarios.

Automated Testing With Power Query

Fred Kaffenberger walks us through query failure with Power Query:

I loved Nar’s post on Automated Testing using DAX. I especially like the rule of always including controls so that business readers can share responsibility for data quality. For my part, I sometimes use hidden pages in Power BI reports to assure myself of data quality. I also set alerts on testing dashboards in the Power BI Service to notify me if something is not right. Sometimes, however, a more proactive approach is needed. So, we’ll be doing automated testing with Power Query.

If the query can’t connect to the data source, it will fail. When this happens, the report in Power BI Service is stale, but accurate. I’m fine with this.
It can also happen that the query succeeds but is incomplete. In this case, the result is that the report is wrong. Why does this happen? It can happen because of an overtaxed transactional data source. The ERP or CRM or work order system just can’t deliver the amount of data. Maybe it’s linked SQL tables using ODBC. For whatever reason, the query succeeds, but data is missing. I’m NOT fine with this. The long-term solution is to move to a more reliable data source (data warehouse, anybody?). In the short run, refreshes must be stopped. Stale data is better than bad data.

Also check out the comments.
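
The "stale beats wrong" rule amounts to failing the refresh on purpose when a load looks incomplete. Fred does this in Power Query; the Python below only sketches the logic, and `check_row_count`, `IncompleteLoadError`, and the 90% threshold are hypothetical names and values rather than anything from his post.

```python
class IncompleteLoadError(Exception):
    """Raised to abort a refresh when a load looks incomplete."""


def check_row_count(rows, expected_rows, min_ratio=0.9):
    """Return `rows` unchanged, or raise if too few rows arrived.

    expected_rows is a baseline the caller trusts (for example, the count
    from the last good refresh); min_ratio is an assumed tolerance.
    """
    if expected_rows <= 0:
        raise ValueError("expected_rows must be positive")
    minimum = expected_rows * min_ratio
    if len(rows) < minimum:
        raise IncompleteLoadError(
            f"got {len(rows)} rows, expected at least {minimum:.0f}"
        )
    return rows
```

Raising instead of returning a partial result is the point: the failed refresh leaves the previous, complete data in place.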

Packages For Testing R Packages

Maelle Salmon shows us how to test our R packages within R:

If you’re brand-new to unit testing your R package, I’d recommend reading this chapter from Hadley Wickham’s book about R packages.

There’s an R package called RUnit for unit testing, but in the whole post we’ll mention resources around the testthat package since it’s the one we use in our packages, and arguably the most popular one. testthat is great! Don’t hesitate to read its docs again if you started using it a while ago, since the latest major release added the setup() and teardown() functions to run code before and after all tests, which is very handy.

To set up testing in an existing package, i.e., creating the test folder and adding testthat as a dependency, run usethis::use_testthat(). In our WIP pRojects package, we set up the tests directory for you so you don’t forget. Then, in any case, add new tests for a function using usethis::use_test().

The testthis package might help make your testing workflow even smoother. In particular, test_this() “reloads the package and runs tests associated with the currently open R script file.”, and there’s also a function for opening the test file associated with the current R script.

This is an area where I know I need to get better, and Maelle gives us a plethora of tooling for tests.

Generating Load For Kafka With JMeter

Anup Shirolkar shows us a way to use JMeter to generate load for Apache Kafka clusters:

The Anomalia Machina is going to require (at least!) one more thing as stated in the intro, loading with lots of data! Kafka is a log aggregation system and operates on a publish-subscribe mechanism. The Kafka cluster in Anomalia Machina will be accumulating a lot of events which are to be processed to discover anomalies. The exact sequence of processing is still being prototyped at this point in time, but there is a solid requirement of a tool/mechanism to load the Kafka cluster with lots of data in a hurry.

The requirements pointed me in the direction of looking for ‘Kafka Load Testing’. Thinking of load testing, one tool comes to mind that is used very widely for load testing Java-based systems: ‘JMeter’. JMeter has a rich toolset to perform various types of testing. It also comes with many advantages, viz. open source, easy to use, platform independent, distributed testing, etc. I can use JMeter and test its ability to perform cluster loading.

Read on for the demonstration.

Testing TDE Performance

Eduardo Pivaral tests the performance of a database with Transparent Data Encryption versus that same database without encryption:

Transparent data encryption (TDE) helps you to secure your data at rest. This means the data files and related backups are encrypted, securing your data in case your media is stolen.
This technology works by implementing real-time I/O encryption and decryption, so this implementation is transparent for your applications and users.

However, this type of implementation could lead to some performance degradation since more resources must be allocated in order to perform the encrypt/decrypt operations.

In this post we will compare how much longer some of the most common DB operations take, so if you are planning to implement it on your database, you can have an idea of what to expect from different operations.

These results fit in reasonably well with what I’d heard, but it’s nice to have someone run the numbers.

Building Observable Distributed Systems

Kevin Sookocheff has some thoughts on building observable systems:

Given the shortcomings of monitoring and testing, we should shift focus to building observable systems. This means treating observability of system behaviour as a primary feature of the system being built, and integrating this feature into how we design, build, test, and maintain our systems. This also means acknowledging that the ease with which we can debug our production environment will be a key indicator of system reliability, scalability, and ultimately customer experience. Designing a system to be observable requires effort from three disciplines of software development: development, testing, and operations. None of these disciplines is more important than the others, and the sum of them is greater than the value of the individual parts. Let’s take some time to look at each discipline in more detail, with a focus on observability.

My struggle has never been with the concept, but rather with getting the implementation details right. “Make everything observable” is great until you run out of disk space because you’re logging everything.
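
One common guard against the disk-space problem is size-based log rotation. The sketch below uses `RotatingFileHandler` from Python's standard library; the helper name `build_capped_logger`, the size limits, and the log format are assumptions for illustration, not something from Kevin's post.

```python
import logging
from logging.handlers import RotatingFileHandler


def build_capped_logger(name, path, max_bytes=1_000_000, backups=3):
    """Return a logger whose files are rotated at roughly max_bytes each.

    Disk use is bounded by about max_bytes * (backups + 1), provided no
    single record is larger than max_bytes.
    """
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    handler = RotatingFileHandler(path, maxBytes=max_bytes, backupCount=backups)
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    logger.propagate = False  # keep records out of the root logger's handlers
    return logger
```

Rotation caps the space used but does so by discarding the oldest records, so it trades history for safety. Whether that trade is acceptable depends on how far back you need to debug.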
