Press "Enter" to skip to content

Category: Testing

Unit Testing ADX Functions

David Giard builds some tests:

Our application contains many functions that return data stored in Azure Data Explorer (ADX). We wrote these functions in Kusto Query Language (KQL) and each function returns a set of data based on the arguments passed. Although developers tested these functions as they wrote them, we needed a way to validate that the functions continued to work as the code and the data changed.

Automated unit testing is an essential part of any application development life cycle. It validates that code works properly and minimizes the risk that future code changes will break existing functionality.

In this article, I will discuss the approach we took in automating the testing of ADX functions.

Click through to see how to use the assert() function and build some tests.


Finding Duplicates in Type 2 SCDs

Dinesh Asanka wants to verify some Type 2 slowly changing dimension results:

As we discussed in a previous article, Implementing Slowly Changing Dimensions (SCDs) in Data Warehouses, there are three main types of slowly changing dimensions: Type 1, Type 2, and Type 3. Of these, Type 1 is the simplest: you maintain only the latest version of the attribute. For example, if an employee is promoted from Software Engineer to Senior Software Engineer, you simply overwrite the existing value with the new one, so the historical aspect is lost.

Type 2 slowly changing dimensions are used to track historical data in a data warehouse and are the most common approach for dimensions. This article uses AdventureWorksDW, the sample data warehouse database.

Click through for one way to compare, one which you could build using dynamic SQL.
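
If you want a starting point before clicking through, here is a minimal sketch (not Dinesh's query) of one way to catch the most common Type 2 failure: a business key with more than one open row. The table and column names below are assumptions; substitute your own business key and row-validity columns.

```sql
-- Minimal sketch: find business keys with more than one open ("current") row
-- in a Type 2 dimension. dbo.DimEmployee, EmployeeNationalIDAlternateKey, and
-- EndDate are placeholder names; swap in your dimension's actual columns.
SELECT
    d.EmployeeNationalIDAlternateKey,
    COUNT(*) AS CurrentRowCount
FROM dbo.DimEmployee AS d
WHERE d.EndDate IS NULL               -- an open-ended row marks the current version
GROUP BY d.EmployeeNationalIDAlternateKey
HAVING COUNT(*) > 1;                  -- more than one current row means a duplicate
```

Wrapping this pattern in dynamic SQL, as mentioned above, lets you generate the check for every Type 2 dimension from metadata instead of hand-writing it per table.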


Stress Testing Power BI Embedded

Kristyna Hughes puts Power BI to the test:

For example, one instance may have a very large data model that takes a lot of memory and CPU time to refresh, 20 users at peak viewing times, hourly refreshes, and queries that are all very simple and allow for query folding. Another business may have six smaller data models, 950 users at peak viewing times, daily refreshes, and queries populating the data models that are all very complex. All of these elements impact usage at any given time, making it nearly impossible to predict overall CPU needs. Thankfully, stress testing your capacity gives us an option that is not purely reactionary.

This blog will walk through how to stress test your capacity, the elements of capacity planning, and how to understand the results of the stress test.

Read on for a step-by-step guide.


Time and Unit Tests

Michael J. Swart says, look at the time!:

A flaky test is a unit test that sometimes passes and sometimes fails. The causes of these flaky tests are often elusive because they’re not consistently reproducible.

I’ve found that unit tests that deal with dates and times are notorious for being flaky – especially such tests that talk to SQL Server. I want to explore some of the reasons this can happen.

As a quick note, if you’re using time in database unit tests, don’t use GETUTCDATE(), GETDATE(), or any similar function; they’re non-deterministic. Instead, use specific dates and times. That way, you can explicitly test for the types of things Michael points out.
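
To make that advice concrete, here is a minimal sketch (my own, not Michael's example) of pinning the clock: the code under test takes the "current" time as a parameter, and the test passes a fixed value rather than calling GETUTCDATE(). The procedure and the dbo.Orders table are hypothetical.

```sql
-- Hypothetical procedure: it accepts "now" as a parameter instead of calling
-- GETUTCDATE() internally, which makes its output deterministic under test.
CREATE OR ALTER PROCEDURE dbo.GetOrdersPlacedToday
    @AsOf datetime2(7)
AS
BEGIN
    DECLARE @DayStart datetime2(7) = CAST(CAST(@AsOf AS date) AS datetime2(7));

    SELECT o.OrderID, o.OrderDate
    FROM dbo.Orders AS o                      -- hypothetical table
    WHERE o.OrderDate >= @DayStart
      AND o.OrderDate <  DATEADD(DAY, 1, @DayStart);
END;
GO

-- In the test, pin the clock to a specific instant so the same rows qualify
-- on every run, no matter when or where the test executes.
DECLARE @FixedNow datetime2(7) = '2022-01-15T10:30:00';
EXEC dbo.GetOrdersPlacedToday @AsOf = @FixedNow;
```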


Testability and Functional Code

I describe why the functional approach to writing code makes it testable:

Another important aspect of functional programming relevant to writing testable Python code is that functions should not have side effects.  In other words, functions take inputs and convert them to outputs; they don’t do anything else.  This approach is aspirational rather than entirely realistic—after all, saving to the database is a side effect, and most applications would be fairly boring if they offered absolutely no way to modify the data.  It just happens to be the case that our outlier detection engine can be close to side effect-free because we do not create files, save to a database, or push results to some third-party service.  With most applications, however, we do not tend to be so lucky.

Click through for an excerpt from the draft of an upcoming book as well as a bit of elucidation on key points. The specific language I’m talking about here is Python but the concepts apply to most languages.


Testing Failover Group and TCP Connectivity with Managed Instances

Niko Neugebauer has a pair of connectivity tests for us. First up is failover group connectivity:

When you set up a failover group between primary and secondary SQL Managed Instances in two different regions, each instance is isolated using an independent virtual network. Replication traffic needs to be allowed between these VNets.

To allow this kind of traffic, one of the prerequisites is:

– “You need to set up your Network Security Groups (NSG) such that ports 5022 and the range 11000-11999 are open inbound and outbound for connections from the subnet of the other managed instance. This is to allow replication traffic between the instances.”

Click through for a SQL Agent job script which helps with the test. Meanwhile, you can also test TCP connectivity from a managed instance:

In this post we shall focus on helping you determine TCP connectivity from SQL Managed Instance to a given endpoint and port of your choice.

If you are interested in other posts on how to discover different aspects of SQL MI, please visit http://aka.ms/sqlmi-howto, which serves as a placeholder for the series.

There are scenarios where it would be nice to be able to test whether a SQL Managed Instance can reach some “external” endpoints, such as Azure Storage.

Check out both posts.
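
As a rough sketch of the general idea (this is not Niko's script, and it assumes your instance's SQL Agent allows PowerShell job steps), you can create a one-step Agent job whose PowerShell step attempts a raw TCP connection to an endpoint and port of your choice. The endpoint and job name below are placeholders.

```sql
-- Rough sketch: a SQL Agent job with a PowerShell step that tries to open a
-- TCP connection to a chosen endpoint and port. A failed Connect() call fails
-- the step, which you can then see in the job history.
USE msdb;
GO

EXEC dbo.sp_add_job
    @job_name = N'Test TCP connectivity';

EXEC dbo.sp_add_jobstep
    @job_name  = N'Test TCP connectivity',
    @step_name = N'Try endpoint on port 5022',
    @subsystem = N'PowerShell',
    @command   = N'
        $client = New-Object System.Net.Sockets.TcpClient;
        $client.Connect("partner-instance.example.net", 5022);   # placeholder endpoint
        if ($client.Connected) { Write-Output "Port 5022 reachable" }
        $client.Dispose();';

EXEC dbo.sp_add_jobserver @job_name = N'Test TCP connectivity';

EXEC dbo.sp_start_job @job_name = N'Test TCP connectivity';
```

Check the step's outcome in msdb.dbo.sysjobhistory; if the connection attempt fails, the NSG rules described above are a good first place to look.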


Discovering Pester Tags

Jeffrey Hicks has a two-parter on discovering Pester tags. Part one is Jeffrey’s take:

As I resolved at the end of last year, I am doing more with Pester in 2022. I’m getting a bit more comfortable with Pester 5 and as my tests grow in complexity I am embracing the use of tags. You can add tags to different Pester test elements. Then when you invoke a Pester test, you can filter and only run specific tests by their tag. As I was working, I realized it would be helpful to be able to identify all of the tags in a test script. After a bit of work, I came up with a PowerShell function.

Part two is a reader’s take:

Yesterday I shared some PowerShell code I wrote to discover tags in a Pester test. It works nicely and I have no reason to complain. But as usual, there is never simply one way to do something in PowerShell. I got a suggestion from @FrodeFlaten on Twitter on an approach using the new configuration object in Pester 5.2. I’ll readily admit that I am still getting up to speed on the latest version of Pester. That’s one of my goals for this year, so this was a great chance to learn something new.

Click through to see how both approaches work.


Creating Boilerplate Pester Assertions

Jeffrey Hicks builds a useful snippet:

During this process, I decided I needed to help myself speed up the test writing phase. I have a standard set of tests that I like to use for functions in my module. But copying and pasting code snippets is tedious. I know I could create a set of VS Code snippets, but that feels limiting and I’d have to make sure the snippets are available on all systems where I might be running VS Code. Instead, I wrote a PowerShell function to accelerate developing Pester 5.x tests.

My function takes a module and extracts all of the public exported functions. For each function, it creates a set of standard Pester assertions. These are the baseline or boilerplate tests that I always want to run for each function. Each function is wrapped in a Describe block, although I can opt for a Context block instead. This command will also insert tags. Note that my code for the tag insertion relies on the ternary operator from PowerShell 7.

Click through for the code.


Benchmarking Databricks vs Snowflake

Mostafa Mokhtar, et al, respond to some benchmarking claims:

On Nov 2, 2021, we announced that we set the official world record for the fastest data warehouse with our Databricks SQL lakehouse platform. These results were audited and reported by the official Transaction Processing Performance Council (TPC) in a 37-page document available online at tpc.org. We also shared a third-party benchmark by the Barcelona Supercomputing Center (BSC) outlining that Databricks SQL is significantly faster and more cost effective than Snowflake.

A lot has happened since then: many congratulations, some questions, and some sour grapes. We take this opportunity to reiterate that we stand by our blog post and the results: Databricks SQL provides superior performance and price performance over Snowflake, even on data warehousing workloads (TPC-DS).

Posts like this are exactly why getting rid of the DeWitt clause is important. I’d rather have Snowflake and Databricks duking it out with publicly available and testable processes. To me, the most important part of this post was the several exhortations to try it out yourself, both for the Databricks test and the Snowflake test. Make benchmarking public, including hardware choices, configuration choices, and the testing process; then I can tell for sure whether your benchmark makes sense for my use case.


Testing sp_ineachdb

Aaron Bertrand takes us to the Island of Misfit Databases:

The only database that requires extra handling is the one that contains a tab, because SQL Server doesn’t know how to generate file names when that character is present. I am sure there are a bunch of other less common but equally exotic characters that may cause the same problem.

This is how I actually tested sp_ineachdb, to make sure it was ready for just about any bad idea anyone used to name a database, and could handle various possible database states (for a lot more background on this procedure, and why it is better than the undocumented, unsupported, and buggy sp_msforeachdb, see this and this). Here you can see that the procedure works against all these poorly-named databases, and skips databases that are inaccessible (rather than raise an exception).

Click through to see the list of databases Aaron uses. Technically, I think Aaron’s blog post also counts as a Halloween post because some of those databases are spooky.
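
If you want to recreate a scaled-down version of the test yourself, here is a minimal sketch. The database names are deliberately awkward placeholders rather than Aaron's full list, and it assumes sp_ineachdb is already installed in the dbo schema.

```sql
-- Minimal sketch: a few deliberately badly named databases, then a harmless
-- command run in each one via sp_ineachdb. Inaccessible databases should be
-- skipped rather than raising an exception.
CREATE DATABASE [data base];        -- embedded space
CREATE DATABASE [[bracketed]]];     -- embedded brackets
CREATE DATABASE [it's complicated]; -- embedded single quote
GO

EXEC dbo.sp_ineachdb
    @command = N'SELECT DB_NAME() AS ThisDatabase;';
GO

-- Clean up the misfits afterward.
DROP DATABASE [data base], [[bracketed]]], [it's complicated];
```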
