Press "Enter" to skip to content

Why Unit Testing in the Database Is Tough

Rob Farley talks about a couple of reasons why database unit testing can be difficult to do:

Hamish wants to develop a conversation about unit testing within the database because he recognises that the lack of unit testing is a significant problem. It’s quite commonplace in the world of iterative code, of C#, Java, and those kinds of languages, but a lot less commonplace in the world of data. I’m going to look at two of the reasons why I think this is.

Read Rob’s thoughts in their entirety. I fully agree that we need to test, but I get wishy-washy on the topic of automated testing. The reason is that the tooling is quite limited, and many of those limitations are inherent to the database platform itself. For the types of things you most need to test (like hefty stored procedures), the number of test cases spirals out of control quickly. And unlike in functional or structured programming languages, T-SQL performance gets markedly worse as you modularize, which makes it difficult to get down to an easily testable block of code.

Replaying Workloads to a Different Database with WorkloadTools

Gianluca Sartori takes us through workload replay ability in WorkloadTools:

One of the features I was asked to implement for WorkloadTools is the ability to replay commands to a database name different from the one recorded in the source workload.

This is something that I had been planning to implement for a while, and it totally makes sense. Usually, you have two identical environments for the workload capture and replay, both with the same databases. Sometimes it makes sense to have two different databases as the source and target for the workload, for particular reasons: resource constraints, ease of testing, and so on.

WorkloadTools now supports replaying commands to a different database, using the DatabaseMap property of the ReplayConsumer.

Setting this up is pretty simple, though Gianluca does lay out a caveat.
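
For reference, the mapping lives on the ReplayConsumer in the WorkloadTools JSON configuration. The snippet below is only a sketch rather than Gianluca's exact file: the server names, credentials, and database names are placeholders, and only the pieces relevant to the replay side are shown.

```json
{
    "Controller": {
        "Listener": {
            "__type": "ExtendedEventsWorkloadListener",
            "ConnectionInfo": { "ServerName": "SOURCE_SERVER" }
        },
        "Consumers": [
            {
                "__type": "ReplayConsumer",
                "ConnectionInfo": {
                    "ServerName": "TARGET_SERVER",
                    "UserName": "replay_user",
                    "Password": "********"
                },
                "DatabaseMap": {
                    "SourceDb": "TargetDb"
                }
            }
        ]
    }
}
```

With a map like this in place, commands captured against SourceDb get replayed against TargetDb instead.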

Preparing for Demos with Pester

Jess Pomfret has some advice for you if you ever give a presentation:

If you don’t know what Pester is, it’s a test framework for PowerShell. In the simplest explanation, using their Domain-Specific Language (DSL), you describe how things should look. If all looks good, it returns output in green, and if it doesn’t, you get red output. There are a lot of great use cases for Pester, like using it to ensure your code does what it’s supposed to, using it to validate your SQL Server environment (dbachecks), or, in this example, using it to make sure your demos are set up and ready to go.

When I’m preparing for a presentation, I go through the demos over and over again, so it’s easy to accidentally leave things in a state that will cause issues when I go to do my demos in the presentation. If, for example, you create a table during the demo, and you already created it while practicing and then forgot to drop it, the demo gods will strike and it’ll fail when it matters most! A simple Pester test to check whether the table exists will solve this issue.

Even if you aren’t giving talks in public (or inside your company), Pester is a useful tool for ensuring that the thing you expect to be the case actually is the case.
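
As a concrete illustration, a pre-demo readiness check can be as small as the sketch below. This is not Jess's code: it assumes Pester 5 plus dbatools, and the instance, database, and table names are placeholders.

```powershell
# Demo-Readiness.Tests.ps1 - run Invoke-Pester against this file before presenting.
Describe 'Demo environment' {
    It 'does not already contain dbo.DemoTable' {
        # If a practice run left the table behind, this returns it and the test goes red.
        Get-DbaDbTable -SqlInstance 'localhost' -Database 'DemoDB' -Table 'DemoTable' |
            Should -BeNullOrEmpty
    }
}
```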

Using Pester with .NET PowerShell Notebooks

Rob Sewell has PowerShell in notebooks, so of course Rob is going to write tests:

Using Pester to validate that an environment is as you expect it to be is a good resource for incident resolution, potentially enabling you to quickly establish an area to concentrate on for the issue. However, if you try to run Pester in a .NET Notebook, you will receive an error.

Click through for the reason why this error appears and a workaround until it’s fixed for real.

Unit Testing in Apache Flink

Kartik Khare has a guide to help us write unit tests for our Apache Flink code:

Writing tests for process functions that work with time is quite similar to writing tests for stateful functions because you can also use the test harness. However, you need to take care of another aspect, which is providing timestamps for events and controlling the current time of the application. By setting the current (processing or event) time, you can trigger registered timers, which will call the onTimer method of the function.

Click through for demos and more details on the test harness.
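
To make the test harness idea concrete, here is a hedged sketch along the lines of the examples in the Flink documentation. The DelayedEmit function and the specific values are illustrative, not Kartik's; the harness classes come from the flink-streaming-java test jar.

```java
import static org.junit.Assert.assertEquals;

import java.util.Collections;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.streaming.util.KeyedOneInputStreamOperatorTestHarness;
import org.apache.flink.streaming.util.ProcessFunctionTestHarnesses;
import org.apache.flink.util.Collector;
import org.junit.Test;

public class DelayedEmitTest {

    // Registers an event-time timer 10ms after each element and emits the timer's timestamp when it fires.
    static class DelayedEmit extends KeyedProcessFunction<String, Long, Long> {
        @Override
        public void processElement(Long value, Context ctx, Collector<Long> out) {
            ctx.timerService().registerEventTimeTimer(ctx.timestamp() + 10);
        }

        @Override
        public void onTimer(long timestamp, OnTimerContext ctx, Collector<Long> out) {
            out.collect(timestamp);
        }
    }

    @Test
    public void timerFiresOnceWatermarkPassesIt() throws Exception {
        KeyedOneInputStreamOperatorTestHarness<String, Long, Long> harness =
            ProcessFunctionTestHarnesses.forKeyedProcessFunction(
                new DelayedEmit(), value -> "constant-key", Types.STRING);

        harness.processElement(42L, 100L);  // element with event-time timestamp 100
        harness.processWatermark(200L);     // advancing event time past 110 triggers onTimer

        assertEquals(Collections.singletonList(110L), harness.extractOutputValues());
    }
}
```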

High-Value Software Testing

RJ Zaworski has some principles around high-value testing:

Whatever the tools and approach, software testing proves that software does what it claims to do. Tests help developers eliminate defects, build confidence, practice good design, and ideally all three. They also take time to write, run, and update–time that’s no longer available for other development tasks.

High-value testing seeks to maximize the return on that investment. Like much of software development, it’s as much art as science. But a few practical principles can help keep things pointed in the right direction.

There’s a lot of good advice in this post.

Mocking Objects with R

The R-hub blog has an interesting post on creating mocks in R for unit testing:

In some of these cases, the programming concept you’re after is mocking, i.e. making a function act as if something were a certain way! In this blog post we shall offer a round-up of resources around mocking, or not mocking, when unit testing an R package.

It’s interesting watching data scientists work through the same sorts of problems which traditional developers have hit, whether that be testing, deployment, or source control management. H/T R-bloggers
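
For a flavour of what mocking looks like in practice, here is a small sketch using testthat together with the mockery package, one of the tools in this space. The functions are made up purely for illustration.

```r
library(testthat)
library(mockery)

# Dependency we do not want to call for real during tests.
call_api <- function(url) stop("no network access in tests")

# Function under test: counts the users returned by the API.
get_user_count <- function() {
  resp <- call_api("https://example.com/users")
  length(resp$users)
}

test_that("get_user_count counts users in the API response", {
  # Swap call_api, as seen from inside get_user_count, for a canned response.
  stub(get_user_count, "call_api", list(users = c("a", "b", "c")))
  expect_equal(get_user_count(), 3)
})
```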

Scaling Out Continuous Integration

Chris Adkin shows off parallelism in Azure DevOps continuous integration pipelines:

A SQL Server Data Tools project is checked out of GitHub and built into a DacPac, then four containerized SQL Server instances are spun up using clones of the ‘Seed’ docker volume. The DacPac is applied to a database running inside each container, a tSQLt test is then executed against each database, and finally, at the very end, the tSQLt results are aggregated and published.

This is an interesting approach to the problem of lengthy tests: run them on several separate machines concurrently.
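
If you want a rough picture of what the fan-out looks like in Azure Pipelines YAML, a matrix strategy is one way to do it. This is only a sketch under my own naming, not Chris's pipeline; the real work of starting a container from the seed volume, deploying the DacPac, and invoking tSQLt is reduced to a placeholder step.

```yaml
jobs:
- job: tsqlt_slice
  pool:
    vmImage: 'ubuntu-latest'
  strategy:
    matrix:            # four copies of the job, one tSQLt test class each
      finance:   { testClass: 'FinanceTests' }
      sales:     { testClass: 'SalesTests' }
      inventory: { testClass: 'InventoryTests' }
      audit:     { testClass: 'AuditTests' }
  steps:
  - script: echo "start container from seed volume, deploy DacPac, run tSQLt class $(testClass)"
    displayName: 'Run one tSQLt slice'
```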

Proving ETL Correctness

Ed Elliott shares a few techniques for testing ETL processes:

Reconciliation is the process of going to your source system, getting a number and validating that number on the target. This ranges from being easy to impossible, so you need to decide what to reconcile on a case by case basis.

In its simplest form, we can go to a source system and find out things like how many records are to be copied, sum up totals and run other aggregations that we can then validate as correct (or not!) on the target system.

Ed has put together a thoughtful approach to validating data loads regardless of the source.
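
A reconciliation check can be as simple as comparing a count and a sum on both sides of the load. The sketch below uses made-up table and column names, assumes both databases are reachable from one instance, and is only meant to show the shape of the technique.

```sql
DECLARE @SourceRows bigint, @SourceAmount decimal(19, 4);
DECLARE @TargetRows bigint, @TargetAmount decimal(19, 4);

-- Aggregate the slice of data that today's load was supposed to copy.
SELECT @SourceRows = COUNT_BIG(*), @SourceAmount = SUM(Amount)
FROM SourceDb.dbo.Orders
WHERE OrderDate = CAST(GETDATE() AS date);

SELECT @TargetRows = COUNT_BIG(*), @TargetAmount = SUM(Amount)
FROM WarehouseDb.dbo.FactOrders
WHERE OrderDate = CAST(GETDATE() AS date);

-- Fail loudly if the target does not line up with the source.
IF @SourceRows <> @TargetRows OR @SourceAmount <> @TargetAmount
    THROW 50000, 'Reconciliation failed: source and target totals do not match.', 1;
```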

Testing ETL Pipelines

Ed Elliott has started a new series on testing ETL pipelines:

We test in production; this means we have monitoring and do things like phased roll-outs using feature flags, or we roll out to select customers first, prove it, then roll it out to everyone else. Testing in production doesn’t mean hacking around getting some process to work. We don’t test “on production” (hacking), we test “in production” – while we are in production, we are continually testing, and if anything goes wrong, we have alerts and can deal with it.

Testing pipelines feels difficult because there are so many moving pieces, but if you design for testability (e.g., being able to tee off samples of data, send test records through, etc.), things get easier.
