Unit Testing Spark Streaming DStreams

Anuj Saxena shows how to create unit tests for DStreams in Spark Streaming:

The method ‘ testOperation ‘ takes the output of the operation performed on the ‘inputPair’ and check whether it is equal to the ‘outputPair’ and just like this, we can test our business logic.

This short snippet lets you test your business logic without forcing you to create even a Spark session. You can mock the whole streaming environment and test your business logic easily.

This was a simple example of unary operations on DStreams. Similarly, we can test binary operations and window operations on DStreams.

Click through for an example with code.

Why You’re Pestering

Rob Sewell shows off the Because parameter in Pester 4.2.0:

This release adds the Because parameter to the all assertions. This means that you can add a reason why the test has failed. As JAKUB JAREŠ writes here

  • Reasons force you think more

  • Reasons document your intent

  • Reasons make your TestCases clearer

Click through for examples galore.  4.2.1 should have a -Parent macro which inputs -Because “I  said so”.

Unit Testing Spark Streaming DStreams

Anuj Saxena gives an example of using StreamingSuiteBase to build unit tests for DStreams in Spark Streaming:

So what’s the problem? How to execute streaming logic in a test environment.

We can write Integration test cases and provide the actual environment in the integration test. But for unit testing, we need a testing environment which should not depend on any external application.

Click through for the example.

Policing Database Deployments

Drew Furgiuele has a rather interesting library that he’s released to the general public:

The base return is the TSQLFragment object, which in turn has a Batches object, which in turn holds… well it can hold a lot of different things. When the text is parsed, it will determine what type of object to return based on the statement it determines it is. For example, if it’s an insert statement it will be a certain type of object with a given set of properties and methods, and if it’s, say, a create index statement you’ll get different properties, such as which table or view is getting the index along with the indexed columns and included columns. It really is interesting.

But interesting can a double-edged sword: since the statement object that gets returned can be different for each parsed piece of code, that means to set up any type of intelligence around the stuff we’re dealing with, we need to check for very specific objects.

Unfortunately, I never got past the first animated GIF, whose subtitle was wrong.  You, however, should read the whole thing.

Looping Through Pester Tests

Rob Sewell shows how to iterate through a collection of Pester tests:

The problem with  Test Cases is that we can only easily loop through one collection, but as Pester is just PowerShell we can simply use ForEach if we wanted to loop through multiple ones, like instances and then databases.

I like to use the ForEach method as it is slightly quicker than other methods. It will only work with PowerShell version 4 and above. Below that version you need to pipe the collection to For-EachObject.

Check it out for a good explanation of running groups of Powershell tests.

Don’t Unit Test Private Methods

Vladimir Khorikov argues that you should not unit test private methods:

When your tests start to know too much about the internals of the system under test (SUT), that leads to false positives during refactoring. Which means they won’t act as a safety net anymore and instead will impede your refactoring efforts because of the necessity to refactor them along with the implementation details they are bound to. Basically, they will stop fulfilling their main objective: providing you with the confidence in code correctness.

When it comes to unit testing, you need to follow this one rule: test only the public API of the SUT, don’t expose its implementation details in order to enable unit testing. Your tests should use the SUT the same way its regular clients do, don’t give them any special privileges. Here you can read more about what an implementation detail is and how it is different from public API: link.

In the database world, this is one reason why I like using stored procedures:  they give the equivalent of a public API for database code, so you can write tests for them.

tSQLt And VS Database Tests

Gavin Campbell combines tSQLt along with a Visual Studio database test project:

There are a few ways of getting the tSQLt objects deployed to where they are needed for testing, the way I use most often is basically this one, whereby we create a .dacpac of just the tSQLt objects (or use one we made earlier!), and create a second database project with a Database Reference of type “Same Database” to the project we are trying to test, and a reference to our tSQLt .dacpac. The .dacpac file needs to be somewhere in our source folders, as it will be added by path. We also need a reference to the master database, as this is required to build the tSQLt project.

I see the two tools as serving completely different purposes:  tSQLt is a decent unit test framework, whereas a Visual Studio database project is a decent integration test framework.  It’s interesting that Gavin was able to combine them here but aside from having a common test runner, my inclination would be to keep them separated.

New E-Mail Course For Unit Testing T-SQL Code

Ed Elliott has a free e-mail course available to learn how to use tSQLt:

Unit testing helps us to write better code, make rapid changes to our code and has been generally seen as a good idea for about 10 years. Writing tests for T-SQL code is made much easier by using tSQLt but there is quite a high barrier to entry both in terms of the technical skills in getting tSQLt running and also how to approach large code bases of, sometimes, unfriendly T-SQL code and taming the code with unit tests.

I have successfully unit tested T-SQL code in a number of different environments including clean greenfield environments as well as legacy projects and I have written this course to help people get started with unit testing but also help them to turn unit testing into a part of their development process that they can use everyday to improve the quality of their work and the speed at which deployments can be made.

Click through to sign up.

TDE + AG = Higher CPU Utilization

Ginger Keys has an analysis stress testing CPU load when Transparent Data Encryption is on and a database is in an Availability Group:

Microsoft says that turning on TDE (Transparent Data Encryption) for a database will result in a 2-4% performance penalty, which is actually not too bad given the benefits of having your data more secure. There is even more of a performance hit when enabling cell level or column level encryption. When encrypting any of your databases, keep in mind that the tempdb database will also be encrypted. This could have a performance impact on your other non-encrypted databases on the same instance.

In a previous post I demonstrated how to add an encrypted database to an AlwaysOn group in SQL2016. In this article I will demonstrate the performance effects of having an encrypted database in your AlwaysOn Group compared to the same database not-encrypted.

The results aren’t surprising, though the magnitude of the results might be.

The Importance Of A Test Environment

Randolph West explains why it’s important to have a test environment separate from your development and production environments:

Some companies I’ve worked with have different forms of testing environments, including QA (Quality Assurance), IAT (Internal Acceptance Testing), and UAT (User Acceptance Testing). What they are called doesn’t matter, so long as they exist.

In a typical IT deployment, whether using Waterfall, Agile, or other development methodologies of the month, it pays to have a basic developmenttestproduction deployment path.

Randolph explains it in some detail but one of the big benefits for me is that you can make sure that deployment process works before deployment time.  Knowing that your checked-in scripts won’t break the deployment (because they didn’t break the CI build and release) makes the release process a lot less stressful.

Categories

February 2018
MTWTFSS
« Jan  
 1234
567891011
12131415161718
19202122232425
262728