Press "Enter" to skip to content

Author: Kevin Feasel

Testing TPC-H with Batch Mode

Niko Neugebauer looks at TPC-H query testing in both row mode and batch mode:

I executed every single query enough times so that the execution would be run totally In-Memory (64GB of RAM is enough because we have our data compressed, as I mentioned earlier in the SETUP part). This would allow me to mimic a busy system that has enough resources to process the reading workload. Since the Batch Execution Mode is focusing on the CPU improvements, I decided to sample not only the total elapsed time, but the CPU times so that we can do some judgements of the CPU bandwidth variation. Lowering the CPU consumption fo the high-demanding CPU queries is a key in order to improve the overall system parallelism (well, watch-out for the memory, of-course).

Each successful execution was sampled at least 3 (and in some query cases over 5 times) and then the result would be averaged, so that we can have a higher confidence.

Niko has some interesting findings, some good for SQL Server and some not so good.

Comments closed

Finding Parameter Aliases in Powershell

Mike Robbins has a quick script to find aliases for parameters in Powershell cmdlets:

While sitting through Jeff Hicks‘ Advanced PowerShell Scripting Workshop at PowerShell on the River in Chattanooga today, he mentioned there being a “Cn” alias for the ComputerName parameter of commands in PowerShell.

I’ve previously written a one-liner to find parameter aliases and at one time Microsoft had starting adding parameter aliases to the help for commands as referenced in that same blog article, but it appears that they’ve discontinued adding them to the help and removed the ones they previously added to it.

To be honest, I don’t like aliases that much. Between tab completion and nearest match (where I can type “C” and get “ComputerName” because it’s the only parameter which starts with a C), that satisfies one-off scripts. And for reusable scripts, I’d want to use full names so that the next person doesn’t hunt me down because all of my parameters look like “C” or “Dep”.

Comments closed

Memory-Optimized Table Types

Rob Farley hates spelling “optimized” the best way:

Let me start by saying that if you really want to get the most out of this feature, you will dive deep into questions like durability and natively-compiled stored procedures, which can really make your database fly if the conditions are right. Arguably, any process you’re doing (such as ETL) where the data doesn’t have to survive a system restart should be considered for Memory-Optimized Tables with durability set to SCHEMA_ONLY (I say ‘considered’ because the answer isn’t always obvious – at the moment inserting into memory-optimised tables won’t run in parallel, and this could be a show-stopper for you).

But today I’m going to mention one of the quick-wins available: Table Variables that use User-defined Table Types

This can absolutely help you out, especially in versions of SQL Server prior to 2019 where temporary object metadata contention is a real issue on busy servers.

Comments closed

Proving ETL Correctness

Ed Elliott shares a few techniques for testing ETL processes:

Reconciliation is the process of going to your source system, getting a number and validating that number on the target. This ranges from being easy to impossible, so you need to decide what to reconcile on a case by case basis.

In its simplest form, we can go to a source system and find out things like how many records are to be copied, sum up totals and run other aggregations that we can then validate as correct (or not!) on the target system.

Ed has put together a thoughtful approach to validating data loads regardless of the source.

Comments closed

Apache Kafka Tutorials

Michael Drogalis announces Tutorials for Apache Kafka:

For beginners, Kafka Tutorials reveals the “shape” of the problems that event streaming can solve. It makes it easier to recognize the domain of things that you might use event streaming for. Moreover, each tutorial reliably takes you from zero to working code by following each of the steps.

For the experienced, it’s a crucial reference guide that makes your work easier. Easily look up how to join a stream and a table together when you’re rusty, or quickly recall how to merge discrete streams together. Over time, we’ll introduce more advanced material that makes use of the entire stack.

Check it out, including a discussion of their YAML renderer.

Comments closed

Contrasting Logistic Regression and Decision Trees

Shital Katkar explains cases when you might use logistic regression or decision trees for classification problems:

Categorical data works well with Decision Trees, while continuous data work well with Logistic Regression.

If your data is categorical, then Logistic Regression cannot handle pure categorical data (string format). Rather, you need to convert it into numerical data.

Each algorithm has its own uses and assumptions.

Comments closed

Designing for Red-Green Color-Blindness

Andy Kirk has a few tips to help you design for people who have deuteronopia or protanopia:

Many visualisations use colours to represent data values, either to show quantitative scales or categorical classifications. One of the most common colour metaphors used in visual displays involves the use of a red-green colour scheme, sometimes known as “RAG” or “traffic light” colours. These colours are used to convey notions of green = ‘good’ or ‘above average’ and red = ‘bad’ or ‘below average’ in some cultures, and the reverse in others. Such colour connotations are long-established and widely used, especially in financial or corporate contexts, but whilst they provide a certain immediacy in their meaning for many viewers, around 4.5% of the population are colour-blind (8% of men) with the red-green colour deficiency “Deuteranopia” being the most common form. This means a significant proportion of viewers may not be able to perceive important such visual encodings.

I’m not the biggest fan of some of them, but there are some really good ideas in here.

Comments closed

Spaces in CHAR Columns

John McCormack wants to store a single space in a CHAR(1) column:

I was asked by a colleague why his where clause wasn’t being selective when filtering on a space value. The column was a char(1) data type. To understand the curious case of the space in char(1), we need to understand how the char data type works and also a bit more about the need for it in this scenario.

The ANSI standard makes sense, but it is something you have to keep in mind in cases like this.

Comments closed

Parameter Sniffing and Multiple Indexes

Erik Darling looks at how available indexes may contribute to parameter sniffing problems:

When you’re troubleshooting parameter sniffing, the plans might not be totally different.

Sometimes a subtle change of index usage can really throw gas on things.

It’s also a good example of how Key Lookups aren’t always a huge problem.

Both plans had them, just in different places.

The plans had a small change, but that made a big difference.

Comments closed