Curated SQL – Page 1006 – A Fine Slice Of SQL Server

The If Statement and Friends in PowerShell

Published 2019-08-13 by Kevin Feasel

Kevin Marquette explains what is possible with if in PowerShell:

The -not operator flips an expression from $false to $true or from $true to $false. Here is an example where we want to perform an action when Test-Path is $false.
if ( -not ( Test-Path -Path $path ) )

There’s plenty of good stuff here, so check it out.

Memory-Optimized Table Types

Published 2019-08-13 by Kevin Feasel

Rob Farley hates spelling “optimized” the best way:

Let me start by saying that if you really want to get the most out of this feature, you will dive deep into questions like durability and natively-compiled stored procedures, which can really make your database fly if the conditions are right. Arguably, any process you’re doing (such as ETL) where the data doesn’t have to survive a system restart should be considered for Memory-Optimized Tables with durability set to SCHEMA_ONLY (I say ‘considered’ because the answer isn’t always obvious – at the moment inserting into memory-optimised tables won’t run in parallel, and this could be a show-stopper for you).
But today I’m going to mention one of the quick-wins available: Table Variables that use User-defined Table Types

This can absolutely help you out, especially in versions of SQL Server prior to 2019 where temporary object metadata contention is a real issue on busy servers.

Proving ETL Correctness

Published 2019-08-13 by Kevin Feasel

Ed Elliott shares a few techniques for testing ETL processes:

Reconciliation is the process of going to your source system, getting a number and validating that number on the target. This ranges from being easy to impossible, so you need to decide what to reconcile on a case by case basis.
In its simplest form, we can go to a source system and find out things like how many records are to be copied, sum up totals and run other aggregations that we can then validate as correct (or not!) on the target system.

Ed has put together a thoughtful approach to validating data loads regardless of the source.
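
As a rough illustration of the simplest form of reconciliation described above, here is a minimal sketch that compares a row count and a summed column between a source and a target. It is not Ed's code: the `reconcile` and `make_db` helpers, the `sales` table, and the `amount` column are all made up for the example, and in-memory SQLite databases stand in for the real systems.

```python
import sqlite3


def reconcile(source, target, table, amount_column):
    """Compare the row count and a summed column across two connections.

    The table and column names are interpolated directly, so they are
    assumed to come from trusted configuration rather than user input.
    """
    query = f"SELECT COUNT(*), COALESCE(SUM({amount_column}), 0) FROM {table}"
    src_count, src_total = source.execute(query).fetchone()
    tgt_count, tgt_total = target.execute(query).fetchone()
    return {
        "count_matches": src_count == tgt_count,
        "total_matches": src_total == tgt_total,
        "source": (src_count, src_total),
        "target": (tgt_count, tgt_total),
    }


def make_db(rows):
    """Build a throwaway in-memory database holding a hypothetical sales table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, amount INTEGER)")
    conn.executemany("INSERT INTO sales (id, amount) VALUES (?, ?)", rows)
    return conn


source = make_db([(1, 100), (2, 250), (3, 75)])
good_target = make_db([(1, 100), (2, 250), (3, 75)])
bad_target = make_db([(1, 100), (2, 250)])  # simulates a load that dropped a row

good = reconcile(source, good_target, "sales", "amount")
bad = reconcile(source, bad_target, "sales", "amount")
```

Counts and sums only catch some classes of error (two offsetting mistakes can still produce matching totals), which is part of why what to reconcile is decided case by case.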

Apache Kafka Tutorials

Published 2019-08-12 by Kevin Feasel

Michael Drogalis announces Tutorials for Apache Kafka:

For beginners, Kafka Tutorials reveals the “shape” of the problems that event streaming can solve. It makes it easier to recognize the domain of things that you might use event streaming for. Moreover, each tutorial reliably takes you from zero to working code by following each of the steps.
For the experienced, it’s a crucial reference guide that makes your work easier. Easily look up how to join a stream and a table together when you’re rusty, or quickly recall how to merge discrete streams together. Over time, we’ll introduce more advanced material that makes use of the entire stack.

Check it out, including a discussion of their YAML renderer.

Contrasting Logistic Regression and Decision Trees

Published 2019-08-12 by Kevin Feasel

Shital Katkar explains cases when you might use logistic regression or decision trees for classification problems:

Categorical data works well with Decision Trees, while continuous data work well with Logistic Regression.
If your data is categorical, then Logistic Regression cannot handle pure categorical data (string format). Rather, you need to convert it into numerical data.

Each algorithm has its own uses and assumptions.
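
To make the "convert it into numerical data" step concrete, here is a small sketch of one-hot encoding, which is one common way to turn string categories into numbers. The excerpt does not say which conversion Shital has in mind, so treat the technique as an assumption; `one_hot_encode` and the sample values are invented for the example.

```python
def one_hot_encode(values):
    """Turn a list of string categories into 0/1 indicator rows.

    Returns the sorted category names plus one row per input value,
    with a 1 in the position of that value's category.
    """
    categories = sorted(set(values))
    encoded = [
        [1 if value == category else 0 for category in categories]
        for value in values
    ]
    return categories, encoded


colours = ["red", "green", "blue", "green"]
categories, encoded = one_hot_encode(colours)
# categories is ['blue', 'green', 'red'], so "red" becomes [0, 0, 1]
```

The resulting numeric columns are the kind of input a logistic regression can work with, whereas a decision tree can often split on the categories more directly.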

Designing for Red-Green Color-Blindness

Published 2019-08-12 by Kevin Feasel

Andy Kirk has a few tips to help you design for people who have deuteranopia or protanopia:

Many visualisations use colours to represent data values, either to show quantitative scales or categorical classifications. One of the most common colour metaphors used in visual displays involves the use of a red-green colour scheme, sometimes known as “RAG” or “traffic light” colours. These colours are used to convey notions of green = ‘good’ or ‘above average’ and red = ‘bad’ or ‘below average’ in some cultures, and the reverse in others. Such colour connotations are long-established and widely used, especially in financial or corporate contexts, but whilst they provide a certain immediacy in their meaning for many viewers, around 4.5% of the population are colour-blind (8% of men) with the red-green colour deficiency “Deuteranopia” being the most common form. This means a significant proportion of viewers may not be able to perceive such important visual encodings.

I’m not the biggest fan of some of them, but there are some really good ideas in here.

Spaces in CHAR Columns

Published 2019-08-12 by Kevin Feasel

John McCormack wants to store a single space in a CHAR(1) column:

I was asked by a colleague why his where clause wasn’t being selective when filtering on a space value. The column was a char(1) data type. To understand the curious case of the space in char(1), we need to understand how the char data type works and also a bit more about the need for it in this scenario.

The ANSI standard makes sense, but it is something you have to keep in mind in cases like this.
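
For a feel of why a filter on a single space is not selective here, the following is a simplified Python model of fixed-length padding and ANSI-style padded comparison. It is an assumption-laden sketch, not SQL Server's implementation and not John's code: `store_char` and `padded_equals` are hypothetical helpers that only imitate the padding rules.

```python
def store_char(value, length):
    """Model CHAR(n) storage by right-padding the value with spaces."""
    if len(value) > length:
        raise ValueError("value is longer than the column allows")
    return value.ljust(length)


def padded_equals(left, right):
    """Model an ANSI-style comparison: pad the shorter side, then compare."""
    width = max(len(left), len(right))
    return left.ljust(width) == right.ljust(width)


stored_empty = store_char("", 1)   # an empty string ends up stored as one space
stored_space = store_char(" ", 1)  # a single space is stored unchanged

# Under this model both rows match a filter on ' ' and a filter on ''.
matches_space_filter = [padded_equals(v, " ") for v in (stored_empty, stored_space)]
matches_empty_filter = [padded_equals(v, "") for v in (stored_empty, stored_space)]
```

Under this model an empty string and a single space become indistinguishable once stored in a one-character fixed-length column.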

Parameter Sniffing and Multiple Indexes

Published 2019-08-12 by Kevin Feasel

Erik Darling looks at how available indexes may contribute to parameter sniffing problems:

When you’re troubleshooting parameter sniffing, the plans might not be totally different.
Sometimes a subtle change of index usage can really throw gas on things.
It’s also a good example of how Key Lookups aren’t always a huge problem.
Both plans had them, just in different places.

The plans had a small change, but that made a big difference.

Tracking Transactional Replication Status

Published 2019-08-12 by Kevin Feasel

Pamela Mooney has a script to validate that transactional replication is up to date:

You may sometimes have reports or other processes that are dependent on transactional replication being current. If that is the case, you will probably need a mechanism to check and see if, in fact, replication is caught up. Here is my solution to that, without having to resort to Replication Monitor all the time. The bonus? This could be inserted into conditional workflows to help streamline processes (i.e., validate publications before moving on to Step 2 of process).
To do this, I chose to make three stored procedures. The first one to just check all publications on a server, one to check just one publication on a server, and one central sproc to rule them all. You simply execute the master stored procedure, and based on the parameters you feed, it decides which of the other two to execute.

Read on for those scripts.

Optimizing Max Value Performance in Power Query

Published 2019-08-12 by Kevin Feasel

Chris Webb shows us how to speed up a query to get the maximum value in a column:

In part 1 of this series – which I strongly recommend you read before reading this post – I showed how removing columns from a table can make a dramatic improvement to the performance of certain transformations in Power Query. In this post I’ll show some tricks taught to me by Curt Hagenlocher of the dev team that can improve performance even more.

Click through for the trick and an explanation of when it works and when it doesn’t.
