Press "Enter" to skip to content

Author: Kevin Feasel

Building Test Data Following A Normal Distribution In T-SQL

I (finally) have a technical blog post:

In order to show you the solution, I want to build up a reasonably sized sample.  Any solution looks great when reading five records, but let’s kick that up a notch.  Or, more specifically, a million notches:  I’m going to use a CTE tally table and load 5 million rows.
I want some realistic looking data, so I’ve adapted Dallas Snider’s strategy to build a data set which approximates a normal distribution.
Because this is a little complicated, I wanted to take the time and explain the data load process in detail in its own post, and then apply it in the follow-up post.  We’ll start with a relatively small number of records for this demonstration:  50,000.  The reason is that you can generate 50K records almost instantaneously, but once you start getting a couple of orders of magnitude larger, things slow down some.

If you do custom data generation for lower environments, I’d recommend checking this out. Your production data probably doesn’t follow a normal distribution exactly, but a normal distribution is probably closer to reality than the uniform distribution you get with functions like RAND().
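
As a rough illustration of the idea, here is a minimal sketch of generating approximately normal data in T-SQL. It uses the Box-Muller transform over uniform RAND() values, which is one way to do this and not necessarily Snider’s exact technique; the mean of 500 and standard deviation of 100 are arbitrary choices.

-- A minimal, hypothetical sketch: turn uniform random values into an
-- approximately normal distribution via the Box-Muller transform.
WITH Tally AS
(
    -- CTE tally table: 50,000 sequential numbers.
    SELECT TOP (50000)
        ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n
    FROM sys.all_objects o1
        CROSS JOIN sys.all_objects o2
)
SELECT
    t.n,
    -- Mean 500, standard deviation 100; 1.0 - RAND() keeps LOG() away from zero.
    500.0 + 100.0
        * SQRT(-2.0 * LOG(1.0 - RAND(CHECKSUM(NEWID()))))
        * COS(2.0 * PI() * RAND(CHECKSUM(NEWID()))) AS ApproxNormalValue
FROM Tally t;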


Using AzCopy To Sync Data

Randolph West is pleased with an update to AzCopy:

As of November 2018, however, the v10 preview of AzCopy is vastly improved. Firstly, it runs cross-platform on Windows, Linux and macOS (it is open-source, and appears to be written in Go). Secondly, it has more sensible command-line switches. Thirdly, and most importantly in my mind, it includes the much-awaited sync option. This has been a long time coming.

Click through for a demonstration of this synchronization option.


Performance Impact With Nullable Columns

Daniel Hutmacher shows us situations in which nullable columns can lead to worse performance:

We expected the Semi Join to turn into an Anti Semi Join, but the plan now also contains a Nested Loop branch with a Row Count Spool – what’s that about? Turns out the Row Count Spool, along with its index seek, has to do with the NOT IN() and the fact that we’re looking at a nullable column. Remember that…
x NOT IN (y, z, NULL)
… always returns false, because the NULL value could represent essentially anything, including the x. And so it is with the inner table, if there happens to be a NULL value among those rows.

Definitely worth the read. There are ways around this performance hit, but design decisions have consequences.
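
To see the logic trap in isolation (my own quick sketch, separate from Daniel’s example), compare NOT IN against NOT EXISTS when a NULL is in the mix:

-- 1 is clearly not 2 or 3, but the NULL makes the predicate UNKNOWN,
-- which WHERE treats like false, so this returns no rows.
SELECT 'found' AS Result
WHERE 1 NOT IN (2, 3, NULL);

-- NOT EXISTS sidesteps the problem and returns one row, because the
-- row-by-row equality test never matches (NULL = 1 is UNKNOWN, not TRUE).
SELECT 'found' AS Result
WHERE NOT EXISTS
(
    SELECT 1
    FROM (VALUES (2), (3), (NULL)) AS x(v)
    WHERE x.v = 1
);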


Data Compression With dbatools

Jess Pomfret shows us the data compression options available when using dbatools:

Data compression is not a new feature in SQL Server. In fact, it has been around since SQL Server 2008, so why does it matter now? Before SQL Server 2016 SP1, this feature was only available in Enterprise edition. Now that it’s in Standard edition, data compression can be an option for far more people.
dbatools has three functions available to help you work with data compression, and in true dbatools style it makes it easy and fast to compress your databases.

Read on for a breakdown of the PowerShell cmdlets available.
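
If you prefer to see what those functions drive under the covers, the underlying T-SQL is straightforward. This is a generic sketch with a hypothetical table name, not code from Jess’s post:

-- Estimate how much space PAGE compression would save (built-in procedure).
EXEC sp_estimate_data_compression_savings
    @schema_name = N'dbo',
    @object_name = N'MyTable',      -- hypothetical table name
    @index_id = NULL,
    @partition_number = NULL,
    @data_compression = N'PAGE';

-- Then rebuild the table (or an index) with compression enabled.
ALTER TABLE dbo.MyTable REBUILD WITH (DATA_COMPRESSION = PAGE);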


Navigating Execution Plans In Management Studio

Greg Low shares a few tips on working with graphical execution plans in SQL Server Management Studio:

So I can zoom in and out, set a custom zoom level, or zoom until the entire plan fits. Generally, though, that would make the plan too small to read as soon as you have a complicated plan.
But in one of the least discoverable UI features in SSMS, there is an option to pan around the plan.

Click through for the demos. My favorite way of navigating graphical execution plans in SSMS is to use SentryOne Plan Explorer instead.


Kafka And Exactly-Once Delivery

Rahul Agarwal explains what “exactly-once” means in terms of message-passing systems:

Until recently, most organizations have been struggling to achieve the holy grail of message delivery, the exactly-once delivery semantic. Although this has been an out-of-the-box feature since Apache Kafka 0.11, people are still slow in picking up this feature. Let’s take a moment to understand exactly-once semantics. What is the big deal about it, and how does Kafka solve the problem?
Apache Kafka offers the following delivery guarantees. Let’s understand what this really means:

In a distributed system, having true exactly-once processing is extremely difficult to achieve.


Choosing Azure Data Lake Analytics Versus Azure Databricks

Ginger Grant helps us make the decision between using Azure Data Lake Analytics and Azure Databricks:

Databricks is a recent addition to Azure that is greatly influencing the technology choices that people are making when determining how to process data.  Prior to the introduction of Databricks to Azure in March of 2018, if you had a lot of unstructured data which was stored in HDFS clusters, and wanted to analyze it in a scalable fashion, the choice was Data Lake and using U-SQL with Data Lake Analytics.  With the introduction of Databricks, there is now a choice between Data Lake Analytics and Databricks for analyzing data.

Click through for the comparison.


Miscellany On Java In SQL Server

Niels Berglund continues a series on SQL Server 2019 and Java support:

This post is the fourth post where I look at the Java extension in SQL Server, i.e. the ability to execute Java code from inside SQL Server. The previous three posts are:
SQL Server 2019 Extensibility Framework & Java – Hello World: We looked at installing and enabling the Java extension, as well as some very basic Java code.
SQL Server 2019 Extensibility Framework & Java – Passing Data: In this post, we discussed what is required to pass data back and forth between SQL Server and Java.
SQL Server 2019 Extensibility Framework & Java – Null Values: This, the Null Values, post is a follow up to the Passing Data post, and we look at how to handle null values in data passed to Java.
This fourth post acts as a “roundup” of miscellaneous “stuff” I did not cover in the three previous posts.

If you haven’t seen the first three, check them out too.
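
For context, the Java extension rides on the same external script mechanism as R and Python. The general shape of a call looks something like the sketch below; the class name and query are hypothetical, and the exact class contract is what Niels’s posts walk through:

-- Hypothetical sketch: with the Java extension enabled, @script names the
-- Java class to execute rather than inline code.
EXEC sp_execute_external_script
    @language = N'Java',
    @script = N'JavaDemo.HelloWorld',                 -- hypothetical class
    @input_data_1 = N'SELECT name FROM sys.tables';   -- rows passed to Java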


The State Of Database Scoped Configurations

Niko Neugebauer takes us through the current state of Database Scoped Configurations in SQL Server:

I have already blogged about the first version of the Database Scoped Configurations for SQL Server 2016, with 4 visible options plus the procedure cache cleaning option, but we have followed in SQL Server 2017 with 5 (listed) & 9 (in practice – DISABLE_INTERLEAVED_EXECUTION_TVF, DISABLE_BATCH_MODE_ADAPTIVE_JOINS, BATCH_MODE_MEMORY_GRANT_FEEDBACK, BATCH_MODE_ADAPTIVE_JOINS are visible and functioning), and in just another year we have received a huge upgrade to the currently available 21 for SQL Server 2019.

It seems like this is a common route the SQL Server teams are going down, and it makes sense: your settings for Mega-DB probably shouldn’t be the same as for the tiny database in the corner. Oh, and that whole Azure SQL Database thing.
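
As a quick refresher, these options live at the database level rather than the instance level. A minimal sketch (the MAXDOP value is an arbitrary example):

-- See what is configured for the current database.
SELECT configuration_id, name, value
FROM sys.database_scoped_configurations;

-- Change a setting for this database only.
ALTER DATABASE SCOPED CONFIGURATION SET MAXDOP = 4;

-- The procedure cache cleaning option Niko mentions.
ALTER DATABASE SCOPED CONFIGURATION CLEAR PROCEDURE_CACHE;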
