Press "Enter" to skip to content

Month: January 2019

Query Tuning In CosmosDB

Hasan Savran explains how we can tune queries in CosmosDB:

This is the most common question I get from DBAs in my talks about Cosmos DB. Cosmos DB is a managed database, but that does not mean you cannot tune your queries. The way you tune them, however, is nothing like SQL Server.

First, you need to be sure that you configured your Cosmos DB containers correctly. What do I mean by that? You should pick the right partition key before you start to tune any of your queries. Tuning your queries is not going to help you in the long run if you selected the wrong partition key when you created your Cosmos DB containers. Throughput is another value you need to worry about; the good news is that you can change throughput if you need to. You cannot change your partition key!

It’s a whole different world over there.
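
To make the partition key point concrete, here is a hedged sketch of a Cosmos DB SQL API query against a hypothetical container partitioned on /customerId. Because the filter includes the partition key, the query can be served from a single partition instead of fanning out across all of them and burning extra RUs:

SELECT c.id, c.orderTotal
FROM c
WHERE c.customerId = "CUST-042"
  AND c.orderDate >= "2019-01-01"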


Query Store Changes

Milos Radivojevic shows us the Query Store default values and how they’ve changed between SQL Server 2017 and SQL Server 2019:

When you look at articles, posts and documents about new features and enhancements in SQL Server 2019 CTP2, you will find nothing about Query Store. However, there are some graphical enhancements in SQL Server Management Studio version 18.0, and the default configuration for Query Store attributes has changed as well.
First, SSMS 18.0. Starting with this version, you can see another Query Store report – Query Wait Statistics. When you click on it, you can see aggregate waits per category in a given time interval (the default is the last hour).

It looks like there have been some incremental improvements to Query Store. I think the defaults also make a bit more sense.
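
If you want to see how a given database is actually configured rather than relying on the defaults, a minimal T-SQL sketch is to query sys.database_query_store_options and then set the options you care about explicitly. The values in the ALTER below are illustrative, not the documented defaults for either version.

-- Inspect the current Query Store configuration for this database
SELECT actual_state_desc,
       query_capture_mode_desc,
       max_storage_size_mb,
       interval_length_minutes,
       stale_query_threshold_days,
       wait_stats_capture_mode_desc
FROM sys.database_query_store_options;

-- Set the options you care about explicitly (illustrative values)
ALTER DATABASE CURRENT
SET QUERY_STORE = ON
    (OPERATION_MODE = READ_WRITE,
     QUERY_CAPTURE_MODE = AUTO,
     MAX_STORAGE_SIZE_MB = 1000,
     INTERVAL_LENGTH_MINUTES = 60,
     WAIT_STATS_CAPTURE_MODE = ON);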


Building Test Data Following A Normal Distribution In T-SQL

I (finally) have a technical blog post:

In order to show you the solution, I want to build up a reasonably sized sample. Any solution looks great when reading five records, but let’s kick that up a notch. Or, more specifically, a million notches: I’m going to use a CTE tally table and load 5 million rows.
I want some realistic looking data, so I’ve adapted Dallas Snider’s strategy to build a data set which approximates a normal distribution.
Because this is a little complicated, I wanted to take the time to explain the data load process in detail in its own post, and then apply it in the follow-up post. We’ll start with a relatively small number of records for this demonstration: 50,000. The reason is that you can generate 50K records almost instantaneously, but once you start getting a couple of orders of magnitude larger, things slow down some.

If you do custom data generation for lower environments, I’d recommend checking this out. Your production data probably doesn’t follow a normal distribution exactly, but a normal distribution is probably closer to reality than the uniform distribution you get with functions like RAND().
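
The post walks through the full data load; as a rough sketch of the general technique (not the exact code from the post, and the mean and standard deviation here are made up), the Box-Muller transform turns two uniform RAND() values into an approximately normal one:

-- Generate 50,000 values approximating a normal distribution via Box-Muller
DECLARE @Mean FLOAT = 500.0, @StDev FLOAT = 75.0;

WITH Tally AS
(
    SELECT TOP (50000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS N
    FROM sys.all_columns AS ac1
        CROSS JOIN sys.all_columns AS ac2
)
SELECT
    t.N,
    @Mean + @StDev *
        SQRT(-2.0 * LOG(RAND(CHECKSUM(NEWID())))) *
        COS(2.0 * PI() * RAND(CHECKSUM(NEWID()))) AS ApproxNormalValue
FROM Tally AS t;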


Using AzCopy To Sync Data

Randolph West is pleased with an update to AzCopy:

As of November 2018, however, the v10 preview of AzCopy is vastly improved. Firstly, it runs cross-platform on Windows, Linux and macOS (it is open-source, and appears to be written in Go). Secondly, it has more sensible command-line switches. Thirdly, and most importantly in my mind, it includes the much-awaited sync option. This has been a long time coming.

Click through for a demonstration of this synchronization option.
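
For reference, the basic shape of the command looks something like the following (the account, container, and local path are placeholders and the SAS token is elided; check the AzCopy documentation for the exact syntax and flags in your version):

azcopy sync "C:\local\data" "https://<account>.blob.core.windows.net/<container>?<sas-token>"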


Performance Impact With Nullable Columns

Daniel Hutmacher shows us situations in which nullable columns can lead to worse performance:

We expected the Semi Join to turn into an Anti Semi Join, but the plan now also contains a Nested Loop branch with a Row Count Spool – what’s that about? Turns out the Row Count Spool, along with its index seek, has to do with the NOT IN() and the fact that we’re looking at a nullable column. Remember that…
x NOT IN (y, z, NULL)
… always returns false, because the NULL value could represent essentially anything, including the x. And so it is with the inner table, if there happens to be a NULL value among those rows.

Definitely worth the read. There are ways around this performance hit, but design decisions have consequences.
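
To see the NULL behavior in isolation, here is a toy example (not Daniel’s schema): the NULL in the inner table makes NOT IN return no rows at all, while NOT EXISTS behaves the way most people expect.

-- Toy tables: #Orders.CustomerID is nullable
CREATE TABLE #Customers (CustomerID INT NOT NULL);
CREATE TABLE #Orders (CustomerID INT NULL);

INSERT INTO #Customers (CustomerID) VALUES (1), (2);
INSERT INTO #Orders (CustomerID) VALUES (1), (3), (NULL);

-- Returns nothing: the NULL makes every NOT IN comparison UNKNOWN
SELECT c.CustomerID
FROM #Customers AS c
WHERE c.CustomerID NOT IN (SELECT o.CustomerID FROM #Orders AS o);

-- Returns CustomerID 2, as expected
SELECT c.CustomerID
FROM #Customers AS c
WHERE NOT EXISTS
(
    SELECT 1
    FROM #Orders AS o
    WHERE o.CustomerID = c.CustomerID
);

DROP TABLE #Customers, #Orders;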


Data Compression With dbatools

Jess Pomfret shows us the data compression options available when using dbatools:

Data compression is not a new feature in SQL Server. In fact, it has been around since SQL Server 2008, so why does it matter now? Before SQL Server 2016 SP1, this feature was only available in Enterprise Edition. Now that it’s in Standard Edition, data compression can be an option for far more people.
dbatools has three functions available to help you work with data compression, and in true dbatools style it makes it easy and fast to compress your databases.

Read on for a breakdown of the PowerShell cmdlets available.
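
If you’re curious what this looks like in plain T-SQL, here is a hedged sketch against a hypothetical dbo.SalesOrderDetail table: estimate the savings first, then rebuild with the compression type you choose. The dbatools functions wrap this kind of estimation and rebuild work for you.

-- Estimate how much space page compression would save
EXEC sp_estimate_data_compression_savings
    @schema_name = 'dbo',
    @object_name = 'SalesOrderDetail',
    @index_id = NULL,
    @partition_number = NULL,
    @data_compression = 'PAGE';

-- Apply page compression by rebuilding the table
ALTER TABLE dbo.SalesOrderDetail
REBUILD PARTITION = ALL
WITH (DATA_COMPRESSION = PAGE);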


Navigating Execution Plans In Management Studio

Greg Low shares a few tips on working with graphical execution plans in SQL Server Management Studio:

So I can zoom in and out, set a custom zoom level, or zoom until the entire plan fits. Generally though, that would make the plan too small to read, as soon as you have a complicated plan.
But in one of the least discoverable UI features in SSMS, there is an option to pan around the plan.

Click through for the demos. My favorite way of navigating graphical execution plans in SSMS is to use SentryOne Plan Explorer instead.


Kafka And Exactly-Once Delivery

Rahul Agarwal explains what “exactly-once” means in terms of message-passing systems:

Until recently, most organizations have been struggling to achieve the holy grail of message delivery, the exactly-once delivery semantic. Although this has been an out-of-the-box feature since Apache Kafka 0.11, people are still slow in picking up this feature. Let’s take a moment to understand exactly-once semantics. What is the big deal about it, and how does Kafka solve the problem?
Apache Kafka offers the following delivery guarantees. Let’s understand what these really mean:

In a distributed system, having true exactly-once processing is extremely difficult to achieve.
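
For context, Kafka’s exactly-once support is driven largely by configuration. A minimal sketch of the relevant settings (the transactional.id value is a placeholder) might look like this:

# Producer: idempotent, transactional writes (Kafka 0.11+)
enable.idempotence=true
transactional.id=order-processor-1
acks=all

# Consumer: only read messages from committed transactions
isolation.level=read_committed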


Choosing Azure Data Lake Analytics Versus Azure Databricks

Ginger Grant helps us make the decision between using Azure Data Lake Analytics and Azure Databricks:

Databricks is a recent addition to Azure that is greatly influencing the technology choices people are making when determining how to process data. Prior to the introduction of Databricks to Azure in March of 2018, if you had a lot of unstructured data stored in HDFS clusters and wanted to analyze it in a scalable fashion, the choice was Data Lake, using U-SQL with Data Lake Analytics. With the introduction of Databricks, there is now a choice between Data Lake Analytics and Databricks for analyzing data.

Click through for the comparison.
