2019-01-03 – Curated SQL

Dynamic Programming In R With RCppDynProg

Published 2019-01-03 by Kevin Feasel

John Mount has a new package available in R:

In the above we have an input (or independent variable) x and an observed outcome (or dependent variable) y_observed (portrayed as points). y_observed is the unobserved idea value y_ideal (portrayed by the dashed curve) plus independent noise. The modeling goal is to get close the y_ideal curve using the y_observed observations. Obviously this can be done with a smoothing spline, but let’s use RcppDynProg to find a piecewise linear fit.
To encode this as a dynamic programming problem we need to build a cost matrix that for every consecutive interval of x-values we have estimated the out-of sample quality of fit. This is supplied by the function RcppDynProg::lin_costs() (using the PRESS statistic), but lets take a quick look at the idea.

It’s an interesting package whose purpose is to turn an input data stream into a set of linear functions which approximate the stream. I’m not sure I’ll ever have a chance to use it, but it’s good to know that it’s there if I do ever need it.

Comments closed

Query Tuning In CosmosDB

Published 2019-01-03 by Kevin Feasel

Hasan Savran explains how we can tune queries in CosmosDB:

This is most common question in my talks about Cosmos DB from DBAs. Cosmos DB is a managed database, this does not mean that you cannot tune up your queries. But the way you tune up the queries is nothing like SQL Server.

First you need to be sure that you configured your Cosmos DB containers right. What do I mean with that? You should pick the right partition key before you start to tune up any of your queries. Tuning up your queries is not going to help you in long run if you selected a wrong partition key when you created Cosmos DB containers. Throughput value is another value you need to worry about, the good news about the throughput is, you can change it if you need to. You cannot change your partition key!

It’s a whole different world over there.

Comments closed

Query Store Changes

Published 2019-01-03 by Kevin Feasel

Milos Radivojevic shows us the Query Store default values and how they’ve changed between SQL Server 2017 and SQL Server 2019:

When you look at articles, posts and documents about new features and enhancements in SQL Server 2019 CTP2, you will find nothing about Query Store. However, there are some graphical enhancements in SQL Server Management Studio in the version 18.0, also default configuration for Query Store attributes is changed too.
First SSMS 18.0. From this version, you can see another Query Store report – Query Wait Statistics. When you click on it, you can see aggregate waits per category in a given time interval (default is last hour).

It looks like there have been some incremental improvements to Query Store. I think the defaults also make a bit more sense.

Comments closed

Building Test Data Following A Normal Distribution In T-SQL

Published 2019-01-03 by Kevin Feasel

I (finally) have a technical blog post:

In order to show you the solution, I want to build up a reasonable sized sample. Any solution looks great when reading five records, but let’s kick that up a notch. Or, more specifically, a million notches: I’m going to use a CTE tally table and load 5 million rows.
I want some realistic looking data, so I’ve adapted Dallas Snider’s strategy to build a data set which approximates a normal distribution.
Because this is a little complicated, I wanted to take the time and explain the data load process in detail in its own post, and then apply it in the follow-up post. We’ll start with a relatively small number of records for this demonstration: 50,000. The reason is that you can generate 50K records almost instantaneously but once you start getting a couple orders of magnitude larger, things slow down some.

If you do custom data generation for lower environments, I’d recommend checking this out. Your production data probably doesn’t follow a normal distribution exactly, but a normal distribution is probably closer to reality than the uniform distribution you get with functions like RAND().

Comments closed

Using AzCopy To Sync Data

Published 2019-01-03 by Kevin Feasel

Randolph West is pleased with an update to AzCopy:

As of November 2018 however, the v10 preview of AzCopy is vastly improved. Firstly, it runs cross-platform on Windows, Linux and macOS (it is open-source, and appears to be written in Go). Secondly, it has more sensible command-line switches. Thirdly, and most importantly in my mind, it includes the much-awaited sync option. This has been a long time coming.

Click through for a demonstration of this synchronization option.

Comments closed

Performance Impact With Nullable Columns

Published 2019-01-03 by Kevin Feasel

Daniel Hutmacher shows us situations in which nullable columns can lead to worse performance:

We expected the Semi Join to turn into an Anti Semi Join, but the plan now also contains a Nested Loop branch with a Row Count Spool – what’s that about? Turns out the Row Count Spool, along with its index seek, has to do with the NOT IN() and the fact that we’re looking at a nullable column. Remember that…
x NOT IN (y, z, NULL)
… always returns false, because the NULL value could represent essentially anything, including the x. And so it is with the inner table, if there happens to be a NULL value among those rows.

Definitely worth the read. There are ways around this performance hit but design decisions have consequences.

Comments closed

Data Compression With dbatools

Published 2019-01-03 by Kevin Feasel

Jess Pomfret shows us the data compression options available when using dbatools:

Data compression is not a new feature in SQL Server. In fact it has been around since SQL Server 2008, so why does it matter now? Before SQL Server2016 SP1 this feature was only available in Enterprise edition. Now that it’s in Standard edition data compression can be an option for far more people.
dbatools has three functions available to help you work with data compression, and in true dbatools style it makes it easy and fast to compress your databases.

Read on for a breakdown of the Powershell cmdlets available.

Comments closed

Navigating Execution Plans In Management Studio

Published 2019-01-03 by Kevin Feasel

Greg Low shares a few tips on working with graphical execution plans in SQL Server Management Studio:

So I can zoom in and out, set a custom zoom level, or zoom until the entire plan fits. Generally though, that would make the plan too small to read, as soon as you have a complicated plan.
But in one of the least discoverable UI features in SSMS, there is an option to pan around the plan.

Click through for the demos. My favorite way of navigating graphical execution plans in SSMS is to use SentryOne Plan Explorer instead.

Comments closed

M	T	W	T	F	S	S
	1	2	3	4	5	6
7	8	9	10	11	12	13
14	15	16	17	18	19	20
21	22	23	24	25	26	27
28	29	30	31

Day: January 3, 2019