Press "Enter" to skip to content

Author: Kevin Feasel

Visualizing Hash Match Joins

Bert Wagner continues his series on visualizing physical join operators:

Hash Match joins are the dependable workhorses of physical join operators.
While Nested Loops joins will fail if the data is too large to fit into memory, and Merge Joins require that the input data are sorted, a Hash Match will join any two data inputs you throw at it (as long as the join has an equality predicate and you have enough space in tempdb).  

Bert has some great animated GIFs too.
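
If you want to reproduce the operator without Bert’s data set, here is a minimal sketch (the table and column names are made up) which forces the physical join so it shows up in the execution plan; the equality predicate is the part that matters:

-- Hypothetical tables; the equality predicate is what makes a Hash Match possible,
-- and the hint forces the physical join so you can see it in the plan.
SELECT o.OrderID, c.CustomerName
FROM dbo.Orders AS o
    INNER JOIN dbo.Customers AS c
        ON c.CustomerID = o.CustomerID
OPTION (HASH JOIN);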


Deep Dive On The Adaptive Join Operator

Hugo Kornelis covers the Adaptive Join operator:

The Adaptive Join operator was added in SQL Server 2017 as an alternative to the other join operators: Nested Loops (ideal for joining a small data stream with a cheap input), Hash Match (most effective for joining large unsorted sets) and Merge Join (ideal for joining data streams that are sorted by the join key). It is intended to be used when there is no efficient way to fulfill the order requirement of the Merge Join, and the optimizer cannot reliably predict which of the remaining algorithms (Hash Match or Nested Loops) would perform best.
Because it has to be able to join the data using either the Nested Loops or the Hash Match algorithm, Adaptive Join suffers from the combined restrictions of these operators. As such, Adaptive Join supports only four logical join operations: inner join, left outer join (but not the probed version), left semi join, and left anti semi join;  it requires at least one equality-based join predicate, it uses lots of memory, and it is semi-blocking.

Read on for a detailed look at this operator.
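
If you want to see one in a plan yourself, a rough sketch (the fact and dimension tables here are hypothetical) is a batch mode aggregation query, which in SQL Server 2017 generally means having a columnstore index on one of the tables:

-- Hypothetical schema; a columnstore index on dbo.FactSales enables batch mode,
-- which SQL Server 2017 requires before it will consider an Adaptive Join.
SELECT f.ProductID, SUM(f.Quantity) AS TotalQuantity
FROM dbo.FactSales AS f
    INNER JOIN dbo.DimProduct AS p
        ON p.ProductID = f.ProductID    -- the required equality predicate
GROUP BY f.ProductID;
-- The actual execution plan shows the operator with a row threshold: at runtime,
-- a build input under that threshold flips it from Hash Match to Nested Loops.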


Splatting In PowerShell

Chrissy LeMaire explains how you can simplify PowerShell cmdlet calls using hashtables:

Splatting in PowerShell makes code easier to read. Instead of typing a bunch of parameters allllll across the screen, you can use an easy-to-read hashtable or array. Argument splatting was introduced in PowerShell v3 and works with all PowerShell commands, not just dbatools.
Note: I’ve only used splatting with hashtables, as they allow me to be explicit about which parameters I’m passing. It appears that arrays would employ positional parameters, which is less wordy but leaves room for error.

Whenever I hear the word “splatting” I think of the Naked Gun series of movies and OJ Simpson getting run over by a steamroller. Those were some funny movies, so that’s a good connotation.


Design Tips For Scaling Systems

Erik Darling has a few ideas for how you can design that SQL Server instance and database for future growth:

I can’t begin to tell you how many terrible things you can avoid by starting your apps out using an optimistic isolation level. Read queries and write queries can magically exist together, at the expense of some tempdb.
Yes, that means you can’t leave transactions open for a very long time, but hey, you shouldn’t do that anyway.
Yes, that means you’ll suffer a bit more if you perform large modifications, but you should be batching them anyway.

Optimistic concurrency is huge—definitely worth the top slot in Erik’s list.
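
For anyone who wants to take the advice, Read Committed Snapshot Isolation is one setting per database; the database name below is a placeholder, and WITH ROLLBACK IMMEDIATE kills open transactions, so time it carefully on an existing system:

-- Readers get row versions from tempdb instead of blocking behind writers' locks.
ALTER DATABASE [YourAppDatabase] SET READ_COMMITTED_SNAPSHOT ON WITH ROLLBACK IMMEDIATE;

-- Optionally allow explicit SNAPSHOT transactions as well.
ALTER DATABASE [YourAppDatabase] SET ALLOW_SNAPSHOT_ISOLATION ON;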


Query Store & Max Storage Size

Arthur Daniels points out that Query Store’s max size is a suggestion rather than a requirement:

But the MAX_STORAGE_SIZE_MB is not a hard limit
I’ve seen this occur in production systems with heavy ad-hoc workloads. As the Query Store gathers data, there seems to be a point where the writes are queued up. When they are written to disk, the max size can be exceeded.
Anecdotally, I’ve seen production systems where the max size has been vastly exceeded. But let me show you an example on a small scale on my demo machine.

Click through for a demonstration on the small. I’d much rather go over my self-imposed limit than lose useful data, so I’m okay with that decision. If you’re extremely tight on disk space, however, it’s good to know this detail.
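
If you want to keep an eye on how close you are to the cap, both the configured maximum and the actual usage are exposed in sys.database_query_store_options; the 1 GB below is only an example value:

-- Set the (soft) cap, then compare it to actual usage.
ALTER DATABASE [YourDatabase]
    SET QUERY_STORE (MAX_STORAGE_SIZE_MB = 1024);

SELECT actual_state_desc,
       current_storage_size_mb,
       max_storage_size_mb
FROM sys.database_query_store_options;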


Dynamic Programming In R With RCppDynProg

John Mount has a new package available in R:

In the above we have an input (or independent variable) x and an observed outcome (or dependent variable) y_observed (portrayed as points). y_observed is the unobserved ideal value y_ideal (portrayed by the dashed curve) plus independent noise. The modeling goal is to get close to the y_ideal curve using the y_observed observations. Obviously this can be done with a smoothing spline, but let’s use RcppDynProg to find a piecewise linear fit.
To encode this as a dynamic programming problem we need to build a cost matrix that, for every consecutive interval of x-values, estimates the out-of-sample quality of fit. This is supplied by the function RcppDynProg::lin_costs() (using the PRESS statistic), but let’s take a quick look at the idea.

It’s an interesting package whose purpose is to turn an input data stream into a set of linear functions which approximate the stream. I’m not sure I’ll ever have a chance to use it, but it’s good to know that it’s there if I do ever need it.


Query Tuning In CosmosDB

Hasan Savran explains how we can tune queries in CosmosDB:

This is the most common question in my talks about Cosmos DB from DBAs. Cosmos DB is a managed database, but that does not mean you cannot tune your queries. The way you tune queries, however, is nothing like SQL Server.

First, you need to be sure that you configured your Cosmos DB containers correctly. What do I mean by that? You should pick the right partition key before you start to tune any of your queries. Tuning your queries is not going to help you in the long run if you selected the wrong partition key when you created your Cosmos DB containers. Throughput is another value you need to worry about; the good news about throughput is that you can change it if you need to. You cannot change your partition key!

It’s a whole different world over there.


Query Store Changes

Milos Radivojevic shows us the Query Store default values and how they’ve changed between SQL Server 2017 and SQL Server 2019:

When you look at articles, posts, and documents about new features and enhancements in SQL Server 2019 CTP2, you will find nothing about Query Store. However, there are some graphical enhancements in SQL Server Management Studio version 18.0, and the default configuration for Query Store attributes has changed as well.
First, SSMS 18.0. From this version, you can see another Query Store report – Query Wait Statistics. When you click on it, you can see aggregate waits per category in a given time interval (the default is the last hour).

It looks like there have been some incremental improvements to Query Store. I think the defaults also make a bit more sense.
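
If you would rather check what a given database is actually running with than trust the version defaults, the configuration and the data behind the new wait statistics report are both queryable:

-- Current Query Store configuration for the database in context.
SELECT desired_state_desc,
       query_capture_mode_desc,
       max_storage_size_mb,
       interval_length_minutes,
       stale_query_threshold_days
FROM sys.database_query_store_options;

-- Aggregate waits per category (what the Query Wait Statistics report draws from).
SELECT wait_category_desc,
       SUM(total_query_wait_time_ms) AS total_wait_ms
FROM sys.query_store_wait_stats
GROUP BY wait_category_desc
ORDER BY total_wait_ms DESC;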


Building Test Data Following A Normal Distribution In T-SQL

I (finally) have a technical blog post:

In order to show you the solution, I want to build up a reasonably sized sample.  Any solution looks great when reading five records, but let’s kick that up a notch.  Or, more specifically, a million notches:  I’m going to use a CTE tally table and load 5 million rows.
I want some realistic looking data, so I’ve adapted Dallas Snider’s strategy to build a data set which approximates a normal distribution.
Because this is a little complicated, I wanted to take the time and explain the data load process in detail in its own post, and then apply it in the follow-up post.  We’ll start with a relatively small number of records for this demonstration:  50,000.  The reason is that you can generate 50K records almost instantaneously but once you start getting a couple orders of magnitude larger, things slow down some.

If you do custom data generation for lower environments, I’d recommend checking this out. Your production data probably doesn’t follow a normal distribution exactly, but a normal distribution is probably closer to reality than the uniform distribution you get with functions like RAND().
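
The post has the full build; as a rough sketch of the general idea, here is one way to generate 50,000 approximately normal values using a Box-Muller transform (not necessarily the exact technique in the post, and the mean and standard deviation below are placeholders):

-- Sketch: 50,000 values approximating a normal distribution via Box-Muller.
DECLARE @mean  DECIMAL(10, 2) = 500.0,   -- placeholder mean
        @stdev DECIMAL(10, 2) = 75.0;    -- placeholder standard deviation

WITH ones AS (SELECT v FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) AS x(v)),
nums AS (
    SELECT TOP (50000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n
    FROM ones AS t1 CROSS JOIN ones AS t2 CROSS JOIN ones AS t3
         CROSS JOIN ones AS t4 CROSS JOIN ones AS t5
)
SELECT n,
       @mean + @stdev * SQRT(-2.0 * LOG(RAND(CHECKSUM(NEWID()))))
                      * COS(2.0 * PI() * RAND(CHECKSUM(NEWID()))) AS sample_value
FROM nums;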


Using AzCopy To Sync Data

Randolph West is pleased with an update to AzCopy:

As of November 2018 however, the v10 preview of AzCopy is vastly improved. Firstly, it runs cross-platform on Windows, Linux and macOS (it is open-source, and appears to be written in Go). Secondly, it has more sensible command-line switches. Thirdly, and most importantly in my mind, it includes the much-awaited sync option. This has been a long time coming.

Click through for a demonstration of this synchronization option.
