Press "Enter" to skip to content

Month: June 2020

Fun with Query Tuning in SQL Server 2019

Erik Darling has just wrapped up a nice series on tackling a problem which looks like parameter sniffing but isn’t. Part 2 covers the issue:

This isn’t always the exact case, but generally speaking you’ll observe something along these lines.

It’s definitely not the case for what we’re going to be looking at this week.

This week is far more interesting.

That’s why it’s a monstrosity.

Part 3 digs in:

It’s not parameter sniffing, but it sure could feel like it.

– When the procedure compiles and runs with VoteTypeId 5, it runs for 12 minutes
– Other VoteTypeIds run well with the same plan that VoteTypeId 5 gets
– When VoteTypeId 5 runs with a “small” plan, it does okay at 10 seconds
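
For reference, the general shape of a procedure like this is roughly the following. This is a hypothetical sketch modeled on the Stack Overflow Votes table, not Erik's actual procedure:

-- Hypothetical sketch (not Erik's actual query): a procedure whose plan quality
-- depends heavily on which @VoteTypeId the optimizer builds the plan for.
CREATE OR ALTER PROCEDURE dbo.VoteSniffing
    @VoteTypeId INT
AS
BEGIN
    SET NOCOUNT ON;

    SELECT v.PostId,
           COUNT_BIG(*) AS VoteCount
    FROM dbo.Votes AS v
    WHERE v.VoteTypeId = @VoteTypeId   -- skew: some VoteTypeIds are rare, others huge
    GROUP BY v.PostId;
END;
GO

EXEC dbo.VoteSniffing @VoteTypeId = 5;  -- the problem value in Erik's series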

Part 4 gives us a solution without using OPTIMIZE FOR MEDIOCRE:

This is what happens when we optimize for unknown. The density vector guess is 13,049,400.

That guess for Vote Types with very few rows ends up with a plan that has very high startup costs.

This version of the query will run for 13-17 seconds for any given parameter. That sucks in zero gravity.
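
"Optimize for unknown" here refers to the density vector estimate the optimizer uses when it can't (or is told not to) sniff the parameter, for example via a hint like the one below. This is a hypothetical sketch of that behavior, not the fix Erik settles on:

-- Hypothetical sketch: with OPTIMIZE FOR UNKNOWN, the optimizer ignores the sniffed
-- @VoteTypeId and uses the density vector ("average" selectivity) instead, which is
-- where a guess like 13,049,400 rows comes from.
CREATE OR ALTER PROCEDURE dbo.VoteSniffing
    @VoteTypeId INT
AS
BEGIN
    SET NOCOUNT ON;

    SELECT v.PostId,
           COUNT_BIG(*) AS VoteCount
    FROM dbo.Votes AS v
    WHERE v.VoteTypeId = @VoteTypeId
    GROUP BY v.PostId
    OPTION (OPTIMIZE FOR (@VoteTypeId UNKNOWN));
END;
GO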

Part 5 looks into something which occasionally pops up with this query:

You see, there’s a mystery plan.

It only shows up once in a while, like Planet X. And when it does, we get bombarded by asteroids.

Just like when Planet X shows up.

I wouldn’t call it a good all-around plan, but it does something that we would want to happen when we run this proc for VoteTypeId 5.

Read on for an educational romp through the SQL Server 2019 optimizer.


Database Restoration in Docker

John Morehouse gives us one way to restore a database in Docker:

Here are the steps that we will take to make this work:

1. Download one of the sample databases. I have a “mssql” directory in my local profile to make things easier
2. Make sure the container is started. You can issue a “docker ps” command from a terminal to see which containers are running
3. Create a directory within the container
4. Copy the sample database backup file into the directory you just created
5. Restore the database onto the SQL instance that is running within the container

The set of steps is fine and it’s what I normally do, though someone did suggest setting up an external volume mapping, e.g., linking /var/opt/mssql/backups inside the container to a directory outside it. That way, you can drop your backup file in and it’ll be there without the copy step.
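
For the restore itself (step 5), the T-SQL looks something like this once the backup file is inside the container. A minimal sketch with hypothetical database and file names; run it via sqlcmd or SSMS connected to the containerized instance:

-- Minimal sketch with hypothetical names: restore from the backup file that was
-- copied into the container (e.g., via docker cp).
-- Check the logical file names first with RESTORE FILELISTONLY if unsure.
RESTORE DATABASE AdventureWorks2017
FROM DISK = N'/var/opt/mssql/backups/AdventureWorks2017.bak'
WITH MOVE N'AdventureWorks2017'     TO N'/var/opt/mssql/data/AdventureWorks2017.mdf',
     MOVE N'AdventureWorks2017_log' TO N'/var/opt/mssql/data/AdventureWorks2017_log.ldf',
     STATS = 10;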


I Remember Halloween

Jared Poche experiences Halloween problem protection:

Simple enough so far. The scan is against a memory optimized table variable, and the filter to the left of our seeks and scans checks for a change to our value. Nothing left but to update the index and…

CURVE BALL

Wait, what’s all this? We have a Split operator after our Clustered Index Update. SQL Server does sometimes turn an UPDATE statement into effectively a DELETE and INSERT if the row needs to move, but this seems a bit much. We have a total of 4 index update/delete operators now, and they aren’t cheap.

My very simple addition to the WHERE clause actually caused a small increase in duration, and a big jump in CPU. So what’s going on?

Read on to see the cause and what Jared was able to do about it.
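
The general shape that invites this is an UPDATE whose WHERE clause reads the same column the SET clause writes, so SQL Server has to guarantee the read side can’t see its own writes (the Halloween problem). A hypothetical sketch, not Jared’s actual table:

-- Hypothetical sketch (not Jared's actual code): the WHERE clause reads the same
-- column the SET clause writes, the classic trigger for Halloween protection.
DECLARE @WidgetId  INT     = 42,
        @NewStatus TINYINT = 3;

UPDATE w
SET    w.StatusCode = @NewStatus
FROM   dbo.Widgets AS w
WHERE  w.WidgetId   = @WidgetId
  AND  w.StatusCode <> @NewStatus;  -- "only touch rows that actually change"

-- If StatusCode is also a key column in one or more nonclustered indexes, the plan
-- may split the update into delete/insert streams and add extra index maintenance.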


Expanding INT to BIGINT Without (Much) Downtime

Michael J. Swart finds a creative way around a problem, and then a creative way around the problem it causes, and so on:

The table has over 2 billion rows and it looks like it’s going to run out of space soon because the LogId column is defined as an INT. I need to change this table so that it’s a BIGINT. But changing an INT to a BIGINT is known as a “size of data” operation. This means SQL Server has to process every row to expand the LogId column from 4 to 8 bytes. But it gets trickier than that.

The biggest challenge is that the table has to remain “online” (available for queries and inserts).

Compression?
Gianluca Sartori (spaghettidba) had the idea of enlarging the columns with no downtime using compression. It’s promising, but I discovered that for this to work, all indexes need to be compressed, not just the ones that contain the changed column. Also, any indexes which use the column need to be disabled for this to work.

Cheating
I gave up on solving this problem in general and constrained my focus to the specific problem I was facing. There’s always some context that lets us bend the rules. In my case, here’s what I did.

Read the whole thing.
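
For context, the naive version of the change is a single statement, and that single statement is exactly the size-of-data rewrite Michael is trying to avoid (hypothetical table and column names):

-- Hypothetical sketch of the naive approach: a size-of-data operation that rewrites
-- every one of the 2 billion+ rows and holds locks while it does so.
ALTER TABLE dbo.Logs
    ALTER COLUMN LogId BIGINT NOT NULL;

-- If LogId is the clustered primary key (or is referenced by foreign keys), the
-- constraints have to be dropped and re-created around the change as well,
-- which makes keeping the table online even harder.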


Building Data Pipelines with Apache NiFi

The Hadoop in Real World team takes a look at Apache NiFi:

NiFi is an easy-to-use tool which prefers configuration over coding.

However, NiFi is not limited to data ingestion only. NiFi can also perform data provenance, data cleaning, schema evolution, data aggregation, transformation, scheduling jobs and many others. We will discuss these in more detail in some other blog very soon with a real world data flow pipeline. 

Hence, we can say NiFi is a highly automated framework used for gathering, transporting, maintaining and aggregating data of various types from various sources to a destination in a data flow pipeline.

Click through for an example with instructions. The feeling is pretty close to Informatica or SQL Server Integration Services, so if you’re an old hand at one of those, you’ll get into this pretty easily.


Data Visualization in R

Dan Fitton provides an introductory overview to several visualization tools in R:

The other way to communicate data with R is to produce an interactive dashboard or web application within R using Shiny. Whereas Markdown reports are most useful for explanatory analysis, Shiny, in my opinion, is useful for exploratory data analysis. This is when you want to display information for investigative purposes, allowing the user to gain greater familiarity by having the ability to interact with data, filter it, and dig deeper into the underlying details.

Shiny is incredibly flexible, providing the user the capability of turning their R code and objects, including tables, plots, and analysis, into a comprehensive and interactive web page or app, without requiring a fully-fledged web development skillset. Although there is a steep learning curve, the freedom and precision Shiny brings means that for the most part you are limited only by your skillset rather than the tool itself.

I’ve seen some really useful Shiny dashboards. Dan is right that there can be a lot of work put into getting them right, but if you do, the results can be outstanding.


Breaking out of Azure Data Factory ForEach Activities

Andy Leonard is planning a jailbreak:

“What if something fails inside the ForEach activity’s inner activities, Andy?”

That is an excellent question! I’m glad you asked. The answer is: The ForEach activity continues iterating.

I can now hear some of you asking…

“What if I want the ForEach activity to fail when an inner activity fails, Andy?”

Another excellent question, and you’ve found a post discussing one way to “break out” of a ForEach activity’s iteration.

Read on for the process.


NVARCHAR Everywhere

I get to put on my contrarian hat:

In the last episode of Shop Talk, I laid out an opinion which was…not well received. So I wanted to take some time and walk through my thinking a little more cogently than I was able to do during Shop Talk.

Here’s the short version. When you create a table and need a string column, you have a couple options available: VARCHAR and NVARCHAR. Let’s say that you’re a developer creating a table to store this string data. Do you choose VARCHAR or NVARCHAR? The classic answer is, “It depends.” And so I talk about why that is in video format right below these words.

I have a video which goes into detail, plus a bunch of words. Plus mice and banjos. 🐭🪕
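
If you want the thirty-second version of the trade-off before watching, here is a minimal illustration with a hypothetical table; the video and post make the full argument:

-- Minimal illustration (hypothetical table): VARCHAR stores characters in the
-- column's collation code page, while NVARCHAR stores Unicode (UTF-16).
CREATE TABLE dbo.StringDemo
(
    Name_V VARCHAR(50) COLLATE SQL_Latin1_General_CP1_CI_AS,
    Name_N NVARCHAR(50)
);

INSERT INTO dbo.StringDemo (Name_V, Name_N)
VALUES (N'こんにちは', N'こんにちは');

-- The VARCHAR column silently replaces characters it cannot represent with '?',
-- while the NVARCHAR column round-trips them intact.
SELECT Name_V, Name_N FROM dbo.StringDemo;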


The Origins of SentryOne Plan Explorer

Jason Hall gives us a bit of history:

Greg saw a need in his own work, and I was seeing a need in the field with our customers, for a way to go beyond identifying high-impact queries. DBAs and developers needed a way to tune queries surfaced by SentryOne SQL Sentry’s Top SQL without fiddling with a lot of extra tools to get there. We were already building integration with SQL Server Management Studio (SSMS), which included graphical query plans, so the original thought was to extend that integration from SentryOne with a link that opened plans in SSMS from Top SQL in SQL Sentry.

It seemed like an elegant solution that would allow us to reuse some code, but it wasn’t long before Brooke Philpott discovered that we wouldn’t be able to get what we needed this way. That particular part of SSMS wasn’t exposed to us in the manner we needed. Par for the course, we weren’t going to let that stop us from filling the need. Greg and Brooke dug into the problem to discover a mix of documentation, flow controls, and ingenuity that would provide the foundation for building our own query plan visuals.

Read on for the story and a bit about how the product has morphed through the years.


Securing the Data Prep Area

Tim Mitchell explains why you should limit access to your staging area:

First things first, let’s define what a data prep area is. Data preparation (prep) is a common phase of extract, transform, and load (ETL) operations in which data is temporarily written for cleansing, deduplication, reshaping, or other data modifications. Also sometimes referred to as a landing area or a staging area, this is a common design pattern when moving data from a data store optimized for online transaction processing (OLTP) to a data model more friendly to analytics or reporting.

The data prep area really is a lot like a restaurant kitchen: it’s sometimes chaotic, it’s not consumer friendly, and there is a legitimate risk of consuming half-prepared goods.

Tim lays out why that is, so check it out.
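
On the SQL Server side, the most basic form of that advice is to keep prep tables in their own schema and let only the ETL process touch them. A minimal sketch with hypothetical schema and principal names:

-- Minimal sketch with hypothetical names: isolate prep tables in a dedicated schema
-- and grant access only to the ETL principal; report readers stay out entirely.
CREATE SCHEMA Staging AUTHORIZATION dbo;
GO

GRANT SELECT, INSERT, UPDATE, DELETE ON SCHEMA::Staging TO [etl_service];
DENY  SELECT                          ON SCHEMA::Staging TO [reporting_users];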
