Curated SQL Posts

Beware CPU Oversubscription with SQL Server

Monica Rathbun shares a tale of terror:

Recently I had a client complain of chronic high CPU utilization. The performance of their SQL Server had degraded, and it appeared to be related to higher-than-normal CPU utilization in conjunction with symptoms of unresponsive user queries. The root cause was twofold: a third-party hosting provider had overallocated virtual processors on the physical host where the virtual machine (VM) running SQL Server was residing, as well as a recent upgrade from a version of VMware that was not patched for Spectre and Meltdown. The host had 16 physical cores and was hyperthreading (making it effectively 32 cores) until the hosting provider patched from VMware 5.5 to a newer release (we believe 6.5), which was required for the Meltdown and Spectre processor vulnerabilities. This patch disabled hyperthreading from the hypervisor to mitigate the security risk from speculative execution. Note that this patch is over a year old and addresses a critical security risk; most software vendors (including VMware) put it out as an immediate requirement after the announcement of the vulnerabilities.

Given this was a virtual machine, it shared a physical host with many other VMs; this is a very common configuration. However, this host was VERY overallocated. As mentioned above, there were 16 cores; however, 61 additional vCPUs had been allocated to other machines. That's 4.3 times the number of CPUs available for allocation. The screenshot below shows this single host, highlighting the vCPUs allocated.

So, uh, that’s a bad thing. Monica explains in detail why exactly it’s a bad thing, which is helpful when you’re trying to explain to the server admin why it’s a bad thing. CPU oversubscription can work for things like dev boxes or web servers, where they typically aren’t anywhere near 100% utilization. It does not work at all for busy database servers.

SSIS Catalog Dashboard

Tim Mitchell has a new GitHub repo:

The SSIS Catalog Dashboard is a simple collection of reports that provide insight into the activity within the SSIS catalog. The first of these is the Dashboard report. This report shows a summary of the number of packages that are running or have run in the recent past.

The dashboard repo, a Reporting Services project, is available on GitHub and is licensed under GPL version 3.

Performance Troubleshooting Plus Wait Stats

Jeff Mlakar builds up some thoughts on performance troubleshooting, including wait stats:

Queries cycle through a series of states as their SPIDs / worker threads run and wait. A thread uses a resource (e.g., CPU) until it needs to yield to another that is waiting. It then moves to an unordered list of threads that are SUSPENDED. The next thread on the FIFO queue of waiting threads then becomes RUNNING. If a thread on the SUSPENDED list is notified that its resource is available, it becomes RUNNABLE and goes to the bottom of the queue.

Click through for an analogy using a microwave and plenty more.
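
To make that cycle concrete, here is a toy Python sketch of the state machine (purely an illustration, not SQL Server internals; the SPID labels are made up):

from collections import deque

# Toy model of the scheduling cycle described above:
# RUNNING -> SUSPENDED (waiting on a resource) -> RUNNABLE (FIFO) -> RUNNING.
runnable = deque(["spid 52", "spid 53"])   # FIFO queue of threads ready for CPU
suspended = set()                          # unordered list of threads waiting on resources

def yield_cpu(running_thread):
    # The running thread needs a resource, so it is suspended
    # and the next thread in the FIFO queue gets the CPU.
    suspended.add(running_thread)
    return runnable.popleft()

def resource_ready(thread):
    # A suspended thread's resource arrived; it becomes RUNNABLE
    # and goes to the bottom of the queue.
    suspended.discard(thread)
    runnable.append(thread)

running = runnable.popleft()   # "spid 52" is RUNNING
running = yield_cpu(running)   # 52 is SUSPENDED, 53 is RUNNING
resource_ready("spid 52")      # 52 is RUNNABLE, at the back of the queue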

Simplify Visuals: No Unnecessary Lines

Stephanie Evergreen shows how you can improve your visuals by removing most of the lines:

The Lines section of the Data Visualization Checklist helps us enhance reader interpretability by handling a lot of the junk, or what Edward Tufte called the “noise” in the graph. I’m referring to all of the parts of the graph that don’t actually display data or assist reader cognition. Create more readability by deleting unnecessary lines. 

The default chart, on the left, has black gridlines. These stand out quite a bit because of how well black contrasts against the white chart background. But the gridlines shouldn't be standing out so much, because they are not the most important part of the graph.

I like that Stephanie keeps the gridlines. I've seen Tufte advocate removing them altogether, but there's a lot of value in keeping them in; just don't make them the highest-contrast color on the chart.
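
If you build charts in Python, a minimal matplotlib sketch of that advice (keep the gridlines but mute them, and drop the borders that carry no data) might look like this:

import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.bar(["A", "B", "C"], [4, 7, 5])

# Keep the gridlines, but push them into the background.
ax.grid(axis="y", color="lightgray", linewidth=0.5)
ax.set_axisbelow(True)   # draw gridlines behind the bars

# Delete lines that don't carry data: the top and right borders.
ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)

plt.show()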

Parameters in Rmarkdown Files

Neil Saunders shows how you can parameterize Rmarkdown files, which makes later changes easy:

The reports follow a common template where the major difference is simply the hashtag. So one way to create these reports is to use the previous one, edit to find/replace the old hashtag with the new one, and save a new file.

That works…but what if we could define the hashtag once, then reuse it programmatically anywhere in the document? Enter Rmarkdown parameters.

The example is small but important.
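
Rmarkdown parameters live in the document's YAML header; as a rough analogy in Python, the define-once, reuse-everywhere idea looks like this (the hashtag value is a made-up example):

# Define the parameter once...
params = {"hashtag": "#rstats"}

# ...then reuse it programmatically anywhere in the report body.
report = """
Twitter coverage of {hashtag}
=============================
This report summarizes recent tweets tagged {hashtag}.
""".format(**params)

print(report)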

Monitoring Kafka Streams with JMX Metrics

Rishi Khandelwal provides a reference architecture for monitoring a Kafka Streams application using JMX Metrics and pushing the results into Graphite:

The service (application) exposes JMX metrics on some port, which are captured by the Jolokia Java agent. Jolokia then exposes those metrics on a port that is easily accessible through a REST endpoint (we call it the Jolokia URL). Then we have jmx2graphite, which polls the metrics from the Jolokia URL and pushes them to Graphite. Finally, Grafana reads the Graphite metrics and creates a beautiful dashboard for us, along with the alerts.

That is how the proposed monitoring solution works. Now let's discuss its components.

There’s a bit of code/configuration in here as well, so check it out.

Deleting in Azure Data Factory

Meagan Longoria is happy that Azure Data Factory v2 now has a Delete activity:

It is a common practice to load data to blob storage or data lake storage before loading to a database, especially if your data is coming from outside of Azure. We often create a staging area in our data lakes to hold data until it has been loaded to its next destination. Then we delete the data in the staging area once our subsequent load is successful. But before February 2019, there was no Delete activity. We had to write an Azure Function or use a Logic App called by a Web Activity in order to delete a file. I imagine every person who started working with Data Factory had to go and look this up.

But now Data Factory V2 has a Delete activity.

Meagan shows how it works, what kinds of parameters you can set, and a couple of gotchas, so check it out.
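
For context, the pre-Delete-activity workaround amounted to something like this inside an Azure Function (sketched here with the azure-storage-blob Python SDK; the names are placeholders):

from azure.storage.blob import BlobServiceClient

def delete_staged_file(conn_str: str, container: str, blob_name: str) -> None:
    # What the Azure Function called by a Web Activity used to do:
    # delete one staged file from blob storage after a successful load.
    service = BlobServiceClient.from_connection_string(conn_str)
    blob = service.get_blob_client(container=container, blob=blob_name)
    blob.delete_blob()

# Hypothetical usage after the downstream load succeeds:
# delete_staged_file(conn_str, "staging", "sales/2019/02/extract.csv")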

PolyBase and Pushdown Limitations

I have a post covering something I learned about predicate pushdown against Hadoop in PolyBase:

Before I start, let’s talk about predicate pushdown for a moment. The gist of it is that when you have data in two sources, you have two options for combining the data:

1. Bring the data in its entirety from your remote source to your local target and work as though everything were in the local target to begin with. I’ll call this the streaming approach.

2. Send as much of your query's filters, projections, and pre-conditions as possible to the remote source, have the remote source perform some of the work, and then have the remote source send its post-operative data to the local target. The local target then once more treats this as though it were simply local data. This is the pushdown approach, because you push down those predicates (that is, filters, projections, and pre-conditions).

Click through for the unfortunate finding, and also vote up my UserVoice feature request if you want to see string columns supported as pushdown filters.
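
As a toy illustration of the difference between the two approaches (plain Python, nothing PolyBase-specific):

# "remote_rows" stands in for Hadoop, the list comprehensions for SQL Server;
# the predicate is the filter we would like to push down.
remote_rows = [{"id": i, "amount": i * 10} for i in range(10_000)]

def predicate(row):
    return row["amount"] > 99_900

# Streaming approach: ship everything across the wire, then filter locally.
local_copy = list(remote_rows)
streamed_result = [r for r in local_copy if predicate(r)]

# Pushdown approach: the remote source applies the predicate first,
# so only the qualifying rows cross the wire.
pushed_result = [r for r in remote_rows if predicate(r)]

assert streamed_result == pushed_result   # same answer, very different data movement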

Running Totals in DAX

Alberto Ferrari shows how you can calculate running totals in DAX:

A very common calculation in DAX is the year-to-date calculation (YTD), which aggregates values from the beginning of the year all the way to a certain date. A simple implementation uses the predefined DATESYTD function:

Sales YTD :=
CALCULATE (
    [Sales Amount],
    DATESYTD ( 'Date'[Date] )
)

But click through to see when this function stops being useful and what you should replace it with when it does.
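
For comparison, here is the same YTD idea sketched in pandas: a cumulative sum that resets at each year boundary (the table and column names are made up):

import pandas as pd

# Hypothetical sales table, one row per date.
sales = pd.DataFrame({
    "date": pd.to_datetime(["2018-12-30", "2019-01-05", "2019-01-20", "2019-02-02"]),
    "amount": [100.0, 40.0, 60.0, 25.0],
})

# Year-to-date: a running total within each calendar year,
# mirroring what DATESYTD does for the DAX measure above.
sales = sales.sort_values("date")
sales["sales_ytd"] = sales.groupby(sales["date"].dt.year)["amount"].cumsum()

print(sales)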
