
Author: Kevin Feasel

Figuring Out Cost Threshold For Parallelism

Grant Fritchey uses R to help him decide on a good cost threshold for parallelism value:

With the Standard Deviation in hand, and a quick rule of thumb that says roughly 95% of all values are going to be within two standard deviations of the mean, I can determine that a value of 16 on my Cost Threshold for Parallelism is going to cover most cases, and will ensure that only a small percentage of queries go parallel on my system, but that those which do go parallel are actually costly queries, not some that just fall outside the default value of 5.

I’ve made a couple of assumptions that are not completely held up by the data. Using the two, or even three, standard deviations to cover just enough of the data isn’t actually supported in this case because I don’t have a normal distribution of data. In fact, the distribution here is quite heavily skewed to one end of the chart. There’s also no data on the frequency of these calls. You may want to add that into your plans for setting your Cost Threshold.

This is a nice start.  If you’re looking for a more experimental analysis, you could try A/B testing (particularly if you have a good sample workload), where you track whatever pertinent counters you need (e.g., query runtime, whether it went parallel, CPU and disk usage) under different cost threshold regimes and do a comparative analysis.
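As a rough illustration of why the skew matters, here is a minimal Python sketch (Grant's analysis is in R; this is not his code). The cost values are made up for the example, and the 95th percentile is my own swapped-in alternative to the standard deviation rule, since percentiles do not assume a normal distribution.

```python
import statistics


def summarize_costs(costs):
    """Summarize estimated plan costs two ways: mean + 2 SD and a percentile."""
    mean = statistics.mean(costs)
    sd = statistics.stdev(costs)  # sample standard deviation
    # n=20 splits the data into 5% steps; the last cut point is the 95th percentile.
    p95 = statistics.quantiles(costs, n=20, method="inclusive")[-1]
    return {"mean": mean, "sd": sd, "mean_plus_2sd": mean + 2 * sd, "p95": p95}


# Hypothetical, right-skewed estimated plan costs (NOT real data).
sample_costs = [0.5, 0.8, 1.0, 1.2, 1.5, 2.0, 2.5, 3.0,
                4.0, 5.0, 6.0, 8.0, 12.0, 20.0, 45.0, 120.0]

summary = summarize_costs(sample_costs)

# Share of queries that would exceed a threshold set at mean + 2 SD.
share_above = sum(c > summary["mean_plus_2sd"] for c in sample_costs) / len(sample_costs)

for name, value in summary.items():
    print(f"{name}: {value:.2f}")
print(f"share above mean + 2 SD: {share_above:.2%}")
```

On data like this, a handful of very expensive plans drag the mean and standard deviation upward, so it is worth comparing the candidate threshold against a percentile before settling on a number.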


Standard Edition Max Server Memory Changes In 2016 SP1

Randolph West details the changes in max server memory for Standard Edition in SQL Server 2016 SP1:

The memory limit of 128GB RAM applies only to the buffer pool (the 8KB data pages that are read from disk into memory — in other words, the database itself).

For servers containing more than 128GB of physical RAM, and running SQL Server 2016 with Service Pack 1 or higher, we now have options.

Randolph has a couple of good clarifications on memory limits outside the buffer pool, making this worth the read.


Target Server Memory Vs Total Memory

Arun Sirpal compares the Target Server Memory and Total Memory Perfmon counters:

Target Server Memory (KB) is the amount of memory that SQL Server is willing (potential) to allocate to the buffer pool under its current load. Total Server Memory (KB) is what SQL currently has allocated.

Using SQL Server 2014 developer edition (64 bit) my machine has 12GB RAM and maximum server memory is currently set to 8GB and for the purpose of this post I have set  minimum server memory to 1GB (Lock Pages in Memory has not been set).

Read on for a nice description laden with Perfmon pictures.


Preventing Event Storms

Kenneth Fisher has some good advice when dealing with event notifications:

One of the most common ways to get an event notification is by email. So what happens when you get 500 emails in a day and only one or two are actionable? Do you read every single email? Spending quite literally hours to find those one or two gems? Or do you just ignore the whole lot and wait for some other notification that there is a problem. Say, by a user calling you?

Next, let’s say you have a job that runs every few minutes checking if an instance is down. When that instance goes down you get an immediate email. Which is awesome! Of course then while you are trying to fix the issue you get dozens more emails about the same outage. That is at best distracting and at worst makes it take longer for you to fix the issue.

Fun story time:  at one point during my work career, there was a person (not me!) who accidentally broke every single SQL Agent job on dozens of instances and nobody noticed it for hours.  These weren’t production instances so it wasn’t the end of the world or anything…except that included in the broken jobs were a bunch which ran every minute.  And alerted every minute.  Via e-mail.  The entire database team essentially lost e-mail access for 3 days as there were so many messages coming in that it overwhelmed our provider’s ability to serve messages to us.  This type of mistake can happen, and if we had put into place some of the things Kenneth talks about, the consequences would have been less severe.
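One way to blunt a storm like that is to suppress repeat alerts for the same problem during a cooldown window. The Python sketch below is only an illustration of that idea, not necessarily what Kenneth recommends; the class name, the alert keys, and the 10-minute cooldown are all hypothetical.

```python
import time


class AlertThrottle:
    """Let the first alert for a key through, then suppress repeats for a while."""

    def __init__(self, cooldown_seconds, clock=time.monotonic):
        self.cooldown_seconds = cooldown_seconds
        self._clock = clock        # injectable so the behavior can be tested
        self._last_sent = {}       # key -> time the last alert was sent
        self._suppressed = {}      # key -> number of alerts swallowed

    def should_send(self, key):
        """Return True if an alert for this key should go out now."""
        now = self._clock()
        last = self._last_sent.get(key)
        if last is not None and now - last < self.cooldown_seconds:
            self._suppressed[key] = self._suppressed.get(key, 0) + 1
            return False
        self._last_sent[key] = now
        return True

    def suppressed_count(self, key):
        """How many alerts for this key have been suppressed so far."""
        return self._suppressed.get(key, 0)


# Usage with a fake clock: a job checks every minute, but mails at most every 10.
fake_now = [0.0]
throttle = AlertThrottle(cooldown_seconds=600, clock=lambda: fake_now[0])

sent = []
for minute in range(30):
    fake_now[0] = minute * 60.0
    if throttle.should_send("instance-down:SERVER01"):  # hypothetical alert key
        sent.append(minute)

print(sent, throttle.suppressed_count("instance-down:SERVER01"))
```

Keying on the problem rather than on the job means a second, unrelated failure still gets through immediately, while the suppressed count can go into the next message so nothing is silently lost.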


How SSDT Publication Works

Ed Elliott has a great post deconstructing the SQL Server Data Tools deployment process:

The first thing to note is the garbage collection, there are like 8 changes to the line so 8 sets of garbage collection that happen which is expensive for a .net app. We get much more processor usage for a more prolonged time and we really start to see some disk activity, write activity just has one peak while the files are written but reading stays quite high throughout the whole process. It could be that it is sql reading from disk (my demo was from a single ssd with a local sql instance) but we can see after the sql cpu settles down the green disk read line stays high so it is unlikely to be pure sqlserver.exe.

What does this tell us?

Well if you have large objects, multi-thousand line stored procs then you better get yourself some serious ssd’s to deploy from. If you have smaller objects then you’ll need CPU and memory – don’t scrimp here!

Check it out.


Cognitive Services With R

Steph Locke shows how to use the Microsoft Cognitive Services Text Analytics API within R:

We have some different languages and we need to first do language detection before we can analyse the sentiment of our phrases

# Construct a request
response<-POST(cogapi, add_headers(`Ocp-Apim-Subscription-Key`=cogapikey), body=toJSON(mydata))

Now we need to consume our response such that we can add the language code to our existing data.frame. The structure of the response JSON doesn’t play well with others so I use data.table’s nifty rbindlist. It is a very good candidate for purrr but I’m not up to speed on that yet.
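For readers who do not use R, the same two steps (build the request, then fold the detected language back into the data) might look roughly like this in Python. This is a sketch and not Steph's code: the function names are hypothetical, the endpoint and key are placeholders, and the `documents` / `detectedLanguages` / `iso6391Name` field names are my assumption about the JSON shape, so check them against the API documentation. Only the `Ocp-Apim-Subscription-Key` header comes from the original snippet.

```python
import json
import urllib.request


def build_language_request(endpoint, api_key, phrases):
    """Build (but do not send) a POST request for language detection."""
    # Assumed body shape: {"documents": [{"id": "1", "text": "..."}, ...]}
    documents = [{"id": str(i), "text": text}
                 for i, text in enumerate(phrases, start=1)]
    body = json.dumps({"documents": documents}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Ocp-Apim-Subscription-Key": api_key,
                 "Content-Type": "application/json"},
        method="POST",
    )


def attach_languages(phrases, response_json):
    """Join the detected language code back onto the original phrases by id."""
    by_id = {}
    for doc in response_json.get("documents", []):
        detected = doc.get("detectedLanguages") or [{}]
        by_id[doc["id"]] = detected[0].get("iso6391Name")
    return [{"id": str(i), "text": text, "language": by_id.get(str(i))}
            for i, text in enumerate(phrases, start=1)]


phrases = ["Hello world", "Bonjour tout le monde"]

# Placeholder endpoint and key; sending would be urllib.request.urlopen(request).
request = build_language_request("https://example.invalid/languages",
                                 "hypothetical-key", phrases)

# Hand-written stand-in for a response, using the assumed field names.
sample_response = {"documents": [
    {"id": "1", "detectedLanguages": [{"name": "English", "iso6391Name": "en"}]},
    {"id": "2", "detectedLanguages": [{"name": "French", "iso6391Name": "fr"}]},
]}

rows = attach_languages(phrases, sample_response)
print(rows)
```

Joining on the document id rather than on position keeps the phrases and languages aligned even if the service returns documents in a different order or drops one into an errors list.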

Check out the whole post; Steph makes it look easy.


Building A Neural Net

Shirin Glander has a great post on using Spark + sparklyr + h2o + rsparkling to build a neural net to study arrhythmia of the heart:

The data I am using to demonstrate the building of neural nets is the arrhythmia dataset from UC Irvine’s machine learning database. It contains 279 features from ECG heart rhythm diagnostics and one output column. I am not going to rename the feature columns because they are too many and the descriptions are too complex. Also, we don’t need to know specifically which features we are looking at for building the models. For a description of each feature, see https://archive.ics.uci.edu/ml/machine-learning-databases/arrhythmia/arrhythmia.names. The output column defines 16 classes: class 1 samples are from healthy ECGs, the remaining classes belong to different types of arrhythmia, with class 16 being all remaining arrhythmia cases that didn’t fit into distinct classes.

Very interesting post.


Prophet

Rodrigo Agundez looks at Prophet, Facebook’s new time series forecasting library, and applies it to store sales forecasting:

The data is of a current client, therefore I won’t be disclosing any details of it.

Our models make forecasts for different shops of this company. In particular I took 2 shops, one which contains the easiest transactions to predict from all shops, and another with a somewhat more complicated history.

The data consists of real transactions since 2014. Data is daily with the target being the number of transactions executed during a day. There are missing dates in the data when the shop closed, for example New Year’s day and Christmas.

The holidays provided to the API are the same ones I use in our model. They range from school vacations and other long periods to single holidays like Christmas Eve. In total, the data contains 46 different holidays.

It looks like Prophet has some limitations but can already make some nice predictions.
