Press "Enter" to skip to content

Author: Kevin Feasel

Big Performance Tuning Mistakes

Erik Darling enumerates common performance tuning mistakes:

Coming in at number five and looking alive! Did you know that query plans can be different on busy servers? I bet not! And aside from that, your performance problem might not even be the query itself; it may be blocking, or a poison wait. This stuff may not show up in Dev, unless you spend a lot of time and money engineering load tests.

This is what safety groups call “Situational Awareness”, and this is the kind of stuff that you really want a monitoring tool in place for. Sure, that query ran slowly, but if that’s all you know, and you can’t reproduce it, then you need to start digging deeper.

There are a number of tips here, and that number is five.


Searching In Execution Plans

Pedro Lopes gives a reason to upgrade to SSMS 17.2:

But what if I could search for node id 30 while looking at the graphical showplan?

Starting with SSMS 17.2, just use CTRL+F to start a search in graphical showplan (or right-click on a blank area of the plan, and in the context menu click on Find Node option), and you can quickly see exactly where node id 30 is:

I will have to see whether that also lets you quickly find the origin of Expr#### columns.


Understanding K-Means Clustering

Chaitanya Sagar has a good explanation of the assumptions k-means clustering makes:

Why do we assume in the first place? The answer is that making assumptions helps simplify problems, and simplified problems can then be solved accurately. To divide your dataset into clusters, one must define the criteria of a cluster, and those criteria form the assumptions for the technique.

The K-Means clustering method makes two assumptions regarding the clusters: first, that the clusters are spherical, and second, that the clusters are of similar size. The spherical assumption helps in separating the clusters when the algorithm works on the data and forms clusters. If this assumption is violated, the clusters formed may not be what one expects. On the other hand, the assumption about the size of the clusters helps in deciding the boundaries of each cluster. This assumption helps in calculating the number of data points each cluster should have.

This assumption also gives an advantage. Clusters in K-Means are defined by taking the mean of all the data points in the cluster. With this assumption, one can start with the centers of the clusters anywhere. Keeping the starting points of the clusters anywhere will still make the algorithm converge to the same final clusters as keeping the centers as far apart as possible.

Read on as Chaitanya shows several examples; the polar coordinate transformation was quite interesting.  H/T R-Bloggers


Spheres In SQL Server

Slava Murygin continues his quest to build a graphics engine with spatial data:

A couple of years ago, I came up with an algorithm for drawing an ellipse using SQL Server spatial geometry: http://slavasql.blogspot.com/2015/02/drawing-ellipse-in-ssms.html

I’ve used that algorithm to make a sphere, and as in my previous blog post on drawing a 3D cube, I use an external procedure to simplify the process. This time, instead of a temporary stored procedure, I’m using a function to generate the geometrical content.

This has been an enjoyable series so far, showing how to build different shapes using spatial queries.
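
If you want a quick feel for how SSMS renders spatial output before diving into Slava’s algorithms, here is a trivial sketch (my own illustration, not his ellipse or sphere code): buffering a point produces a circle in the Spatial results tab.

    -- Buffering a point by a radius yields a circular polygon; SSMS draws
    -- each row's shape in the Spatial results tab.
    SELECT geometry::Point(0, 0, 0).STBuffer(10) AS Shape
    UNION ALL
    SELECT geometry::Point(25, 0, 0).STBuffer(5);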


Parameter Sniffing Explained

Bert Wagner looks at how parameter sniffing works, why it’s (mostly) a good thing, and how to gently explain to the server when it’s a bad thing:

All subsequent executions of that same query will go to the query cache to reuse that same initial query plan — this saves SQL Server the time of having to generate a new query plan.

Note: A query with different values passed as parameters still counts as the “same query” in the eyes of SQL Server.

In the case of the examples above, the first time the query was executed was with the parameter for “Costa Rica”. Remember when I said this dataset was heavily skewed? Let’s look at some counts:

Check it out for a clear depiction of the topic.  One solution Bert doesn’t mention, but which I will sometimes use, is to create local variables in a procedure and set their values equal to the input parameters, as in the sketch below.  That way, the optimizer doesn’t make any assumption about the value of the local variable.  But there are several ways to get around this when it’s an issue.
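
Here is a minimal sketch of that local-variable approach, using a hypothetical dbo.Sales table and procedure (not Bert’s schema):

    -- Copying the parameter into a local variable keeps the optimizer from
    -- sniffing the parameter's value at compile time, so it estimates from
    -- average density statistics instead of the skewed histogram entry.
    CREATE OR ALTER PROCEDURE dbo.GetSalesByCountry
        @CountryName NVARCHAR(100)
    AS
    BEGIN
        DECLARE @LocalCountryName NVARCHAR(100) = @CountryName;

        SELECT OrderID, OrderDate, Amount
        FROM dbo.Sales
        WHERE CountryName = @LocalCountryName;
    END;

The trade-off is that a density-based estimate can be just as wrong for heavily skewed data; OPTION (RECOMPILE) and OPTIMIZE FOR UNKNOWN are alternatives worth weighing.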


Creating A WSFC Witness On Azure

Jim Donahoe shows how to create a cloud witness for a Windows cluster using Azure:

Well, that is easy to fix, right?  Let’s just spin up a VM in Azure, and host the FSW on that machine.  Problem solved!  Technically yes, that is a viable option.  But, let’s consider the cost of that scenario in the breakdown below:

  1. VM with OS licensed and Disk space allocated for FSW
  2. NSG/Firewall to protect said resource from outside 
  3. VNET

Also, you have to figure in the man-hours for configuring all of those things (let’s say 4 hours total; insert your hourly rate here: Rate x 4 = setup fee for the VM in Azure).

Now, here is where Cloud Witness saves the day!  The Cloud Witness WSFC Quorum type will utilize BLOB Storage in Azure to act as the point of arbitration.  Not sure what that means?

There’s a good walkthrough, but it does look quite easy to do, and a simple blob is going to be a lot cheaper than a VM.


Event Sourcing And Kafka

Ben Stopford gives an architectural overview of how Apache Kafka can act as an event sourcing system:

At a high level, event sourcing is really just the observation that in an event driven architecture, the events are facts. So if you keep them around, you can use them as a datasource.

One subtlety comes from the way the events are modelled. They can be values: whole facts (an Order, in its entirety) or they can be a set of ‘deltas’ that must be re-combined (a whole Order message, followed by messages denoting just the state changes: “amount updated to $5”, “order cancelled” etc).

As an analogy, imagine you are building a version control system. When a user commits a file for the first time, you save it. Subsequent commits might only save the ‘delta’: just the lines that were added, changed or removed. Then, when the user performs a checkout, you open the version-0 file and apply all the deltas to derive the current state.

The alternate approach is to simply store the whole file, exactly as it was at the time it was changed. This makes pulling out a version quick and easy, but to compare different versions you would have to perform a ‘diff’.

The series Ben has been working through is very helpful in wrapping your mind around what Kafka can do, and this post was no exception.
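
To put the delta-versus-snapshot distinction in relational terms, here is a small hypothetical sketch: deltas stored as rows in an event table, with current state derived by replaying them in order.

    -- Hypothetical delta-style event stream for a single order.
    CREATE TABLE #OrderEvents
    (
        OrderID   INT,
        EventSeq  INT,            -- position in the event stream
        EventType VARCHAR(20),    -- 'Created', 'AmountUpdated', 'Cancelled'
        Amount    DECIMAL(10, 2) NULL
    );

    INSERT INTO #OrderEvents
    VALUES (1, 1, 'Created', 20.00),
           (1, 2, 'AmountUpdated', 5.00),
           (1, 3, 'Cancelled', NULL);

    -- Replay: the latest event gives the status, and the latest
    -- amount-bearing event gives the current amount.
    SELECT TOP (1)
           OrderID,
           EventType AS CurrentStatus,
           (SELECT TOP (1) Amount
            FROM #OrderEvents
            WHERE Amount IS NOT NULL
            ORDER BY EventSeq DESC) AS CurrentAmount
    FROM #OrderEvents
    ORDER BY EventSeq DESC;

A snapshot-style stream would instead store the whole order in every row: reading the latest state becomes a single lookup, at the cost of redundant storage and a ‘diff’ to compare versions.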


Parallel Processing In R

Chaitanya Sagar shows a few methods for parallelizing code in R:

Parallel programming may seem a complex process at first, but the amount of time saved after executing tasks in parallel makes it worth the try. Functions such as lapply() and sapply() are great alternatives to time-consuming looping functions, while the parallel, foreach and doParallel packages are great starting points for running tasks in parallel. These parallel processes are based on functions and are also modular.

However, with great power comes a risk of code crashes. Hence it is necessary to be careful and be aware of ways to control memory usage and error handling. It is not necessary to parallelize every piece of code that you write. You can always write sequential code and decide to parallelize the parts which take significant amounts of time. This will help in further reducing out-of-memory instances and writing robust and fast code.

The use of parallel programming methods is growing and many packages now have parallel implementations available. With this article, one can dive deep into the world of parallel programming and make full use of the vast memory and processing power to generate output quickly.

If you’re using Microsoft R Server, there are additional parallelism options. H/T R-Bloggers


What’s New In Ambari 2.5

Paul Codding tells us what’s coming in the next version of Ambari:

Ambari Log Search (Tech Preview) has been one of our most popular features, and in this release it has seen UI and backend refreshes based on customer feedback.  Log Search is planned for GA with the next major Ambari release, Ambari 3.0, in which the UI will be simplified and the backend will have more robust log retention and scaling capabilities.

There are some interesting changes, so read the whole thing.
