Cloudera Director And AWS Spot Instances

David Han shows off some new features in Cloudera Director 2.5 that help when building Hadoop clusters on AWS Spot instances:

You can configure Spot instances in Cloudera Director’s instance templates. These instance templates contain a flag indicating whether Spot instances should be used, as well as a field specifying the bid price for those instances.

Each instance group in the cluster template includes a field that indicates the minimum number of instances required in that group for the cluster to be considered successful. Cloudera Director will continue with bootstrapping or growing a cluster if the minimum count for each instance group is satisfied. Spot instances should not be used for instance groups that are required for the normal operation of the cluster, such as HDFS DataNodes. Instance groups configured to use Spot instances should set their minimum number to zero with the expectation that the instances may not be provisioned due to the Spot bid price being lower than the Spot price.
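
To make the minimum-count rule concrete, here is a minimal Python sketch of the check described above. It is not Cloudera Director's configuration format or API; the `InstanceGroup` class, its field names, and the group names are all hypothetical, chosen only to illustrate why a Spot group with a minimum of zero never blocks the cluster.

```python
from dataclasses import dataclass


@dataclass
class InstanceGroup:
    # Hypothetical model of an instance group, not Director's real schema.
    name: str
    requested: int
    min_count: int
    use_spot: bool = False


def can_proceed(groups, provisioned):
    """Return True when every group has at least its minimum instance count."""
    return all(provisioned.get(g.name, 0) >= g.min_count for g in groups)


groups = [
    InstanceGroup("masters", requested=1, min_count=1),
    InstanceGroup("datanodes", requested=5, min_count=5),
    # Spot group: minimum of zero, so a failed bid does not block the cluster.
    InstanceGroup("spot-workers", requested=10, min_count=0, use_spot=True),
]

# Assume the bid was below the Spot price, so no Spot instances arrived.
provisioned = {"masters": 1, "datanodes": 5, "spot-workers": 0}
print(can_proceed(groups, provisioned))
```

If the Spot group had a non-zero minimum, the same failed bid would stop bootstrapping, which is the situation the quoted advice is steering you away from.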

If you’re able to take advantage of spot instances, you can end up saving a pretty good amount of money.

Big Performance Tuning Mistakes

Erik Darling enumerates common performance tuning mistakes:

Coming in at number five and looking alive! Did you know that query plans can be different on busy servers? I bet not! And aside from that, your performance problem might not even be the query itself, it may be blocking, or a poison wait. This stuff may not show up in Dev, unless you spend a lot of time and money engineering load tests.

This is what safety groups call “Situational Awareness”, and this is the kind of stuff that you really want a monitoring tool in place for. Sure, that query ran slowly, but if that’s all you know, and you can’t reproduce it, then you need to start digging deeper.

There are a number of tips here, and that number is five.

Searching In Execution Plans

Pedro Lopes gives a reason to upgrade to SSMS 17.2:

But what if I could search for node id 30 while looking at the graphical showplan?

Starting with SSMS 17.2, just use CTRL+F to start a search in graphical showplan (or right-click on a blank area of the plan, and in the context menu click on Find Node option), and you can quickly see exactly where node id 30 is:

I will have to see whether that also lets you quickly find the origin of Expr#### columns.

Understanding K-Means Clustering

Chaitanya Sagar has a good explanation of the assumptions k-means clustering makes:

Why do we assume in the first place? The answer is that making assumptions helps simplify problems and simplified problems can then be solved accurately. To divide your dataset into clusters, one must define the criteria of a cluster and those make the assumptions for the technique. K-Means clustering method considers two assumptions regarding the clusters – first that the clusters are spherical and second that the clusters are of similar size. Spherical assumption helps in separating the clusters when the algorithm works on the data and forms clusters. If this assumption is violated, the clusters formed may not be what one expects. On the other hand, assumption over the size of clusters helps in deciding the boundaries of the cluster. This assumption helps in calculating the number of data points each cluster should have. This assumption also gives an advantage. Clusters in K-means are defined by taking the mean of all the data points in the cluster. With this assumption, one can start with the centers of clusters anywhere. Keeping the starting points of the clusters anywhere will still make the algorithm converge with the same final clusters as keeping the centers as far apart as possible.
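
As a rough illustration of the point about starting centers, here is a small pure-Python sketch of the standard k-means loop (Lloyd's algorithm). The data set and the `kmeans` helper are my own toy example, not Chaitanya's code: two round, equally sized, well-separated blobs, which is exactly the friendly case the assumptions describe. On data that violates those assumptions, different starting points can easily give different answers.

```python
import math


def kmeans(points, centers, iterations=20):
    """Lloyd's algorithm: assign points to the nearest center, then re-average."""
    centers = [tuple(c) for c in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        new_centers = []
        for i, members in enumerate(clusters):
            if members:
                # New center is the coordinate-wise mean of the members.
                new_centers.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                # Keep an empty cluster's center where it was.
                new_centers.append(centers[i])
        if new_centers == centers:
            break  # assignments no longer change the centers
        centers = new_centers
    return centers


# Two round blobs of equal size (toy data).
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11)]

close_start = kmeans(points, [(0, 0), (1, 1)])    # both starts in one blob
far_start = kmeans(points, [(0, 0), (11, 11)])    # starts far apart
print(close_start, far_start)
```

Both starting choices end up at the centers of the two blobs here, but only because the blobs are spherical, similar in size, and far apart.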

Read on as Chaitanya shows several examples; the polar coordinate transformation was quite interesting.  H/T R-Bloggers

Spheres In SQL Server

Slava Murygin continues his quest to build a graphics engine with spatial data:

Couple of years ago I came up with an algorithm of drawing an ellipse using SQL Server spatial geometry:

I’ve used that algorithm to make a sphere and, as in my previous post on drawing a 3D cube, I use an external procedure to simplify the process.
This time, instead of a temporary stored procedure, I’m using a function to generate the geometry content.
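
For a sense of what "drawing an ellipse" with spatial geometry involves, here is a hedged Python sketch that builds a closed polygon approximating an ellipse and emits it as well-known text (WKT). This is a generic parametric approach, not Slava's algorithm, and the `ellipse_wkt` function and its parameters are hypothetical. The resulting string is the kind of input that SQL Server's `geometry::STGeomFromText` accepts.

```python
import math


def ellipse_wkt(cx, cy, rx, ry, segments=36):
    """Approximate an ellipse centered at (cx, cy) as a WKT polygon."""
    points = []
    for i in range(segments):
        angle = 2 * math.pi * i / segments
        points.append((cx + rx * math.cos(angle), cy + ry * math.sin(angle)))
    points.append(points[0])  # a polygon ring must end where it starts
    coords = ", ".join(f"{x:.4f} {y:.4f}" for x, y in points)
    return f"POLYGON(({coords}))"


wkt = ellipse_wkt(0, 0, 4, 2, segments=12)
print(wkt)
```

More segments give a smoother outline at the cost of a longer WKT string.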

This has been an enjoyable series so far, showing how to build different shapes using spatial queries.

Quadrant Chart In Power BI

Devin Knight continues his Power BI custom visuals series:

In this module you will learn how to use the Quadrant Chart Custom Visual.  The Quadrant Chart is used to show a distribution of data across separate quadrants.

There’s an interesting mix of 2D layout plus bubble size.  This is probably one of the better custom visuals available.

Parameter Sniffing Explained

Bert Wagner looks at how parameter sniffing works, why it’s (mostly) a good thing, and how to gently explain to the server when it’s a bad thing:

All subsequent executions of that same query will go to the query cache to reuse that same initial query plan — this saves SQL Server time from having to regenerate a new query plan.

Note: A query with different values passed as parameters still counts as the “same query” in the eyes of SQL Server.

In the case of the examples above, the first time the query was executed was with the parameter for “Costa Rica”. Remember when I said this dataset was heavily skewed? Let’s look at some counts:

Check it out for a clear depiction of the topic.  One solution that Bert doesn’t cover but that I sometimes use is to create local variables in a procedure and set them equal to the input parameters.  That way, the optimizer cannot sniff a specific value and falls back on an average estimate from the statistics.  But there are several ways to get around parameter sniffing when it’s an issue.
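
Here is a toy Python model of the caching behavior described in the quote. It is a sketch under loud assumptions: the row counts, the 1,000-row threshold, the plan names, and the query text are all made up, and SQL Server's real optimizer costs plans in a far more involved way. The only thing it tries to show is that the first execution's parameter decides the plan that every later execution reuses.

```python
# Hypothetical, heavily skewed row counts per country.
ROW_COUNTS = {"Costa Rica": 10, "United States": 1_000_000}

# Cache keyed by query text only: parameter values are not part of the key.
plan_cache = {}


def choose_plan(estimated_rows):
    """Pick a plan from a made-up threshold; real costing is much richer."""
    if estimated_rows < 1000:
        return "index seek + key lookup"
    return "clustered index scan"


def execute(query_text, country):
    """Compile on first use (sniffing the parameter), then reuse the plan."""
    if query_text not in plan_cache:
        plan_cache[query_text] = choose_plan(ROW_COUNTS[country])
    return plan_cache[query_text]


# Hypothetical query text, used only as the cache key.
QUERY = "SELECT * FROM dbo.Orders WHERE Country = @Country"

first = execute(QUERY, "Costa Rica")       # compiles a plan for 10 rows
second = execute(QUERY, "United States")   # reuses it for 1,000,000 rows
print(first, "|", second)
```

The second call gets the small-row plan even though a scan would suit it better, which is the "bad" side of parameter sniffing on skewed data.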

Creating A WSFC Witness On Azure

Jim Donahoe shows how to create a cloud witness for a Windows cluster using Azure:

Well, that is easy to fix, right?  Let’s just spin up a VM in Azure, and host the FSW on that machine.  Problem solved!  Technically yes, that is a viable option.  But, let’s consider the cost of that scenario in the breakdown below:

  1. VM with OS licensed and Disk space allocated for FSW
  2. NSG/Firewall to protect said resource from outside 
  3. VNET

Also, you have to figure in the man hours in configuring all of those things (let’s say 4 hours total).  Insert your hourly rate here:  Rate x 4 = Setup fee for VM in Azure

Now, here is where Cloud Witness saves the day!  The Cloud Witness WSFC Quorum type will utilize BLOB Storage in Azure to act as the point of arbitration.  Not sure what that means?

There’s a good walkthrough, and it does look quite easy to do.  A simple blob is also going to be a lot cheaper than a VM.


August 2017