How Do You Respond?

Kevin Feasel

2016-06-14

T-SQL

Erik Darling has a new interview question:

A new developer has been troubleshooting a sometimes-slow stored procedure, and wants you to review their progress so far. Tell me what could go wrong here.

This one’s a bit tougher than some of the early interview questions, and I think you can see that based on the responses in the comments.

Dueling Log Backup Jobs

Robert Davis ran into HADR_WORK_QUEUE waits recently:

Our 3rd party monitoring solution collects blocking information, but not for system threads. There was no additional information available for this blocking incident, but I could see that the system thread was a background process with the command “UNKNOWN TOKEN” and was sitting in a wait type of “HADR_WORK_QUEUE”. It was clearly the worker thread for the AG of a specific database.

A little later, we had blocking again involving that same thread, but this time, the AG worker thread was blocking the log backup thread. Seemed logical that if the worker thread could block the log backup, then the log backup could have also blocked the worker thread, but still it did not make sense to me.

This is one of those cases in which the answer makes perfect sense after the fact, but can be maddening until then.

Microsoft & FreeBSD

Kevin Feasel

2016-06-13

Cloud

Serdar Yegulalp points out that a new Azure VM image for FreeBSD has Microsoft as the publisher:

The other question people are likely to ask is why, kernel contributions notwithstanding, is Microsoft listed as the publisher of the distro? The short answer: support.

According to Microsoft’s blog post, the FreeBSD Foundation is a community of mutually supportive users, “not a solution provider or an ISV with a support organization.” The kinds of customers who run FreeBSD on Azure want to have service-level agreements of some kind, and the FreeBSD Foundation isn’t in that line of work.

The upshot: If you have problems with FreeBSD on Azure, you can pick up the phone and get Microsoft to help out — but only if you’re running its version of FreeBSD.

To be honest, I don’t see this as a big deal.  I’m glad the image is there, but this hardly seems like a landmark change in anything to me.

Adding An Index To A Spark RDD

Kevin Feasel

2016-06-13

Spark

Arijit Tarafdar gives us a good method for adding an index column to a Spark data frame based on a non-unique value:

The basic idea is to create a lookup table of distinct categories indexed by unique integer identifiers. The way to avoid is to collect the unique categories to the driver, loop through them to add the corresponding index to each to create the lookup table (as Map or equivalent) and then broadcast the lookup table to all executors. The amount of data that can be collected at the driver is controlled by the spark.driver.maxResultSize configuration which by default is set at 1 GB for Spark 1.6.1. Both collect and broadcast will eventually run into the physical memory limits of the driver and the executors respectively at some point beyond certain number of distinct categories, resulting in a non-scalable solution.

The solution is pretty interesting:  build out a new RDD of unique results, and then join that set back.  If you’re using SQL (including Spark SQL), I would use the DENSE_RANK() window function.

Early Thoughts On SQL Server 2016

Koen Verbeeck has some initial thoughts on using 2016 in a POC:

  • AutoAdjustBufferSize property of the SSIS data flow. Done with manually setting the Buffer Size and Buffer Max Rows. Just set this property to true and the data flow takes care of its own performance.

  • Custom logging levels in the SSIS Catalog. Now I can finally define a logging level that only logs errors and warnings AND set it as the server-wide default level.

  • The DROP TABLE IF EXISTS syntax. The shorter the code, the better 🙂

I was initially a bit concerned with AutoAdjustBufferSize because I figured I could do a better job of selecting buffer size.  Maybe on the margin I might be able to, but I think I’m going to give it a try.

SQL Server 2014 Express Docker Image

Perry Skountrianos introduces a new Docker image:

We are excited to announce the public availability of the sql server 2014 express Docker image for Windows Server Core based Containers! The public repo is hosted on Docker Hub and contains the latest docker image as well as pointers to the Dockerfile and the start PS script(hosted on Github). We hope you will find this image useful and leverage it for your container based applications!

Containerization is a huge part of modern administrative world and it’s good to see Microsoft (belatedly) jumping onto the bandwagon.

Data Science Languages

David Crook walks through his data science workflow and discusses language choice:

So I’ve spent a while now looking at 3 competing languages and I did my best to give each one a fair shake. Those 3 languages were F#, Python and R. I have to say it was really close for a while because each language has its strengths and weaknesses. That said, I am moving forward with 2 languages and a very specific way I use each one. I wanted to outline this, because for me it has taken a very long time to learn all of the languages to the level that I have to discover this and I would hate for others to go through the same exercise.

Read on for his decision, as well as how you go from “here’s some raw data” to “here are some services to expose interesting results.”

Query ElasticSearch Using Power BI

Elton Stoneman shows how to use Power BI to read data from Elasticsearch:

Kibana is the natural UI choice for partnering Elasticsearch, and it has the advantage of being Web-based and Dockerized, so it’s cross-platform and easy to share. But PowerBI is a lot more powerful, and the multitude of available connectors mean it’s easy to build a single dashboard which pulls data from multiple sources.

Using Elasticsearch for one of those sources is simple, although it will need some custom work to query your indexes and navigate the documents to get the field you want. You can even publish your reports to PowerBI in the cloud and limit access using Azure Active Directory – which gives you a nice, integrated security story.

I tend to be very hard on Kibana, particularly because it makes the easy stuff easy and the hard stuff impossible, so I think that this is an interesting alternative to Kibana.

MAX_GRANT_PERCENT

Jack Li gives an example in which MAX_GRANT_PERCENT can keep certain queries from getting runaway memory grants:

The customer has lots of waits on RESOURCE_SEMAPHORE_QUERY_COMPILE.  To troubleshoot this, we have to look from two angles.  First, did customer have many queries needing large amount of compile memory?  Secondly, was it possible that other components used too much memory, causing the threshold lowered?  In other words, if SQL Server had enough memory, those queries requiring same amount of compile memory would not have been put to wait.

We used this query and captured for several iterations of data to confirm that server didn’t have queries that required large amount of compile memory per se.

It’s nice to have this trick up your sleeve when you simply can’t get a better query in place.

SOS_Mutex

Ewald Cress continues his dive into system internals, this time looking at SOS_Mutex:

Put differently, we can build a mutex from an auto-reset EventInternal by tacking on an owner attribute, making a rule that only the owner has the right to signal the event, and adding assignment of ownership as a fringe benefit of a successful wait. A nonsignalled event means an acquired mutex, and a signalled event means that the next acquisition attempt will succeed without waiting, since nobody currently owns the mutex. The end result is that our SOS_Mutex class exposes the underlying event’sSignal() method and its own take on Wait(). From the viewpoint of the mutex consumer, the result of a successful wait is that it owns the mutex, and it should act honourably by calling Signal() as soon as it is done using the resource that the mutex stands guard over.

There’s some deep detail here, so this is definitely one of those “after your first cup of coffee” posts to read.

Categories

July 2019
MTWTFSS
« Jun  
1234567
891011121314
15161718192021
22232425262728
293031