Press "Enter" to skip to content

Curated SQL Posts

Query Store & Spinlocks

David Fowler ran into an issue with the Query Store process experiencing waits due to spinlocks:

We were enjoying a nice peaceful afternoon when we hear panicked shouting that a SQL Server had become unresponsive and the customers were unable to do anything.

We moseyed on down to the server in question to take a look at it.  One thing stood out immediately, CPU was pegged out at 100% but SQL itself didn’t actually seem to be doing anything, transactions\second was on the floor. Unfortunately this happened a while back and I didn’t think to capture any graphs or metrics at the time so you’re just going to have to take my word for this.

The issue David ran into was subsequently fixed, making this a cautionary tale to keep those SQL Server instances patched.

Comments closed

Dealing With Zero-Value Rows In dplyr

Kieran Healy shows an oddity in dplyr when dealing with zero-value records:

That looks fine. You can see in each panel the 2015 column is 100% Men. If we were working on this a bit longer we’d polish up the x-axis so that the dates were centered under the columns. But as an exploratory plot it’s fine.

But let’s say that, instead of a column plot, you looked at a line plot instead. This would be a natural thing to do given that time is on the x-axis and so you’re looking at a trend, albeit one over a small number of years.

This is behavior I hadn’t run into, and it does seem a bit odd.  On a totally unrelated note, Healy’s Data Visualization: A Practical Introduction is one of the best books on the topic.

Comments closed

The Good And Bad Of Dataflows

Teo Lachev gives us the lowdown on Dataflows in Power BI:

There is a lot to like about dataflows. I can think of two primary self-service scenarios that can benefit from dataflows:

  • Data staging – Many organizations implement operational data stores (ODS) and staging databases before the data is processed and loaded in a data warehouse. As a business user, you can use data-flows for a similar purpose. For example, one of our clients is a large insurance company that uses Microsoft Dynamics 365 for customer relationship management. Various data analysts create data models from the same CRM data, but they find that refreshing the CRM data is time consuming. Instead, they can create a dataflow to stage some CRM entities before importing them in Power BI Desktop. Even better, you could import the staged CRM data into a single dataset or in an organizational semantic model to multiple data copies and duplicating business logic.

  • Certified datasets – One way to improve data quality and promote better self-service BI is to prepare a set of certified common entities, such as Organization, Product, and Vendor. A data steward can be responsible for designing and managing these entities. Once in place, data analysts can import the certified entities in their data models.

Read on for some more positives and negatives.

2 Comments

Checking If Ports Are Open Using Powershell

Anthony Nocentino has a quick Powershell script to see if ports are open on a machine:

Ever want to confirm that a port is accessible from one computer to another? There’s a PowerShell cmdlet for that, Test-NetConnection. With the -Port option, this cmdlet will do a quick TCP three-way handshake on the remote system to confirm that the service is available and reports back if it succeeded or not. Check out that last line of output TcpTestSucceeded: False. That indicates that this port is not accessible. You can see, however, that the system is reachable via ICMP (Ping), PingSuceeded: True so we know that the remote system is alive, just not listening on the port we want to access.

For when your security team won’t let you install nmap.

Comments closed

Root Cause Discovery For Availability Group Failovers

Sourabh Agarwal announces a new tool to determine why your Availability Group failed over:

The first step in using the utility is to configure the configuration.json file to include the location of the data files and the details of the availability group for which analysis is being done. For a correct analysis, all replicas in the availability groups needs to be listed in the configuration file.

The next step is to capture the various logs from each of the replicas and add those under the data folder. The following files are required for the analysis.

  • SQL error logs

  • Always On Availability Groups Extended Event Logs

  • System Health Extended Event Logs

  • System log

  • Windows cluster log

This looks interesting.

Comments closed

What Power BI Costs

Eugene Meidinger delineates what is free versus paid with respect to Power BI:

The Power BI Service, think powerbi.com, allows for free users. These free users can create reports and upload them, but with a significant number of limitations. The biggest is you only have one way of sharing content to others. Specifically with Publish to Web, which essentially makes your entire report free to the public.

You also only have one way of privately consuming other people’s reports, and that’s if someone places content in Power BI Premium. Otherwise, other users can’t share their reports directly with you. Power BI Free users are truly and island to themselves.

One other thing worth nothing is that you can’t sign up with a personal email. David Eldersveld has a good blog post on the issue. As of this writing, the uservoice request to change this has 2,800 votes.

See here for some more limitations of the free version of Power BI.

Read the whole thing.

Comments closed

A Pessimistic View Of The State Of Deep Learning

William Vorhies provides us a negative (and necessary) look at the current state of Deep Learning solutions:

Reinforcement Learning (RL) is arguably the hottest research area in AI today because it appears RL can be adapted to any problem that has a well-defined reward function.  That encompasses game play, robotics, self-driving cars, and frankly pretty much else in machine learning.

Within RL, the hottest research area is Deep RL which means using a deep neural net as the ‘agent’ in the training.  Deep RL is seen as the form of RL with the most potential to generalize over the largest number of cases and perhaps the closest we’ve yet come to AGI (artificial general intelligence).

Importantly, Deep RL is also the technique used to win at Alpha Go which brought it huge attention.

The problem is, according to Alex Irpan, a researcher on the Google Brain Robotics team that about 70% of the time they just don’t work.

Alex has written a very comprehensive article critiquing the current state of Deep RL, the field with which he engages on a day-to-day basis.  He lays out a whole series of problems and we’ve elected to focus on the three that most clearly illustrate the current state of the problem with notes from his work.

Vorhies is not unduly negative and is optimistic in the medium to long term, but he is right in noting that there is a lot of work yet to do in this field.

Comments closed

Object Detection Methods In Convolutional Neural Networks

Xindian Long explains some of the techniques which convolutional neural networks use to discern objects in images:

Can we do object detection in a smart way by only looking at some of the windows? The answer is yes. There are two approaches to find this subset of windows, which lead to two different categories of object detection algorithms.

  1. The first algorithm category is to do region proposal first. This means regions highly likely to contain an object are selected either with traditional computer vision techniques (like selective search), or by using a deep learning-based region proposal network (RPN). Once you have gathered the small set of candidate windows, you can formulate a set number of regression models and classification models to solve the object detection problem. This category includes algorithms like Faster R-CNN[1], R_FCN[2] and FPN-FRCN[3]. Algorithms in this category are usually called two-stage methods. They are generally more accurate, but slower than the single-stage method we introduce below.
  2. The second algorithm category only looks for objects at fixed locations with fixed sizes. These locations and sizes are strategically selected so that most scenarios are covered. These algorithms usually separate the original images into fixed size grid regions. For each region, these algorithms try to predict a fixed number of objects of certain, pre-determined shapes and sizes. Algorithms belonging to this category are called single-stage methods. Examples of such methods include YOLO[4], SSD[5] and RetinaNet[6]. Algorithms in this category usually run faster but are less accurate. This type of algorithm is often utilized for applications requiring real-time detection.

We’ll discuss two common object detection methods below in more detail.

This is a high-level explanation with no code, but it does a good job of describing at that level what is going on.

Comments closed

Deploying SQL Server To Minikube

Chris Adkin continues a series on Kubernetes with a look at Minikube:

Once the cluster is up, lets deploy a stand alone SQL Server instance to it, availability groups will be covered at a later date. To do this several different types of object will need to be created:

  • Storage objects; persistent volume claims, these will lead to the automatic creation of a persistent volumes.

  • A secret to hold the password for the sa account.

  • A deployment, this embodies the SQL Server instance.

  • A service, this provides the means of accessing the instance via an ip address and port from outside the cluster.

Click through for a demo.

Comments closed