Press "Enter" to skip to content

Curated SQL Posts

Compilations per Second in SQL Server

Fabiano Amorim clarifies a metric’s definition:

As you can see, the number of SQL Compilations/Sec is very high. It’s important to step back and remember the general description and guideline for this counter and understand what I mean by “high”:

Official Description: “Number of SQL compilations per second. Indicates the number of times the compile code path is entered.”

Read on for a dive into ad hoc SQL statements parameterization, how an instance can have a high compilations/sec value relative to batch requests/sec, and how that can affect performance in the long run.

Comments closed

Handling Content Access Requests in Power BI

Marc Lelijveld walks us through the process of requesting (and granting) access to content in Power BI:

When we look at the Power BI ecosystem, we can identify a bunch of different artifacts. For example, dataflows, datasets, reports, dashboards and many derivatives. As I explained in the previous post, the best practice for sharing content is through a Power BI App, which includes a list of users or active directory group containing multiple users. With that, the content becomes available after publishing to those who are granted access. Though, it can happen that one of the users shares the link with other users who do not have access to the content. As a property of the Power BI App, you can allow users to share the app and underlying dataset with share permissions. Though, working with sensitive data this might now be what you are looking for, as you might loose control over who has access.

Read on to see what constitutes a content access request and what you can do about them.

Comments closed

Explaining an ML Model with SHAP

Dan Lantos, et al, walk us through one technique for model explainability:

Interpretability has to do with how accurately a machine learning model can associate a cause (input) to an effect (output). 

Explainability on the other hand is the extent to which the internal mechanics of a machine or deep learning system can be explained in human terms. Or to put it simply, explainability is the ability to explain what is happening. 

Let’s consider a simple example illustrated below where the goal of the machine learning model is to classify an animal into its respective groups. We use an image of a butterfly as input into the machine learning model. The model would classify the butterfly as either an insect, mammal, fish, reptile or bird. Typically, most complex machine learning models would provide a classification without explaining how the features contributed to the result. However, using tools that help with explainability, we can overcome this limitation. We can then understand what particular features of the butterfly contributed to it being classified as an insect. Since the butterfly has six legs, it is thus classified as an insect.

Being able to provide a rationale behind a model’s prediction would give the users (and the developers) confidence about the validity of the model’s decision.

Read on to see how you can use a library called SHAP in Python to help with this explainability.

Comments closed

A Cross-Platform Comparison of JSON in Relational Databases

Lukas Eder gives us some information on SQL/JSON and where different relational database management systems are in their JSON journies:

Building (dogfooding) on top of our own SQL/JSON API has revealed a lot of caveats of the various SQL/JSON implementations across vendors, and to be frank, it’s been a bit of a sobering experience. Despite there now being the ISO/IEC TR 19075:6 standard (mostly driven by Oracle this time), many vendors have already implemented some kind of JSON support, and it looks differently in all dialects – to the extent where writing vendor agnostic SQL/JSON is almost impossible with hand written native SQL. You’ll need an API like jOOQ or any other abstraction to standardise the different dialects.

Click through for the survey.

Comments closed

Identifying Troublesome NOLOCK Statements

Aaron Bertrand is on a mission:

I’ve warned before about the possible downsides of both NOLOCK in general and, more specifically, when used against the target of an update or delete. While Microsoft claims that corruption errors due to the latter have been fixed in cumulative updates (e.g. see KB #2878968), we’re still seeing an occasional related issue where SQL Server will terminate, producing a stack dump that indicates a DML statement with NOLOCK as the cause. How do I find and correct all these potentially problematic statements?

The contrarian in me says, “You’re using NOLOCK; they’re all trouble.” But Aaron is a lot nicer about it than I am here.

Comments closed

Unique Resource Names and Azure

Meagan Longoria gives us a warning:

Each resource type in Azure has a naming scope within which the resource name must be unique. For PaaS resources such as Azure SQL Server (server for Azure SQL DB) and Azure Data Factory, the name must be globally unique within the resource type. This means that you can’t have two data factories with the same name, but you can have a data factory and a SQL server with the same name. Virtual machine names must be unique within the resource group. Azure Storage accounts must be globally unique. Azure SQL Databases should be unique within the server.

Since Azure allows you to create a data factory and a SQL server with the same resource name, you may think this is fine. But you may want to avoid this, especially if you plan on using system-defined managed identities or using Azure PowerShell/CLI. And if you aren’t planning on using these things, you might want to reconsider.

Click through for a demonstration of how you might get into trouble with this.

Comments closed

Deploying SQL Server via Ansible

Amit Khandelwal gives us another way of deploying SQL Server on Linux:

Today, we’ll look at how to automate SQL Server deployment and configuration on Linux. To automate our deployment, we will use the Ansible system role, which is available here.

Note: The Ansible system role that I use in this blog is a sample system role that is provided as is and for reference only. Microsoft and RedHat do not support this. However, I invite you to provide feedback and suggestions for improving the system role here: Issues linux-system-roles/mssql (github.com).

Read on for the instructions and a demonstration.

Comments closed

Nonclustered Index Leaf Records and Null Bitmaps

Alex Stuart lays out a finding:

While testing a script that involved calculating index record size recently I was getting some confusing results depending on server version, and after some digging it appears there was a somewhat undocumented change to nonclustered index leaf page structure in SQL Server 2012.

Prior to 2012, as dicussed by Paul Randal in this 2010 blog post (which is still the top result for searching for ‘nonclustered index null bitmap’, hence this post) the null bitmap – that is, a >= 3 byte structure representing null fields in a record – was essentially present in all data pages but not the leaf pages of a nonclustered index that had no nulls in either the index key or any clustering key columns.

Read on for a demonstration using SQL Server 2008 R2 as well as SQL Server 2012.

Comments closed

Inferring Median from a Few Values

Holger von Jouanne-Diedrich is stuck in the middle with you:

Let us dive directly into the matter, the Small Data Rule states:

In a sample of five numerical values from any unknown population, the median of this population lies between the smallest and the largest sample value with 94 percent certainty.

The “population” can be anything, like data about age in a population, income in a country, television consumption, donation amounts, body sizes, temperatures and so on.

This is a very interesting concept. Five values won’t give you the median, but it will give you a bounded expectation with high likelihood. And check out the comments: adding a few more data points increases the expected likelihood even further.

1 Comment