Press "Enter" to skip to content

Day: July 29, 2021

Explaining an ML Model with SHAP

Dan Lantos, et al, walk us through one technique for model explainability:

Interpretability has to do with how accurately a machine learning model can associate a cause (input) to an effect (output). 

Explainability on the other hand is the extent to which the internal mechanics of a machine or deep learning system can be explained in human terms. Or to put it simply, explainability is the ability to explain what is happening. 

Let’s consider a simple example illustrated below where the goal of the machine learning model is to classify an animal into its respective groups. We use an image of a butterfly as input into the machine learning model. The model would classify the butterfly as either an insect, mammal, fish, reptile or bird. Typically, most complex machine learning models would provide a classification without explaining how the features contributed to the result. However, using tools that help with explainability, we can overcome this limitation. We can then understand what particular features of the butterfly contributed to it being classified as an insect. Since the butterfly has six legs, it is thus classified as an insect.

Being able to provide a rationale behind a model’s prediction would give the users (and the developers) confidence about the validity of the model’s decision.

Read on to see how you can use a library called SHAP in Python to help with this explainability.

Comments closed

A Cross-Platform Comparison of JSON in Relational Databases

Lukas Eder gives us some information on SQL/JSON and where different relational database management systems are in their JSON journies:

Building (dogfooding) on top of our own SQL/JSON API has revealed a lot of caveats of the various SQL/JSON implementations across vendors, and to be frank, it’s been a bit of a sobering experience. Despite there now being the ISO/IEC TR 19075:6 standard (mostly driven by Oracle this time), many vendors have already implemented some kind of JSON support, and it looks differently in all dialects – to the extent where writing vendor agnostic SQL/JSON is almost impossible with hand written native SQL. You’ll need an API like jOOQ or any other abstraction to standardise the different dialects.

Click through for the survey.

Comments closed

Identifying Troublesome NOLOCK Statements

Aaron Bertrand is on a mission:

I’ve warned before about the possible downsides of both NOLOCK in general and, more specifically, when used against the target of an update or delete. While Microsoft claims that corruption errors due to the latter have been fixed in cumulative updates (e.g. see KB #2878968), we’re still seeing an occasional related issue where SQL Server will terminate, producing a stack dump that indicates a DML statement with NOLOCK as the cause. How do I find and correct all these potentially problematic statements?

The contrarian in me says, “You’re using NOLOCK; they’re all trouble.” But Aaron is a lot nicer about it than I am here.

Comments closed

Unique Resource Names and Azure

Meagan Longoria gives us a warning:

Each resource type in Azure has a naming scope within which the resource name must be unique. For PaaS resources such as Azure SQL Server (server for Azure SQL DB) and Azure Data Factory, the name must be globally unique within the resource type. This means that you can’t have two data factories with the same name, but you can have a data factory and a SQL server with the same name. Virtual machine names must be unique within the resource group. Azure Storage accounts must be globally unique. Azure SQL Databases should be unique within the server.

Since Azure allows you to create a data factory and a SQL server with the same resource name, you may think this is fine. But you may want to avoid this, especially if you plan on using system-defined managed identities or using Azure PowerShell/CLI. And if you aren’t planning on using these things, you might want to reconsider.

Click through for a demonstration of how you might get into trouble with this.

Comments closed

Deploying SQL Server via Ansible

Amit Khandelwal gives us another way of deploying SQL Server on Linux:

Today, we’ll look at how to automate SQL Server deployment and configuration on Linux. To automate our deployment, we will use the Ansible system role, which is available here.

Note: The Ansible system role that I use in this blog is a sample system role that is provided as is and for reference only. Microsoft and RedHat do not support this. However, I invite you to provide feedback and suggestions for improving the system role here: Issues linux-system-roles/mssql (github.com).

Read on for the instructions and a demonstration.

Comments closed

Nonclustered Index Leaf Records and Null Bitmaps

Alex Stuart lays out a finding:

While testing a script that involved calculating index record size recently I was getting some confusing results depending on server version, and after some digging it appears there was a somewhat undocumented change to nonclustered index leaf page structure in SQL Server 2012.

Prior to 2012, as dicussed by Paul Randal in this 2010 blog post (which is still the top result for searching for ‘nonclustered index null bitmap’, hence this post) the null bitmap – that is, a >= 3 byte structure representing null fields in a record – was essentially present in all data pages but not the leaf pages of a nonclustered index that had no nulls in either the index key or any clustering key columns.

Read on for a demonstration using SQL Server 2008 R2 as well as SQL Server 2012.

Comments closed