Press "Enter" to skip to content

Day: August 20, 2019

Calculating AUC in R

Andrew Treadway shows how you can calculate Area Under the Curve in R:

AUC is an important metric in machine learning for classification. It is often used as a measure of a model’s performance. In effect, AUC is a measure between 0 and 1 of a model’s performance that rank-orders predictions from a model. For a detailed explanation of AUC, see this link.

Since AUC is widely used, being able to get a confidence interval around this metric is valuable to both better demonstrate a model’s performance, as well as to better compare two or more models. For example, if model A has an AUC higher than model B, but the 95% confidence interval around each AUC value overlaps, then the models may not be statistically different in performance. We can get a confidence interval around AUC using R’s pROC package, which uses bootstrapping to calculate the interval.

There are plenty of ways to calculate this useful metric, but this is definitely one of the easier methods. H/T R-bloggers

Comments closed

Performance Tuning Neural Network Training

Sean Owen takes us through a few techniques for speeding up neural network model training:

Step #2: Use Early Stopping
Keras (and other frameworks) have built-in support for stopping when further training appears to be making the model worse. In Keras, it’s the EarlyStopping callback. Using it means passing the validation data to the training process for evaluation on every epoch. Training will stop after several epochs have passed with no improvement. restore_best_weights=True ensures that the final model’s weights are from its best epoch, not just the last one. This should be your default.

Sean focuses here on Keras + TensorFlow on Spark, but several of the tips are cross-product and generally applicable.

Comments closed

Azure Data Studio August Release

Alan Yu announces a new version of Azure Data Studio:

The key highlights to cover this month include:
– SandDance integration—A new way to interact with data
– Notebook improvements
– SQL Server Dacpac extension can support Azure Active Directory
– SQL Server 2019 extension
– Visual Studio Code merge 1.37
– Bug fixes

Being able to add a new cell inline is nice, especially when you’re dealing with larger notebooks.

Comments closed

Isolation Levels and Dynamic SQL

Max Vernon points out how transaction isolation levels work when combined with sp_executesql:

Imagine you have a piece of code where you don’t care about the downsides to the “read uncommitted” isolation level, and do your due diligence by adding SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED; at the start of your code. The code following that statement will run under that isolation level, as expected. However, if you call dynamic T-SQL with sys.sp_executesql, and set the isolation level there, the dynamic code will run under the READ UNCOMMITTED, however the isolation level will not be changed for the calling code. In other words, be careful about where you set the isolation level.

Click through for a demonstration of this.

Comments closed

Auditing Database Backups

Jovan Popovic shows how you can audit who is taking backups on an Azure SQL Managed Instance:

One mechanism to ensure that nobody can take the COPY_ONLY backup of your database is to use Transparent Data Encryption that automatically encrypts all backups. In that case you would need to use Customer-managed (BYOK) TDE where you will keep your encryption key in Azure Key-Vault. User-initiated COPY_ONLY backups are (currently) not allowed if you are using Service-managed TDE.

If you don’t use TDE on the database or there is a risk that someone can remove TDE from database and then take a backup, Managed Instance provides auditing mechanism that enables you to track who performed a backup and when. This way you can always track if some unauthorized action is happening.

Read on for the solution. Looking through it, it seems like conceptually it’d work equally well with on-prem/IaaS SQL Server as with Managed Instances.

Comments closed

In the Papers: Plan Stitch

Brent Ozar reviews a Microsoft Research paper:

In the Microsoft Research paper Plan Stitch: Harnessing the Best of Many Plans by Bailu Ding, Sudipto Das, Wentao Wu, Surajit Chaudhuri, and Vivek Narasayya (2018), the authors propose something really cool: watching an execution plan change over time, splitting it into parts, and then combining parts to form a SuperPlan™. (They don’t call it a SuperPlan™ – it just seems appropriate, right?)

That looks like an interesting paper, and Brent has a few more if you agree.

Comments closed

When Adding Indexes Hurts Performance

Jeffry Schwartz takes us through an odd case:

Recently, a customer requested that we tune a query that took 13 seconds to return 11 rows.  SQL Server 2017 suggested an index to improve performance, so we added it in a development environment.  The improvement made the query run 647 seconds, almost 50 TIMES longer than the original!  This naturally caused much consternation, so we decided to determine what and why it happened as well as how we could still achieve the original objective, i.e., make the query run faster.  This article discusses what caused the original performance problem in addition to the new one that was caused by the introduction of an index, and illustrates how we were able to make the query run significantly faster than it did originally.  We will cover reading query plans, examining the specific details of query plan operators, the effects of index statistics on missing index recommendations, using query plan XML to enable simpler query plan comparison, and the effects of using functions in where clauses. 

Click through to understand how this could be the case.

Comments closed

When xp_logininfo Fails

Gianluca Sartori helps Future Gianluca (and present us in the meantime) troubleshoot issues with xp_logininfo:

The user does not exist
This is very easy to check: does the user exist in Windows? Did you misspell the name?

You can check this from a cmd window, issuing this command:

net user SomeUser /domain

If you spelled the user correctly, the command will return information about it, like description, password settings, group membership and so on.

There are a few other potential causes, so click through for those.

Comments closed