Press "Enter" to skip to content

Month: August 2019

Blank Rows and DAX

Alberto Ferrari explains how different DAX functions treat blank rows differently:

DAX offers two functions to retrieve the list of values of a column: VALUES and DISTINCT. The difference between the two is subtle. To understand it better, we first need to introduce the concept of the blank row. The blank row is a special row added to a table, in case the table is on the one-side of a strong relationship and the relationship is invalid. Let us elaborate on this.

Click through for the detailed explanation, along with plenty of examples.

Comments closed

Calculated Columns and Memory Usage in Analysis Services

Teo Lachev troubleshoots a customer issue:

Scenario: A client reports a memory spike during processing. They have a Tabular semantic model deployed to Azure Analysis Services. They fully process the model daily. The model normally takes less than 50 GB RAM but during processing, it spikes five times and Azure Analysis Services terminates the processing task complaining that it “reached the maximum allowable memory in our pricing tier”. Normally, fully processing the model should take about twice the memory but five times?

Teo gives us the explanation for this problem as well as a recommendation on how to fix it.

Comments closed

Waiting on Stats Refreshes

Erik Darling looks at a new wait type:

I mean ever really wondered just how long a query of yours waited on stats to automatically update before running?

Have you ever been absolutely, positively sure that those gosh dang darn stats updates were putting your query on the trolley to Slowsville?

Your wishes will be 100% granted in SQL Server 2019.

This is a wait type that I’d consider useful but hopefully uncommon.

Comments closed

Requirements for DevOps Success

Grant Fritchey lays out what you need to succeed at DevOps:

The project managers and others flip. Delivery is slipping. The amount of code being written has changed. Stuff is happening that wasn’t on the schedule. The implementation of DevOps is shut down quickly.

You have to get buy-in from management before you attempt to implement DevOps or it will fail. They have to understand what you’re doing, why you’re doing it, and the measurable benefits it will bring.

Click through for some good thoughts, none of which is “use this software.”

Comments closed

Calculating AUC in R

Andrew Treadway shows how you can calculate Area Under the Curve in R:

AUC is an important metric in machine learning for classification. It is often used as a measure of a model’s performance. In effect, AUC is a measure between 0 and 1 of a model’s performance that rank-orders predictions from a model. For a detailed explanation of AUC, see this link.

Since AUC is widely used, being able to get a confidence interval around this metric is valuable to both better demonstrate a model’s performance, as well as to better compare two or more models. For example, if model A has an AUC higher than model B, but the 95% confidence interval around each AUC value overlaps, then the models may not be statistically different in performance. We can get a confidence interval around AUC using R’s pROC package, which uses bootstrapping to calculate the interval.

There are plenty of ways to calculate this useful metric, but this is definitely one of the easier methods. H/T R-bloggers

Comments closed

Performance Tuning Neural Network Training

Sean Owen takes us through a few techniques for speeding up neural network model training:

Step #2: Use Early Stopping
Keras (and other frameworks) have built-in support for stopping when further training appears to be making the model worse. In Keras, it’s the EarlyStopping callback. Using it means passing the validation data to the training process for evaluation on every epoch. Training will stop after several epochs have passed with no improvement. restore_best_weights=True ensures that the final model’s weights are from its best epoch, not just the last one. This should be your default.

Sean focuses here on Keras + TensorFlow on Spark, but several of the tips are cross-product and generally applicable.

Comments closed

Azure Data Studio August Release

Alan Yu announces a new version of Azure Data Studio:

The key highlights to cover this month include:
– SandDance integration—A new way to interact with data
– Notebook improvements
– SQL Server Dacpac extension can support Azure Active Directory
– SQL Server 2019 extension
– Visual Studio Code merge 1.37
– Bug fixes

Being able to add a new cell inline is nice, especially when you’re dealing with larger notebooks.

Comments closed

Isolation Levels and Dynamic SQL

Max Vernon points out how transaction isolation levels work when combined with sp_executesql:

Imagine you have a piece of code where you don’t care about the downsides to the “read uncommitted” isolation level, and do your due diligence by adding SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED; at the start of your code. The code following that statement will run under that isolation level, as expected. However, if you call dynamic T-SQL with sys.sp_executesql, and set the isolation level there, the dynamic code will run under the READ UNCOMMITTED, however the isolation level will not be changed for the calling code. In other words, be careful about where you set the isolation level.

Click through for a demonstration of this.

Comments closed

Auditing Database Backups

Jovan Popovic shows how you can audit who is taking backups on an Azure SQL Managed Instance:

One mechanism to ensure that nobody can take the COPY_ONLY backup of your database is to use Transparent Data Encryption that automatically encrypts all backups. In that case you would need to use Customer-managed (BYOK) TDE where you will keep your encryption key in Azure Key-Vault. User-initiated COPY_ONLY backups are (currently) not allowed if you are using Service-managed TDE.

If you don’t use TDE on the database or there is a risk that someone can remove TDE from database and then take a backup, Managed Instance provides auditing mechanism that enables you to track who performed a backup and when. This way you can always track if some unauthorized action is happening.

Read on for the solution. Looking through it, it seems like conceptually it’d work equally well with on-prem/IaaS SQL Server as with Managed Instances.

Comments closed

In the Papers: Plan Stitch

Brent Ozar reviews a Microsoft Research paper:

In the Microsoft Research paper Plan Stitch: Harnessing the Best of Many Plans by Bailu Ding, Sudipto Das, Wentao Wu, Surajit Chaudhuri, and Vivek Narasayya (2018), the authors propose something really cool: watching an execution plan change over time, splitting it into parts, and then combining parts to form a SuperPlan™. (They don’t call it a SuperPlan™ – it just seems appropriate, right?)

That looks like an interesting paper, and Brent has a few more if you agree.

Comments closed