Press "Enter" to skip to content

Curated SQL Posts

When a Non-Clustered Index on Clustered Columns Makes Sense

Allen White gives us a scenario where adding a non-clustered index on the same column as the clustered index can make sense:

Recently I was asked about adding a non-clustered index to a table (let’s call it Images) with just one column. It had been added in the development database and it improved performance dramatically. I looked at it and it had the same key as the clustered index on that table.

In reviewing the query I saw that Images was joined to the other tables in the query, but none of the columns were used, so Images was joined to ensure that values from the other tables had rows in Images. The query plan shows a significantly higher number of reads against Images without the new NCI (non-clustered index) than when it’s present.

I do agree that this can help—as we obviously see. The backseat query tuner in me wonders if maybe there’s another way to write the query to prevent the scan by using CROSS APPLY, but that’d only help if they were getting a small percentage of rows from the parent table expression built from the combination of the clustered index scan and index seek in the second example.
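To make that concrete, here's a minimal T-SQL sketch of both ideas. The table and column names are all hypothetical, not from the original post: the point is that a single-column non-clustered index on the clustered key carries none of the wide payload columns, so an existence check against it reads far fewer pages.

-- Hypothetical schema: Images carries a wide payload, so its clustered
-- index packs few rows per page.
CREATE TABLE dbo.Images
(
    ImageId INT NOT NULL,
    ImageData VARBINARY(MAX) NOT NULL,
    CONSTRAINT PK_Images PRIMARY KEY CLUSTERED (ImageId)
);

CREATE TABLE dbo.Orders
(
    OrderId INT NOT NULL,
    ImageId INT NOT NULL,
    CONSTRAINT PK_Orders PRIMARY KEY CLUSTERED (OrderId)
);

-- A non-clustered index on the same key column contains only the key,
-- so scanning it for an existence check touches far fewer pages.
CREATE NONCLUSTERED INDEX IX_Images_ImageId ON dbo.Images (ImageId);

-- The CROSS APPLY rewrite: probe Images once per outer row instead of
-- scanning it, which helps when few outer rows qualify.
SELECT o.OrderId
FROM dbo.Orders AS o
    CROSS APPLY
    (
        SELECT TOP (1) 1 AS ImageExists
        FROM dbo.Images AS i
        WHERE i.ImageId = o.ImageId
    ) AS img;

An EXISTS-based semi-join would express the same intent; either way, the narrow index is what makes the probe cheap.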

Avoiding Diagonal Axis Labels

Cole Nussbaumer Knaflic gives us two good alternatives for avoiding diagonal labels in data visualizations:

There is one common phenomenon in graphs that I recommend actively avoiding: diagonal axis labels. They are often observed on the x-axes of graphs, where many tools automatically rotate text when the labels become too long to fit horizontally. While this might seem like a kind favor, there are usually better options. Beyond looking messy, diagonally rotated text is slower to read. In this short post, I’ll highlight two common scenarios that lead to diagonal x-axis labels—long category names on bar charts and long date labels on line graphs—and a couple ideas to try instead.

Diagonal labels aren't the worst on printed visuals, since you can tilt the paper to read them clearly, but they're not great. On screens, especially ones that change orientation as you tilt them, like phones, they lead to a lot of unnecessary frustration.

Developing for Databricks with VS Code

Gerhard Brueckl tells us what comes after notebooks for users with development backgrounds:

For those users Databricks has developed Databricks Connect (Azure docs) which allows you to work with your local IDE of choice (Jupyter, PyCharm, RStudio, IntelliJ, Eclipse or Visual Studio Code) but execute the code on a Databricks cluster. This is awesome and provides a lot of advantages compared to the standard notebook UI. The two most important ones are probably the proper integration into source control / git and the ability to extend your IDE with tools like automatic formatters, linters, custom syntax highlighting, …

While Databricks Connect solves the problem of local execution and debugging, there was still a gap when it came to pushing your local changes back to Databricks to be executed as part of a regular ETL or ML pipeline. So far you had to either “deploy” your changes by manually uploading them via the Databricks UI again or write a script that uploads them via the REST API (Azure docs).

Gerhard has a nice extension for Visual Studio Code which helps with this. I’m also a huge fan of the DatabricksPS module, so I’ll happily plug that here.

Stored Procedure Parameter Caching

Greg Dodd follows Betteridge’s Law of Headlines:

When SQL Server caches your plan, it caches it with the parameter values that you pass through the first time, and it assumes that the same query plan will be the best one for any parameter you pass in next time.

But does SQL Server always cache your parameters? Does it always keep track of what you pass in?

Click through for a demonstration good enough to give you a conclusive answer.
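If you want to poke at the plan cache yourself, here's a minimal sketch. The table and procedure are made up for illustration, but the DMVs and the ParameterCompiledValue attribute in the plan XML are standard SQL Server:

-- A throwaway table and procedure to generate a cached plan.
CREATE TABLE dbo.CustomerOrders
(
    OrderId INT IDENTITY PRIMARY KEY,
    CustomerId INT NOT NULL,
    OrderDate DATE NOT NULL
);
GO
CREATE OR ALTER PROCEDURE dbo.GetOrdersByCustomer
    @CustomerId INT
AS
BEGIN
    SELECT OrderId, OrderDate
    FROM dbo.CustomerOrders
    WHERE CustomerId = @CustomerId;
END;
GO
EXEC dbo.GetOrdersByCustomer @CustomerId = 42;
GO
-- The sniffed value shows up in the cached plan XML as
-- ParameterCompiledValue="(42)" inside the <ParameterList> element.
SELECT st.text, qp.query_plan
FROM sys.dm_exec_cached_plans AS cp
    CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
    CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp
WHERE st.text LIKE N'%GetOrdersByCustomer%';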

Pluralization with Entity Framework Core

Erik Ejlskov Jensen takes us through auto-pluralization with Entity Framework Core:

If you are using the EF Core console command dotnet ef dbcontext scaffold or Package Manager Console in Visual Studio Scaffold-DbContext, you can hook up pluralization via code in your project.

First you must add a reference to the Microsoft.EntityFrameworkCore.Design package from your startup project.

Click through for a code sample.

The Problems with Shrinking tempdb Data Files

Andy Mallon explains why you should never shrink tempdb data files:

I recently wrote about growing, shrinking, and removing tempdb files. In that article I explained that SQL Server won’t move a page that contains an internal worktable object, and thus trying to shrink tempdb files can be futile. Today, I’m going to explain how attempting to shrink tempdb files can actually be harmful.

Andy has good advice for tempdb here. Shrinking other databases is generally bad but sometimes necessary; shrinking tempdb can lead to all kinds of problems.
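For reference, the operation in question looks like the following; the logical file name and target size here are just examples (tempdev is the default name of tempdb's first data file):

USE tempdb;
GO
-- Attempt to shrink the first tempdb data file to 1 GB. Pages holding
-- internal worktable objects can't be moved, so this may reclaim little
-- space, and Andy's post covers how it can actively cause harm.
DBCC SHRINKFILE (tempdev, 1024);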

Azure Data Studio April 2020 Release

Alan Yu announces the April 2020 release of Azure Data Studio:

KQL magic extension support is now available in Azure Data Studio Notebooks. It allows you to connect, query and explore Azure Data Explorer (Kusto), ApplicationInsights and LogAnalytics data using kql (Kusto Query Language). If you are using Log Analytics today for your Azure SQL DB as described here, you can now do log metric analysis using KQL magic in Azure Data Studio Notebooks. 

The KQL magic package can be downloaded from Manage Packages in a Python Notebook or using pip install. In a Python Notebook in Azure Data Studio, load KQL magic using %reload_ext Kqlmagic. Start connecting, querying, and exploring using %kql for a single line or %%kql for multi-line queries.

KQL magic allows you to see tabular results similar to SQL Notebook, where you can also have the benefits of exporting outputs to other formats (csv, Excel, JSON, XML) and using the Charting functionality. You can also take advantage of rendering charts directly with plotly for richer interactivity. 

There are several fairly big changes in here, so check them all out.

Building a P&L Statement with Power BI

Matt Allington has the need for some financials:

A few weeks ago, Danielle Stein Fairhurst from Plum Solutions contacted me and asked if I would speak at her financial modelling meetup group (a topic of my choice related to Power BI). I decided to speak about creating a P&L using Power BI and DAX. I first learnt how to do this way back in 2015 when I met Derek Rickard. I was still fairly early on my DAX journey at the time and certainly didn’t have any experience with financial accounting using DAX. Derek shared with me how he built his P&L using Adventure Works. I have used these principles many times to help customers, and I thought it would be a great time to pass on that knowledge, and the skills I have learnt along the way on my blog.

Click through for a video and the measures used.

The Siren Song of High Accuracy

Holger von Jouanne-Diedrich notes that accuracy is not in itself necessarily a good thing for a machine learning model:

In one of my most popular posts So, what is AI really? I showed that Artificial Intelligence (AI) basically boils down to autonomously learned rules, i.e. conditional statements or simply, conditionals.

In this post, I create the simplest possible classifier, called ZeroR, to show that even this classifier can achieve surprisingly high values for accuracy (i.e. the ratio of correctly predicted instances)… and why this is not necessarily a good thing, so read on!

The nuanced answer here is that with classifiers, accuracy is not in itself a great measure in the case of class imbalance. The more balanced your classes are, the more likely it is that a model with high accuracy is a good model. That’s where other measures such as specificity and sensitivity, positive & negative predictive value, etc. come into play.
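To see the imbalance problem in numbers, here's a quick T-SQL sketch of a ZeroR-style classifier against a made-up data set with a 1% positive class: accuracy looks stellar while the model never finds a single positive case.

-- A hypothetical data set: 10 positives out of 1,000 rows.
CREATE TABLE #Labels (Actual BIT NOT NULL);

INSERT INTO #Labels (Actual)
SELECT TOP (1000)
    CASE WHEN ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) <= 10
         THEN 1 ELSE 0 END
FROM sys.all_objects;

-- ZeroR: always predict the majority class (0).
WITH Scored AS
(
    SELECT Actual, CAST(0 AS BIT) AS Predicted
    FROM #Labels
)
SELECT
    AVG(CASE WHEN Predicted = Actual THEN 1.0 ELSE 0.0 END) AS Accuracy,     -- 0.99
    SUM(CASE WHEN Actual = 1 AND Predicted = 1 THEN 1.0 ELSE 0.0 END)
        / NULLIF(SUM(CASE WHEN Actual = 1 THEN 1.0 ELSE 0.0 END), 0) AS Sensitivity,  -- 0.00
    SUM(CASE WHEN Actual = 0 AND Predicted = 0 THEN 1.0 ELSE 0.0 END)
        / NULLIF(SUM(CASE WHEN Actual = 0 THEN 1.0 ELSE 0.0 END), 0) AS Specificity   -- 1.00
FROM Scored;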
