Press "Enter" to skip to content

Curated SQL Posts

When Identity Columns Skip Values

Kevin Hill explains why you might see SQL Server skip 1000 values in an identity column:

Video shows a walk-through of before and after each fix, plus a “Two guys walk into a bar” joke when I disappeared to troubleshoot a broken demo…

I’d probably avoid the fix and live with gaps. You also get gaps when you roll back an operation which inserted into an identity column, or if you have merge replication enabled on a table keyed by an identity column and SQL Server bumps the range on you. All of these are normal and good reasons not to expect contiguous numbering.

1 Comment

Using Windows Subsystem for Linux 2 in Windows 10

Max Trinidad is excited about Windows Subsystem for Linux 2:

First, I love WSL (Windows Subsystem for Linux)! It’s a great addition to Windows 10, and everyone should learn how to use it.

To get started, follow the instructions on how to get your WSL 1 Linux Distro installed. And, begin with installing Ubuntu 18.04.

Now, get Docker Desktop (), which can be installed in Windows 10 RTM Build 18363 with WSL 1.

One of the key benefits around WSL 2 is that your Docker containers will run natively rather than through a VM. That’s a pretty big deal in terms of performance and production-readiness. That Docker capability is currently in preview, but I’d expect it to make its way to production sooner than later.

Comments closed

Azure SQL Hyperscale Auto-Scaling

Davide Mauri explains how automatically to scale Azure SQL Hyperscale:

Azure SQL Hyperscale is the latest architectural evolution of Azure SQL, that has been natively designed to take advantage of the cloud. One of the main key features of this new architecture is the complete separation of Compute Nodes and Storage Nodes. This allow for independent scale of each service, making Hyperscale more flexible and elastic.

In this article I will describe how it is possible to implement a solution to automatically scale your Azure SQL Hyperscale database up or down, to dynamically and automatically adapt to different workload levels without the requiring manual .

Davide has some test measures of how much downtime you see and give you a couple thoughts on how you can track when it’s time to scale up or down.

Comments closed

Tracking Query Store Changes

Erin Stellato shows how to watch for Query Store changes whether due to settings modifications or running out of space:

The Query Store feature is a bit unique in that its status can change without user interference, which means it is important to understand how changes to Query Store are logged.  A couple weeks ago John Deardurff posted a question on Twitter asking specifically whether the SQL Server ERRORLOG is written to when the OPERATION_MODE changes to READ_ONLY because  MAX_STORAGE_SIZE_MB is exceeded.  I had never tested to confirm, but I know there is an event in Extended Events that will fire when the limit is reached.  I also know that when a user makes a change to a Query Store setting, it is logged in the ERRORLOG.

Click through to see how to watch for this and what the changes look like.

Comments closed

Audio Analysis in R

Jeroen Ooms walks us through some audio analysis with R and the av package:

The latest version of the rOpenSci av package includes some useful new tools for working with audio data. We have added functions for reading, cutting, converting, transforming, and plotting audio data in any popular audio / video format (mp3, mkv, aac, etc).

The functionality can either be used by itself, or to prepare audio data for further analysis in R using other packages. We hope this clears an important hurdle to use R for research on speech, music, and whale mating calls.

One of the most interesting things I saw Edward Tufte demonstrate was visualizing music using the Music Animation Machine. There’s a lot of space here to experiment. H/T R-Bloggers.

Comments closed

Machine Learning through Counterfactuals

Amit Sharma announces a new library:

Consider a person who applies for a loan with a financial company, but their application is rejected by a machine learning algorithm used to determine who receives a loan from the company. How would you explain the decision made by the algorithm to this person? One option is to provide them with a list of features that contributed to the algorithm’s decision, such as income and credit score. Many of the current explanation methods provide this information by either analyzing the algorithm’s properties or approximating it with a simpler, interpretable model.

However, these explanations do not help this person decide what to do next to increase their chances of getting the loan in the future. In particular, changing the most important features for prediction may not actually change the decision, and in some cases, important features may be impossible to change, such as age. A similar argument applies when algorithms are used to support decision-makers in scenarios such as screening job applicants, deciding health insurance, or disbursing government aid.

This has the potential to be a great library. One of the issues with machine learning as it stands today is that you can get an answer, but to understand how to change the answer requires having a human understand the model. This looks like a good first step. It’s only available in Python.

Comments closed

Splitting Strings with T-SQL

Andy Levy recommends using STRING_SPLIT():

Last year, you finally retired the last of your SQL Server 2008R2 instances. Congratulations! But are you taking advantage of everything that your new instances have to offer? Unless you did a review of all of the T-SQL in your applications, I’m guessing not.

At one time or another, we all find ourselves having to do some string parsing, especially splitting strings on a delimiter. Nearly all of us have one (or two or a dozen) functions for doing this somewhere on every instance of SQL Server. But since SQL Server 2016, we’ve had an official way to do it – the STRING_SPLIT() function.

Andy’s example involves splitting strings, but there are plenty of functions which come into the T-SQL lexicon. It might be worth doing a quick review of the available system functions just to see if there’s something useful which slipped with a newer version of SQL Server.

Comments closed

Finding Foreign Key Relationships in SQL Server

John Morehouse shows how you can piece together foreign key relationships in SQL Server:

Recently, I had to purge some parent records from a table.  In this case, the parent table had foreign keys, which itself isn’t an issue.  The fact that there were more than 30 of them was.   While SQL Server will happily tell you that you are violating a foreign key if a child record is present when deleting the parent record, finding all of them can be cumbersome.  This is even more true when you have a larger number of foreign keys.

Thankfully, SQL Server can tell us a lot of information about foreign keys including both the parent and child tables as well as the column used.  From this information, we can dynamically create a SELECT statement that would tell us the number of child records that are tied to the parent ID.

Click through for the solution.

Comments closed

Burn the Database Down

Jana Sattainathan has a script to drop almost all user objects in a database:

I hope this script does not become infamous for the wrong reasons! Please use caution.

I had to help a team recreate everything in a database and test their scripts but leave the roles and role grants in place. Basically, this meant that I could have scripted out the permissions and recreated the database but I thought it would be easier and more re-runnable to just drop everything else except the permissions.

Caution here means making sure you have good backups beforehand, ensuring that you pick the right database, and double-checking everything.

Comments closed

SQL Server and Query Costs

Jared Poche explains some of the ideas behind the costing algorithm in SQL Server:

One thing to remember is that cost in SQL Server is always an estimate. This is a number SQL Server calculates when considering multiple potential plans to determine which would be the best. But the number of rows it expects a given operation to return or how many times that operation runs can be off. All of that is based on statistics.

It doesn’t then go back and update the cost number later if those numbers were incorrect. So while we can use the cost as an indicator of which query or operator we should focus on, don’t completely tunnel-vision that one thing.

This kind of cost mismatch allows something to look awful on an execution plan but not actually be a problem, or (in the case of most user-defined functions prior to SQL Server 2019) vice versa.

Comments closed