
Category: Administration

Troubleshooting Performance around a Data Purge Process

Andy Mallon troubleshoots an issue:

In January, one of our Staff Engineers sent the following message to the DBRE help channel in Slack:

Morning folks, we had a pretty significant wait spike on the [database]. Circuit breakers closed and reopened quickly. Is anyone immediately aware of a reason why this could’ve happened?

Read on for Andy’s quick analysis and then the root cause and solution.


Forced Quorum Failures with WSFC

Eitan Blumin can’t reach quorum:

The incident started with a late-night phone call from one of our customers (it’s always a late-night phone call, isn’t it?).

They reported that during a DR exercise on their production environment (Chaos Engineering, anyone?) their entire cluster failed and they weren’t able to bring any of the replicas back online.

Click through for the full story, including what happened, why it happened, and what you can do to prevent similar problems in the future.


Automating Microsoft Fabric Capacity Scaling via Logic App

Soheil Bakhshi does some scaling:

In a previous post I explained how to manage the capacity costs of a Fabric F capacity (under the Pay-As-You-Go pricing model) using Logic Apps to Suspend and Resume it.

A customer who read my previous blog asked me, “Can we use a similar method to scale up and down before and after specific workloads?” This blog post answers exactly that.

This is pretty neat, though I wonder how long it takes and how much downtime it produces.


Monitoring if an Azure Server Goes Offline

Paul Bergson builds an alert:

My miniature schnauzer, Raven, is a smart and lively dog who loves to hunt for rodents in the yard. She has a keen sense of smell and can detect the slightest movement of her prey. She barks loudly to alert me whenever she finds a potential target and chases after it with all her speed. However, the rodents are too cunning and often escape to a tree or a hole in the ground before she can catch them. She then returns to me with a disappointed look on her face, hoping for a treat or a pat on the head.

Azure Monitor is like Raven, but much more efficient and reliable. It can monitor your Azure servers and detect when they go offline in ~1 minute. It can also alert you via email, SMS, or webhook when something goes wrong, so you can take action to fix it. With Azure Monitor, you can stay on top of your server’s health and performance.

Read on to see how you can use Azure Monitor and build policies, with much less cleanup requirement than a dog.


Purging Lots of Backup History

David Wiseman needs to clear out a significant amount of backup history:

Recently, I encountered an issue running sp_delete_backuphistory on servers that hosted a large number of databases with frequent log backup & restore operations. The cleanup task hadn’t been scheduled and the history tables had grown very large over several months. The msdb database was also hosted on a volume with limited IOPS.

If you attempt to run sp_delete_backuphistory under these conditions, you will likely encounter these issues:

Click through for that list of issues, as well as a way of mitigating the problem. I’ve noticed this kind of pattern appears fairly often in Microsoft-provided cleanup procedures: the code works well until you reach a certain scale, at which point it falls over. It’d be great if the original sp_delete_backuphistory performed batch deletion from the get-go, but David shows us a way to get around the issue.
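
To make the idea of batch deletion concrete, here is a minimal sketch of the general pattern. It is not David's solution and not how sp_delete_backuphistory works internally. It uses Python's built-in sqlite3 module as a stand-in for SQL Server, and the backup_history table, its columns, and the purge_in_batches and demo functions are all hypothetical names made up for illustration. The real msdb history spans several related tables, which this sketch ignores.

```python
import sqlite3


def purge_in_batches(conn, cutoff, batch_size=1000):
    """Delete rows older than `cutoff` in small transactions.

    Returns the total number of rows deleted. Each batch commits on its
    own, so no single transaction has to cover the whole purge.
    """
    total = 0
    while True:
        cur = conn.execute(
            "DELETE FROM backup_history WHERE id IN ("
            "  SELECT id FROM backup_history WHERE finished_at < ? LIMIT ?"
            ")",
            (cutoff, batch_size),
        )
        conn.commit()  # end the transaction for this batch
        if cur.rowcount == 0:
            break  # nothing older than the cutoff remains
        total += cur.rowcount
    return total


def demo():
    # Hypothetical single-table stand-in for backup history.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE backup_history ("
        "  id INTEGER PRIMARY KEY,"
        "  finished_at TEXT NOT NULL)"
    )
    # 5,000 rows spread across the twelve months of 2023.
    rows = [(f"2023-{(i % 12) + 1:02d}-01",) for i in range(5000)]
    conn.executemany(
        "INSERT INTO backup_history (finished_at) VALUES (?)", rows
    )
    conn.commit()

    deleted = purge_in_batches(conn, "2023-07-01", batch_size=500)
    remaining = conn.execute(
        "SELECT COUNT(*) FROM backup_history"
    ).fetchone()[0]
    conn.close()
    return deleted, remaining


if __name__ == "__main__":
    deleted, remaining = demo()
    print(f"deleted={deleted} remaining={remaining}")
```

The design choice worth noting is the commit after every batch: it trades one long-running delete for many short ones, which tends to keep locks and log growth per transaction small. A suitable batch size depends on the environment and would need testing.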


SQL Server Setup Config Files and Per-Version Maintenance

Aaron Bertrand diagnoses a problem:

We came across a new error during SQL Server setup that returned zero useful search results:

The setting ‘COMMFABRICPORT’ specified is not recognized.

I did not find too many mentions of this argument at all, never mind in that specific phrase – though now that I wrote this, it might start returning this post. Most mentions came from probably a single sample copy of ConfigurationFile.ini offered on a blog post from the ~2017 timeframe.

Read on to learn more about this, as well as short-term and long-term fixes for managing your installation config files.


New Features in Azure SQL MI Instance Pools

Djordje Marinkovic shows off what’s new:

When migrating small SQL Server instances to Azure it is often the case that a single SQL Managed Instance turns out to be overkill in terms of size and, consequently, cost. The oversizing problem can happen whenever very small instances are required, for example when an ISV company builds a multi-tenant app requiring a small SQL MI instance for each customer. In such cases the smallest size (4-vCores) for a single SQL MI can still turn out to be too large and too expensive for the given use case. This is where SQL MI pools (“instance pools”) deliver great value.

Click through for more information on instance pools, as well as new features for instance pools.


DBCC SHRINKFILE and tempdb

Tom Collins answers a question:

Question: I’m trying to delete a TempDB ndf file from the TempDB file definitions. It is no longer required, but I’m getting an error message:

DBCC SHRINKFILE: Page xxxxxxxx could not be moved because it is a work table page.

How can I get around this problem? There is no activity on the server.

Read on for the answer.


The Impact of Auto-Close on Performance

Steve Stedman explains why Auto-Close should almost never be on for your database:

When the AutoClose setting is enabled, SQL Server will shut down the database after the last user disconnects. This means that every time a new connection is made, SQL Server must go through the entire process of starting the database again. This includes reading the database file, allocating memory, and performing any necessary recovery processes. This overhead can cause a noticeable delay for users as they connect, especially if the database is large or complex.

Read on for several other factors affecting performance. I will say that the best use case for Auto-Close is when you have a dev instance—especially on a local machine—with a large number of databases and a very limited amount of RAM available. Otherwise, if this is a server, I’m turning Auto-Close off. Even today, I’d rather just buy enough RAM for my developers than flip this switch.


Firewalls and TLS in SQL Server on Linux

I have a new video out:

In this video, we harden our SQL Server instance in two ways: by using a firewall to limit inbound traffic, and by using a certificate to force encrypted connections to SQL Server.

This was a video I enjoyed creating. It also shows the progress of SQL Server security: go back to 2005 (pre-SP1) and even SQL authentication over TDS was unencrypted by default. They fixed it so that the authentication would use a self-signed cert but the data you’d get back from query results was unencrypted. Nowadays, encryption is easy (if you’re okay with a self-signed cert) and some future version of SQL Server will make it mandatory.
