Press "Enter" to skip to content

Category: Administration

Handling Growth in sysjobhistory

Aaron Bertrand has a lot of jobs:

In the first few days of my new role at Infios, we came across an interesting case of memory exhaustion. A whole slew of memory-related error messages would populate the errorlog, then some stack and memory dumps would appear, and then the SQL Server service would just shut itself down without warning. Some of the errors we observed (apologies, it’s a long list, but I want to make sure that any subset might land you here):

Click through to see if you have any of those issues and one possibility of what the cause might be, as well as how to deal with it.

Leave a Comment

Alerting on Checkpoint Time in Postgres

Jeremy Schneider shares some advice:

Checkpoint is the heart of your database. It’s buried deep inside. It’s not something everyone talks about, like well-tuned autovacuum or fast queries. But if checkpointer stops beating, then you’re dead.

In addition to its well-understood job of getting dirty pages written from cache to disk in the background, it also has many smaller jobs that are less widely known. Management of a few shared-memory config settings like sync_standby_names and full_page_writes. Fsync Batching. Deferred file unlinks. Enforcement of archive_timeout.

Click through to see what happens when checkpoint time starts increasing and one important thing you should not do.

Leave a Comment

The Importance of Testing Received Wisdom

Mark Wilkinson lays out an argument:

Life is full of “absolutes”. For example, the Star Trek: The Next Generation episode “The Measure of a Man” is often cited as the best episode of the series, and many folks will tell you that you should never adjust max worker threads. But once you take the time to dig in, you realize that “Darmok” is in FACT the best episode of ST:TNG, and you’ll also find a small cohort of folks adjusting max worker threads on all of their SQL Server instances. Are these people just abnoxious contrarians? No. They just did their own testing to validate the common wisdom.

Click through for an example from Mark around 64K allocation unit sizes for NTFS volumes. And I’ll give one on max worker threads. I had a consulting client at one point which had per-customer databases. Each customer was, in general, quite small, so they had thousands of databases on the instance. They also wanted high availability on the system, so they wanted each database mirrored to a different server.

If they didn’t spike max worker threads to extreme levels, the server would have fallen over simply from the weight of all of the open database mirroring connections. The actual server workload was fine and it could handle all of the open worker threads because the large majority were doing nothing. But if a zealous problem-solver popped in, ran a diagnostic, saw that they were violating “best practices,” and “fixed” the problem, that would have been a bad day.

Unrelated but similar story: the one time they did need to fail over due to an emergency, it was also a bad day. Because even if the instances can handle 2500+ databases, it turns out that having them all fail over at the same time on low-powered Azure hardware was not a pleasant experience.

Leave a Comment

Moving Away from Average Fragmentation for Index Maintenance

Tim Radney proffers some advice:

As a SQL Server DBA with years of experience tuning production environments, I’ve seen the same patterns repeat: nightly index maintenance jobs running for hours, consuming massive CPU and I/O, only for performance to remain inconsistent or even degrade in some cases. Many of us (myself included, early in my career) relied heavily on avg_fragmentation_in_percent from sys.dm_db_index_physical_stats to decide when to reorganize or rebuild indexes. It’s time we move past that outdated approach and stop just doing what we’ve always done.

One thing I would add on to Tim’s advice is, determine whether that index even needs rebuilt or if you’re burning resources for no practical benefit. If there’s no practical performance benefit from the operation—and with all-flash arrays that are within an order of magnitude of RAM speeds, you might already be close to that point—then the index rebuild is for naught. The index maintenance strategy that we all learned back in the day was to minimize the time we spent waiting for spinning disks to reach relevant data. When random access lookups are approximately the same speed as sequential lookups, position on the disk doesn’t matter all that much.

Leave a Comment

Automating Azure SQL DB Tasks without SQL Agent

Garry Bargsley solves a problem:

Many routine administrative tasks that have traditionally been handled by SQL Agent still need to be performed:

  • Scheduled stored procedures
  • ETL processes
  • Report generation
  • Data cleanup
  • Monitoring and alerting
  • Business process automation

However, Azure SQL Database does not include SQL Agent.

Garry provides several solutions, and I would add to it third-party job scheduling solutions. Granted, that’s usually an extra expense (whether due to fees or supporting a roll-your-own solution), but it’s on the table. And some of them are better than what SQL Agent has to offer, even if I do like the fact that there’s an okay option built-in for DBAs.

Leave a Comment

Goodbye, SQL ConstantCare and Consultant Toolkit

Brent Ozar breaks the news:

About ten years ago, I sketched out an idea for a different kind of SQL Server monitoring tool: one that gathered data just once per day, and gave you a short email with a specific list of actionable tasks to make a difference in health and performance.

Richie Rump did an amazing job of building SQL ConstantCare out in the years since, building a solution that was rock-solid and scaled well to terabytes of monitoring data. Every day, we sent thousands of emails for SQL Servers around the world.

This month, we shut off sales and began decommissioning it. Here’s why.

Click through for the reasons. Brent is still maintaining the First Responder Toolkit, so things like sp_Blitz will still be around. But I will miss the quarterly graphics of who’s using which versions of SQL Server, even if I always had to give the “This is a biased sample and may not be indicative of the entire population, but it’s still an interesting and informative sample” spiel each time.

1 Comment

Things to Watch with Contained Availability Groups

John Morehouse keeps one eye on things:

Contained availability groups solve some real operational problems, especially around logins, jobs, permissions, and supporting metadata. They can reduce drift between replicas and make failover cleaner.

That does not mean they are magic.

Like most SQL Server features, contained availability groups come with details that matter. The feature can absolutely help, but it needs to be designed, tested, and operated with the right expectations.

Click through for several things you should consider before jumping into deploying contained AGs.

Leave a Comment

Jobs and Security Objects in Contained Availability Groups

John Morehouse moves some assets between availability group replicas:

In the first post, I introduced contained availability groups and how they bring contained versions of master and msdb along with the Availability Group. That matters because many applications depend on more than just user databases.

Two of the biggest wins are SQL Agent jobs and security objects.

Click through to see how they work.

Leave a Comment

Deprecated SQL Server Features since 2016

Jeff Iannucci has a list:

If you’re planning on upgrading from SQL Server 2016 to the current version of SQL Server 2025, it would be helpful to note some of the features from versions since 2016 that are no longer supported. By that I’m talking about those things classified as “deprecated” (no longer supported) or “discontinued” (removed from the product). Let’s take a look at these features and say our goodbyes, organized by versions.

There’s not a lot on here that would affect most companies. Though to one of Jeff’s points in the article, I actually have seen a company use lightweight pooling, mostly because they didn’t know what it did and saw the bit about how it could improve performance. I came in and flipped that switch off for them. That was the kind of feature that likely benefited one or two large customers 30 years ago or so and it just hung around in the product due to inertia.

Leave a Comment

Thoughts on a Cloudless World

Mike Donnelly has some tongue-in-cheek responses:

There are some serious angles to this topic, and I have had conversations with people at conferences who are doing a remigration from the cloud, but it feels like the exception not the rule. It is interesting to think about. I spent most of my career working with on-prem SQL Server, but there was a period of about 10 years (the consulting years) where I didn’t touch anything that wasn’t in the cloud. The past several years have been working in a hybrid environment, but most of the work has been moving things to Azure and Fabric. Koen has some prompts for what our blog posts could be about, but rather than dive deep into any one thing I’m going to go with the blog writer’s best friend – a top 10 list.

The funny thing is, in my time as an on-premises DBA, I never dealt with hardware and didn’t have access to the server room.

Leave a Comment