Press "Enter" to skip to content

Category: Corruption

Capturing CHECKDB Output

Erin Stellato shows how you can track the results of those automated CHECKDB runs you’re doing:

First, you need to be running CHECKDB on a regular basis.  If you’re going to ask what you mean by regular basis, I’ll direct you to a post of Paul’s that is still relevant: Importance of running regular consistency checks.  There are a few factors to consider, but in general I recommend at least once a week.  If you’re a 9 to 5 shop and have the maintenance window to run CHECKDB daily, feel free to do that.

Erin walks us through it and also recommends checking out Ola’s scripts for integrity checks. I’d add to that Minion CheckDB.

Comments closed

CHECKDB Matters in the Cloud Too

Daniel Janik takes us through an ordeal related to CHECKDB on an Azure Managed Instance:

This is crazy! What now? Open at ticket with MSFT? This seemed the only choice and what was the root cause? Apparently in Azure Managed Instances, Microsoft will check databases for corruption and will take the database offline if detected.

When in this special offline state there’s no way to access the database and Microsoft must be contacted. You can’t set the DB in recovery mode or change it to ONLINE. Microsoft does “contact” someone to notify that the database was taken offline due to corruption but if you work at a larger company this notification may never reach the right people.

Read on to see what Daniel ended up doing and some tips on making the process smoother.

Comments closed

Corruption and Secondary Databases

Paul Randal shares some wisdom on corruption:

We’ve had a few new clients come to use recently after experiencing corruption, and they’ve been worried about whether physical corruption can propagate to secondary databases (like an availability group secondary or log shipping secondary) through the mechanism used to maintain the secondary database in sync with the primary database. I explained how it’s very rare for that to happen, but sometimes it does, and not in a way you’d think. Read on…

I don’t even have to ask you to read on; Paul has even done that. And do read the comments as well.

Comments closed

Transactions Rolling Back During CHECKDB

Paul Randal explains a long-standing bug in CHECKDB:

Continuing the database snapshot theme from the previous Curious Case post, I had another question from someone who was concerned about transactions rolling back during DBCC CHECKDB. They’d just noticed the messages in the error log saying that when DBCC CHECKDB was executed, it was causing transactions to roll back in the database – and how could that possibly be allowed to happen? They said they panicked and stopped all DBCC CHECKDB executions.

There’s no need to panic. The problem is actually a bug in the database snapshot code that’s been there since SQL Server 2005, where it reports the wrong database name.

Read on for a demo; it’s not a dangerous problem.

Comments closed

Emergency Mode in SQL Server

Paul Randal answers a reader question:

I had a blog comment question a few days ago that asked why emergency-mode repair requires the database to be in EMERGENCY mode as well as SINGLE_USER mode.

All repair operations that DBCC CHECKDB (and related commands) performs require the database to be in single-user mode so there’s a guarantee that nothing can be changing while the checks and repairs are done. But that doesn’t change the behavior of what repair does – that needs emergency mode too.

Read on for an explanation of what emergency mode is and why we need it to run CHECKDB repair operations.

Comments closed

DBCC CHECKDB Error on Azure SQL Database

Arun Sirpal explains an error message on Azure SQL Database:

msg 7928, Level 16, State 1, Line 3
The database snapshot for online checks could not be created. Either the reason is given in a previous error or one of the underlying volumes does not support sparse files or alternate streams. Attempting to get exclusive access to run checks offline.
Msg 8921, Level 16, State 3, Line 3
Check terminated. A failure was detected while collecting facts. Possibly tempdb out of space or a system table is inconsistent. Check previous errors.

Read on to see what this means, as well as what it means for you.

Comments closed

So Your tempdb is Corrupt

Steve Stedman walks us through how to fix corruption in tempdb:

The fact that you know you have corruption in TempDB is good news, that shows that you are running CheckDB against TempDB and many people overlook this.

    The corrupt page in TempDB may cause some issues if it is not cleared up.

    Since the TempDB files are recreated if they don’t exist when SQL Server restarts, here is what I would recommend.

Read on for Steve’s advice.

Comments closed

Suspect Pages in msdb

Max Vernon explains what the suspect_pages table is in msdb:

When SQL Server detects corruption in a database, it reports that corruption immediately to the client who requested the data. But did you know SQL Server also stores the details about which pages have experienced corruption in the msdb database, in the suspect_pages table?

Read on to see the information you can get from this table, including a listing of what each event type means.

Comments closed

Backups With Checksum Validation

Eric Blinn shows the upside to performing checksum validation during backups:

Since a full backup reads every data page it makes sense that we can ask SQL Server to calculate and compare each checksum during this operation. Even a differential backup reads all of the pages for any extent (a group of 8 data pages) if any one page in it changes so it will validate some pages that didn’t change.

Read on for a demonstration.

Comments closed

Corrupting Databases For Fun And Profit

Eric Blinn has started a new series on database corruption.  In part one, he shows us how to corrupt a database (probably by letting it listen to Alice Cooper and Frank Zappa):

I’m going to start a series of posts regarding corruption detection, prevention, and correction.  In order to detect corruption we must first have a corrupt database.  This post is going to show how to purposely corrupt a database.  I’ll refer back to this post often as it will be the basis of most of the future posts in this series which will pick up where this leaves off.

Step 1.  Create a new database.  You didn’t think I was going to ask you to corrupt a real database did you?  We will create one table with some names in it.  The table is abnormally wide on purpose.  The goal is to get fewer rows on any one page of data so that we can get several pages of data without needing many hundreds of rows.

Part two explains the concept of page verification:

Page verification is a process in SQL Server where the engine writes extra data to the header of the page while writing it to disk.  This extra data can be used to verify that the data later read from that page is what was expected.  There are 3 options for the setting.  They are NONE, CHECKSUM, and TORN_PAGE_DETECTION.  The torn page option is deprecated.  It should not be used and will not be covered in this series.

When set to CHECKSUM SQL Server will quickly determine a checksum for the page of data and write it to the header of the data page.  Any time from that point forward when the page is read from disk SQL Server will perform the same checksum calculation and compare that to to the stored value in the header of the page.  If the value matches that would indicate there is probably not corruption. If the values do not match that would indicate there is almost certainly some kind of corruption.

Corruption is one of the scariest things that can happen to a database; knowing how to respond to an incident of corruption is critical even as the incidence of corruption is quite low in practice (unless you’re running on awful hardware).

Comments closed