Press "Enter" to skip to content

Category: Corruption

Emergency Mode in SQL Server

Paul Randal answers a reader question:

I had a blog comment question a few days ago that asked why emergency-mode repair requires the database to be in EMERGENCY mode as well as SINGLE_USER mode.

All repair operations that DBCC CHECKDB (and related commands) performs require the database to be in single-user mode so there’s a guarantee that nothing can be changing while the checks and repairs are done. But that doesn’t change the behavior of what repair does – that needs emergency mode too.

Read on for an explanation of what emergency mode is and why we need it to run CHECKDB repair operations.

Comments closed

DBCC CHECKDB Error on Azure SQL Database

Arun Sirpal explains an error message on Azure SQL Database:

msg 7928, Level 16, State 1, Line 3
The database snapshot for online checks could not be created. Either the reason is given in a previous error or one of the underlying volumes does not support sparse files or alternate streams. Attempting to get exclusive access to run checks offline.
Msg 8921, Level 16, State 3, Line 3
Check terminated. A failure was detected while collecting facts. Possibly tempdb out of space or a system table is inconsistent. Check previous errors.

Read on to see what this means, as well as what it means for you.

Comments closed

So Your tempdb is Corrupt

Steve Stedman walks us through how to fix corruption in tempdb:

The fact that you know you have corruption in TempDB is good news, that shows that you are running CheckDB against TempDB and many people overlook this.

    The corrupt page in TempDB may cause some issues if it is not cleared up.

    Since the TempDB files are recreated if they don’t exist when SQL Server restarts, here is what I would recommend.

Read on for Steve’s advice.

Comments closed

Suspect Pages in msdb

Max Vernon explains what the suspect_pages table is in msdb:

When SQL Server detects corruption in a database, it reports that corruption immediately to the client who requested the data. But did you know SQL Server also stores the details about which pages have experienced corruption in the msdb database, in the suspect_pages table?

Read on to see the information you can get from this table, including a listing of what each event type means.

Comments closed

Backups With Checksum Validation

Eric Blinn shows the upside to performing checksum validation during backups:

Since a full backup reads every data page it makes sense that we can ask SQL Server to calculate and compare each checksum during this operation. Even a differential backup reads all of the pages for any extent (a group of 8 data pages) if any one page in it changes so it will validate some pages that didn’t change.

Read on for a demonstration.

Comments closed

Corrupting Databases For Fun And Profit

Eric Blinn has started a new series on database corruption.  In part one, he shows us how to corrupt a database (probably by letting it listen to Alice Cooper and Frank Zappa):

I’m going to start a series of posts regarding corruption detection, prevention, and correction.  In order to detect corruption we must first have a corrupt database.  This post is going to show how to purposely corrupt a database.  I’ll refer back to this post often as it will be the basis of most of the future posts in this series which will pick up where this leaves off.

Step 1.  Create a new database.  You didn’t think I was going to ask you to corrupt a real database did you?  We will create one table with some names in it.  The table is abnormally wide on purpose.  The goal is to get fewer rows on any one page of data so that we can get several pages of data without needing many hundreds of rows.

Part two explains the concept of page verification:

Page verification is a process in SQL Server where the engine writes extra data to the header of the page while writing it to disk.  This extra data can be used to verify that the data later read from that page is what was expected.  There are 3 options for the setting.  They are NONE, CHECKSUM, and TORN_PAGE_DETECTION.  The torn page option is deprecated.  It should not be used and will not be covered in this series.

When set to CHECKSUM SQL Server will quickly determine a checksum for the page of data and write it to the header of the data page.  Any time from that point forward when the page is read from disk SQL Server will perform the same checksum calculation and compare that to to the stored value in the header of the page.  If the value matches that would indicate there is probably not corruption. If the values do not match that would indicate there is almost certainly some kind of corruption.

Corruption is one of the scariest things that can happen to a database; knowing how to respond to an incident of corruption is critical even as the incidence of corruption is quite low in practice (unless you’re running on awful hardware).

Comments closed

PFS Corruption When Moving From SQL Server 2014

Paul Randal notes a bug in SQL Server 2014:

I’m seeing reports from a few people of DBCC CHECKDB reporting PFS corruption after an upgrade from SQL Server 2014 to SQL Server 2016 or later. The symptoms are that you run DBCC CHECKDB after the upgrade and get output similar to this:

Msg 8948, Level 16, State 6, Line 5
Database error: Page (3:3863) is marked with the wrong type in PFS page (1:1). PFS status 0x40 expected 0x60.
Msg 8948, Level 16, State 6, Line 5
Database error: Page (3:3864) is marked with the wrong type in PFS page (1:1). PFS status 0x40 expected 0x60.
CHECKDB found 2 allocation errors and 0 consistency errors not associated with any single object.
CHECKDB found 2 allocation errors and 0 consistency errors in database 'MyProdDB'.
repair_allow_data_loss is the minimum repair level for the errors found by DBCC CHECKDB (MyProdDB).

I’ve discussed with the SQL Server team and this is a known bug in SQL Server 2014.

Read on for the fix and additional good advice.

Comments closed

Dealing With CheckDB Error Message 824 Level 24

Steve Stedman has a post on fixing a database which has experienced an incorrect pageid error:

Msg 824, Level 24, State 2, Line 1

SQL Server detected a logical consistency-based I/O error: incorrect pageid (expected 1:2806320; actual 0:0).  It occurred during a read of page (1:xxxxx) in database ID 5 at offset 0x00000xxxxx0000 in file ‘C:\Program Files\Microsoft SQL Server\MSSQL12.MSSQLSERVER\MSSQL\DATA\YourDatabaseName.mdf’.  Additional messages in the SQL Server error log or system event log may provide more detail. This is a severe error condition that threatens database integrity and must be corrected immediately. Complete a full database consistency check (DBCC CHECKDB). This error can be caused by many factors; for more information, see SQL Server Books Online.

Since this is one of those things that I regularly work with, I thought I would see what other people are saying about this error message, and boy oh boy did I found some crazy and outright damaging suggestions

Steve puts together a bunch of really bad advice and explains why you shouldn’t follow it.  Read the whole thing and listen to Steve’s advice, not the bad advice.

Comments closed

So You’ve Got Database Corruption—Now What?

Jeff Mlakar walks us through troubleshooting an instance of database corruption:

Before we dive into fixing database corruption it is important to know what not to do. There are things we can do which will make a bad situation even worse.

  1. Don’t Panic – most sysadmins are used to having their cages rattled and can keep cool under duress. Jumping to action without a plan is not wise. Now is not the time to start trying things and performing thoughtless actions.

  2. Do NOT Detach the Database – we may never get it back again as it may be in a recovery pending state.

  3. No restarting SQL Services – databases may never get started up again just like the above.

  4. Don’t Reboot the Machine – same as above 2 points

  5. Don’t Start by Trying to Repair the Corruption – root cause analysis is critical to preventative measures

Jeff then walks us through things that he does to discern the root cause and correct the issue (if possible).

Comments closed

Diagnosing Database Corruption

Jeff Mlakar has started a series on database corruption.  His first post involves finding corruption:

The cause lies in layers below SQL Server. The most common are hardware faults; in particular, issues with the I/O Subsystem. Any component in the I/O Subsystem can fail and be the cause of database corruption: disks, controllers, CPU, memory, network switch, network cables, SAN, etc.

Database corruption cannot entirely be prevented. It is not a matter of if but rather when

Disks go bad. So do NICs, cables, routers, and everything else physical below the SQL Server Instance. This is why it is important to know that we cannot entirely prevent corruption – only deal with and mitigate it.

Click through for a few ways to find potential corruption.

Comments closed