Msg 824, Level 24, State 2, Line 1
SQL Server detected a logical consistency-based I/O error: incorrect pageid (expected 1:2806320; actual 0:0). It occurred during a read of page (1:xxxxx) in database ID 5 at offset 0x00000xxxxx0000 in file ‘C:\Program Files\Microsoft SQL Server\MSSQL12.MSSQLSERVER\MSSQL\DATA\YourDatabaseName.mdf’. Additional messages in the SQL Server error log or system event log may provide more detail. This is a severe error condition that threatens database integrity and must be corrected immediately. Complete a full database consistency check (DBCC CHECKDB). This error can be caused by many factors; for more information, see SQL Server Books Online.
Since this is one of those things that I regularly work with, I thought I would see what other people are saying about this error message, and boy oh boy did I found some crazy and outright damaging suggestions
Steve puts together a bunch of really bad advice and explains why you shouldn’t follow it. Read the whole thing and listen to Steve’s advice, not the bad advice.
One can make the argument that DBCC DROPCLEANBUFFERS might not be particularly valuable for testing. First, the storage engine in SQL Server Enterprise Edition (or Developer Edition, which is often used when testing) behaves differently with a cold cache versus a warm one. With a warm cache, a page not already in cache (e.g. index seek by primary key) will be fetched from disk using a single 8K page IO request as one expects. However, when the cache isn’t fully warmed up (Buffer Manager’s Target Pages not yet met), the entire 64K extent (8 contiguous 8K pages) is read for the single page request regardless of whether the adjacent pages are actually needed by the query. This has the benefit of warming the cache much more quickly than would otherwise occur, but given that the normal steady state of a production SQL Server is a warm cache, testing with a cold cache isn’t a fair comparison of different plans. More data than normal will be transferred from storage so timings may not be indicative of actual performance.
I don’t think I agree 100% with that argument, but I am sympathetic to it. Still, Dan has great advice in this post.
Attempting to debug production performance problems in your development environment can be problematic in many ways, leading to a frustrating troubleshooting experience. One very common situation is the resources on the development environment are substantially less robust than on the production system; for instance prod has 128 GB of RAM, while dev only has 16 GB, prod has 16 cores, while dev only has 4 cores. Unintuitively, this disparity can result in queries running faster in development than in production.
SQL Server has a little-known (and undocumented and unsupported) troubleshooting-related DBCC command that can be used to mimic production resource levels in your development environment. As with all undocumented features, do not try this in production.
Read on to learn how
DBCC OPTIMIZER_WHATIF can lead the optimizer to choose different plans. I almost never use this command, but it is helpful to have it in your back pocket.
Jeff Mlakar has started a series on database corruption. His first post involves finding corruption:
The cause lies in layers below SQL Server. The most common are hardware faults; in particular, issues with the I/O Subsystem. Any component in the I/O Subsystem can fail and be the cause of database corruption: disks, controllers, CPU, memory, network switch, network cables, SAN, etc.
Database corruption cannot entirely be prevented. It is not a matter of if but rather when
Disks go bad. So do NICs, cables, routers, and everything else physical below the SQL Server Instance. This is why it is important to know that we cannot entirely prevent corruption – only deal with and mitigate it.
Click through for a few ways to find potential corruption.
You have a DBCC CHECKDB script running, something like the following, and it may take several hours to run to confirm if there is any corruption in your SQL Server Database.
DBCC CHECKDB (
Then someone asks you the age old question… When will it be done?
Click through for a quick script and the answer.
Recently I was doing some work with a friend around some database restores. It was pretty routine stuff. However, after one restore my friend came across something in the SQL Error Log that caught him by surprise. As part of the restore, there was a CHECKDB message for the restored database:
My friend’s first reaction was “why is SQL Server doing a DBCC CHECKDB as part of the restore?” He was concerned, because CHECKDB is a pretty hefty operation and this could really impact the restore time if he had to wait on a CHECKDB to complete. But the other confusing thing was that the date for the CHECKDB didn’t match up with the restore timing.
Click through to learn the answer.
Whenever I’m teaching and recovering from corruption, I always stress that if the REPAIR_ALLOW_DATA_LOSS option of DBCC CHECKDB (which I’ll just call ‘repair’ from now on) has to be used, you should do the following:
- Run another DBCC CHECKDB to make sure that repair fixed everything and no new problems were exposed (by the repair fixing something that was blocking DBCC CHECKDB‘s access to a portion of the database)
- Run DBCC CHECKCONSTRAINTS on any tables or indexes that were repaired and are involved in constraints (stated in Books Online too)
- Reinitialize any replication subscriptions that are based on tables that were repaired (stated in Books Online too)
- Reevaluate your HA/DR strategy so you don’t have to use repair again in future
One question I’m often asked is about why replication can be broken by running repair.
Read on for the answer.
Running this statement interactively doesn’t return any data – it just loads the data into DBADatabase.dbo.DBCCMemoryStatus. Running the commented-out SELECT at the bottom of the script as written will query that table for all rows of counter VM Reserved (virtual memory reserved) but there is much more data than that available if you modify the SELECT.
This query can be dropped into a SQL Agent job step as is and it will run – just like the interactive run it will create the database and permanent table if they don’t exist and then store those nuggets of data into the permanent table for later use – you never know when you may need them!
Click through for the script.
Wednesday I walk into the office and immediately hear that CHECKDB is the source of issues on one of the servers and is the reason behind some errors that have been happening. While I don’t think this is the case (it might look like it on the surface but there is something else that is happening that is the actual cause) I also wanted to find out what CHECKDB was running at the time the errors occurred.
I needed information on when CHECKDB ran for each database. When you look for what you can run to find when CHECKDB was last run you find this blog post and also this blog post on grabbing this info. While these were very informative, they were for one database at a time. I need this for all the databases so I can try to not only find out when each one ran, but also use these time stamps to figure out the duration.
The big recommendation I’d make with regard to this is not to use sp_msforeachdb. Otherwise, click through for a good script.
DBCC CHECKDB has the ability to perform parallel checking of objects. However, it absolutely depends on the edition of SQL Server, it only happens when using enterprise edition.
Let’s see this in action. I propose the following tests for this blog post:
- Test on a SQL Server Enterprise Edition.
- Test on a non-enterprise edition of SQL Server.
I don’t have 2017 Enterprise at hand but I do have 2014 Enterprise and Express handy so it makes sense to use these versions for my “experiment”.
Read on to see the difference.