If you know about DBCC CHECKDB then most likely you will know about DBCC CHECKTABLE. Quite simply this command performs primitive system-catalog consistency checks, per-table consistency checks on the single table specified, and cross-table consistency checks on indexed views that reference the specified table. (Page 899 Microsoft SQL Server Internals 2012, Chapter 14, Page 899, Paul Randal)
There is one concept about this command that I did not know about until recently, intrigued? Read on.
I was going to say “read on” but Arun already used that line.
I was exchanging messages with Azure Support and even though I didn’t get a concrete answer to confirm this I ended up asking the question within a Microsoft based yammer group and yes they do automatically carry out consistency checks.
This is great but it is one less thing for me to worry about and if there is serious corruption, you know potential data loss (which would be rare) then they will definitely tell you and work with you.
However, it doesn’t mean you CAN’T run it, I was curious so I ran DBCC CHECKDB on my Azure SQL Databases, but like with any other consistency check it is best to do it OFF-PEAK hours. I would probably take it a step further and wouldn’t even bother running it.
It’s an interesting post, reminding us that administering an Azure database isn’t the same as on-prem.
Looking in the SQL Error Log there were hundreds of these combinations in the minutes immediately preceding the job failure:
The operating system returned error 665(The requested operation could not be completed due to a file system limitation) to SQL Server during a write at offset 0x000048a123e000 in file ‘E:\SQL_Data\VLDB01.mdf:MSSQL_DBCC17‘. Additional messages in the SQL Server error log and system event log may provide more detail. This is a severe system-level error condition that threatens database integrity and must be corrected immediately. Complete a full database consistency check (DBCC CHECKDB). This error can be caused by many factors; for more information, see SQL Server Books Online.
Error: 17053, Severity: 16, State: 1.
E:\SQL_Data\VLDB01.mdf:MSSQL_DBCC17: Operating system error 665(The requested operation could not be completed due to a file system limitation) encountered.
Read on for more information, including a rough idea of how much space the snapshot requires as well as a few workarounds and hints.
An interesting situation was discussed online recently which prompted me to write this post. A fellow MVP was seeing periodic corruption messages in his error log, but DBCC CHECKDB on all databases didn’t find any corruptions. A subsequent restart of the instance caused the problem to go away.
My diagnosis? Memory corruption. Something had corrupted a page in memory – maybe it was bad memory chips or a memory scribbler (something that writes into SQL Server’s buffer pool, like a poorly-written extended stored procedure), or maybe a SQL Server bug. Whatever it was, restarting the instance wiped the buffer pool clean, removing the corrupt page.
So why didn’t DBCC CHECKDB encounter the corrupt page?
Read on for the rest of the story.
Let me explain a few things about the script. I am getting the path of the current trace file and placing it into a variable. The current file name will almost certainly have a suffix of _nn just before the .trc extension. If I were to run the script as is I would only be reading the current log file and not the other 4 that preceded it. If all you care about is the current log file then fine but most will want to search all the existing log files. One way to do this is to simply replace the current file name with just log.trc and use default as the 2nd parameter as I did above in the fn_trace_gettable function. The default parameter value tells the function to read all files from that one onward. even though log.trc doesn’t actually exist it knows how to handle it and reads all of the existing trace files in order.
So if the string that we search on (here we use ‘dbcc free%’) is in any of the files it will return the matching rows. You may have to adjust the wildcards and such but I think you get the idea. Again remember that the data is transient so always look at the StartTime column in the logs to ensure you know which Date and Time range you are looking at. You can do something like this but I will leave that up to you.
SELECT MIN(StartTime) AS [Begin], MAX(StartTime) AS [End] FROM ::fn_trace_gettable(@Path,default)
A word of caution in that I never bothered to see just how resource intensive this function is. while I don’t expect any issues with normal use it is not something you want to be searching on every second. Be sensible and you should have no problems.
Click through for more details, including the script Andy uses to do this search.
If you’ve been using DBCC CLONEDATABASE at all, you might have run into a cannot insert duplicate key error (or something similar) when trying to clone a database:
Database cloning for ‘YourDatabase’ has started with target as ‘COPY_YourDatabase’.
Msg 2601, Level 14, State 1, Line 1
Cannot insert duplicate key row in object ‘sys.sysschobjs’ with unique index ‘clst’. The duplicate key value is (1977058079).
If you do some searching, you’ll probably end up at this Connect item: DBCC DATABASECLONE fails on sys.sysowners.
The Connect item states that the problem exists because of user objects in model. That’s not the case here.
I’m working with a database created in SQL Server 2000…now running on SQL Server 2016.
This isn’t very likely to pop up for most places (I hope!), but it’s good to know.
I’ve been lucky with database corruption during my career. I could probably count on one hand the number of times I’ve had to deal with it. A couple times, it was in a customer’s environment–they managed it themselves, but called me in to help. The other incidents were ones I inherited from a backup I had to restore into a production environment. The first time it happened to me, I didn’t realize it until days later when DBCC CHECKDB ran during a weekend maintenance window. After that, I added a new “rule” to my list: always run DBCC CHECKDB after restoring a database from someone else. That rule paid dividends today.
Here’s the output I saw:Msg 8914, Level 16, State 1, Line 50 Incorrect PFS free space information for page (1:2564368) in object ID 457768688, index ID 1, partition ID 72057619124060160, alloc unit ID 72057594116767744 (type LOB data). Expected value 0_PCT_FULL, actual value 100_PCT_FULL. CHECKDB found 0 allocation errors and 1 consistency errors in table 'tbl_Redacted' (object ID 457768688). CHECKDB found 0 allocation errors and 1 consistency errors in database 'db_redacted'. repair_allow_data_loss is the minimum repair level for the errors found by DBCC CHECKDB (db_redacted).
Read on to see how Dave solved this issue.
Lonny Niederstadt has dropped a boatload of information on us, with a four-part series looking at the OLEDB wait type when running DBCC CHECKTABLE. Part 1 sets up the problem:
So… yeah. Definitely a relationship between the number of pages in the heap and the number of OLEDB waits. Something(s) increase the number of OLEDB waits beyond the number of data pages, the number of used pages and the number of reserved pages in a way that I haven’t deciphered yet, though.
On this very simple heap, with nearly the same number of rows as pages in the heap, the number of OLEDB wait events is nearly the same – only increased by one – when “upgrading” the checktable operation from physical_only to a checktable with logical checks.
So how do the results match with expectations? Whoa!! 25745 OLEDB wait events as the starting point for MAXDOP 1 and 2. Then single-stepping up to 25752 OLEDB wait events for MAXDOP 8. 25752 is exactly 4 times the used page count of 6438 for the clustered index. So the OLEDB wait events went from a small number more than the count of pages/rows with the heap… to almost 4 times as many with the clustered index. *That* was a bit unexpected.
All right. What if we use a heap with a nonclustered include index? Based on the table and index definition below, I expect the nonclustered include index nci_OLEDB_TEST_HEAP_NCI_INCLUDE to be remarkably similar to the clustered index we previously used.
Interesting reading so far in this series.
Lonny Niederstadt has a couple of posts on scaling DBCC CHECKTABLE based on degree of parallelism. First, he looks at running the command with physical_only:
So we can use this formula when dop, elapsed_ms, and cpu_ms are known:
DOP * elapsed_ms = cpu_ms + idle_ms
That allows the 8 checktable operations to be summarized in this graph. From DOP1 to DOP 8 the cpu_ms of the operation is extremely steady. From DOP 1 to DOP 4 there are significant decreases in elapsed time as dop increases. After dop 4, reduction in elapsed time is slight. Throughout the tested range, idle_ms increased at a nearly linear rate.
In his second post, he looks at full CHECKTABLE runs and not just physical_only:
So the good news for today is that checktable operations without the physical_only option scale farther/better on my test tables than checktable with physical_only. While with physical_only scaling benefits in elapsed time are primarily seen only to dop 4, without the physical_only option elapsed time benefits to increasing dop extend at least to dop 8.And we saw that the shape of scalability graphs is pretty volatile 🙂 That’s largely because modest changes in elapsed time are multiplied by dop in this calculation to arrive at the idle_ms number – that idle_ms number is the one that changes shape most readily.
These are prologue posts to a discussion on the OLEDB wait type.
Let’s start with the safety convention. The “null” of a null pointer isn’t a magic value, but in real-life implementation is simply zero, which is a perfectly valid virtual address. However, on the premise that trying to access address zero or addresses near it probably indicates a program error, the OS will map that page in such a way that trying to access it causes an access violation. This is not a bug or an accident, but a damn clever feature! Robert Love explains it very nicely over here for Linux, and it applies equally to Windows.
Now recall the convention that trying to retrieve the head or tail of an empty list will – by convention – bring you back a null pointer. When iterating, a related convention may also return a zero when you’ve gone all the way around and come back to the list head. Clearly the onus is on the developer to recognise that null pointer and not dereference it, but attempting to do so sets in motion the safety feature of an access violation, which can then be neatly caught through standard exception handling, for instance yielding a diagnostic stack dump.
Very interesting article, and also a good juxtaposition of supported, “production-safe” code versus undocumented processes.