With Halloween around the corner what better topic to discuss than phantom reads. A phantom read occurs when rows have been inserted after a read operation and becomes visible in a follow-up read operation within the same transaction.
I will show you what this looks like with an example. Please note the code will not work for you because I dropped the foreign key constraint within the person.person table just so I could show you the example.
Arun mentions that the serializable isolation level prevents phantom reads. So do the repeatable read and snapshot isolation levels.
The statement “There is zero logging when DURABILITY=SCHEMA_ONLY” is not factually correct, its more like a minimally logged operation. What is surprising is the fact that logged as advertised for the in-memory engine should result in far fewer log records than the equivalent workload for the legacy engine, clearly this is not the case in this particular example and something I need to dig into somewhat deeper. Also note that the version of SQL Server being used is SQL Server 2016 SP1 CU3, which should be stable. One final point, in order to make sure that fn_dblog and fn_dblog_xtp produced clean results for me each time, I took the quick and dirty option of re-creating my test database each time.
This post definitely ranks in the “Microsoft did this right” category.
What exactly is a distributed transaction? It’s one where the work needs to be completed in more than one database so data is kept in sync everywhere. For example, if you need to update data in Database A and in Database B, and they need to be kept in sync, that’s a distributed transaction. Database A and Database B can be in the same SQL Server instance, or they could be in different instances … or even in different data sources, such as Oracle or DB2. This whole shebang is often referred to as a cross-database transaction. DTC is based on the principle of a two phase commit – for the whole thing to get done, all the little bits need to be committed everywhere before claiming complete success. Otherwise stuff needs to be rolled back so that things stay in sync and all is right in the world.
Read on for Allan’s thoughts and guidance.
00.000 – Session A sets its isolation level to snapshot
00.001 – Session A explicitly begins a transaction with BEGIN TRAN
00.002 – Session A starts a WAITFOR command for 15 seconds
10.000 – Before the WAITFOR completes, Session B inserts rows into dbo.Table
15.001 – Session A starts a SELECT from dbo.Table, which returns the rows that Session B inserted
This seems wrong, because many of us commonly say things like, “in Snapshot Isolation level, all statements see data consistent with the beginning of the transaction.”
But in this case, Session B inserted the rows after Session A began its transaction using Snapshot Isolation level. So why did Session A see those rows?
Kendra explains the nuance well, so read the whole thing.
The other day I had to update some records, in Production. I’m a firm believer of using explicit transactions and double checking things before committing a transaction. This helps ensure things go as expected. This also allows me a way to rollback the changes if they don’t. It happens.
However, this means that I have to COMMIT said explicit transaction. And not go to lunch without doing so.
Can you see my mistake? I bet you can.
Fortunately, it sounds like it wasn’t a critical problem. If you want to check for open transactions, Jack Vamvas has a couple methods.
When SQL Server begins backing up data pages, it also starts keeping track of transactions, via the transaction log. After it has backed up the last data page, it then also backs up all of the transactions that occurred during the data backup. Upon restore, it will then roll those transactions forward or backward, as necessary, to ensure a consistent image is restored.
In our librarian metaphor, she would keep an activity log, which would include the changes to books A and D from the first update, then also the changes to D, X, Y, and Z from the second update. She would not “fix” the data within the backup, but simply store those update details along with her mashed-up copy. In the unlikely event she had to recreate the books (ie, a restore), then she would go back and spend the effort to piece it back together. During that restore process, she would look at the first transaction and see that her copy of Book A in her backup was too old, but Book D already had the update, and she would roll forward the update to Book A. Next, she would process the second update and see that Books X, Y, and Z had the updates, but D still needed this second update, and she would roll forward that second update to Book D. At this point, she would have successfully reconstructed an image that is consistent to the time the backup completed.
Great metaphor to describe consistency during backups.
Then we can search in the transaction log, using the fn_dblog function, for LOP_BEGIN_XACT log records from before the crash point that have a matching LOP_ABORT_XACT log record after the crash point, and with the same transaction ID. This is easy because for LOP_BEGIN_XACT log records, there’s a Begin Time column, and for LOP_ABORT_XACT log records (and, incidentally, for LOP_COMMIT_XACT log records), there’s an End Time column in the TVF output.
And there’s a trick you need to use: to get the fn_dblog function to read log records from before the log clears (by the checkpoints that crash recovery does, in the simple recovery model, or by log backups, in other recovery models), you need to enable trace flag 2537. Now, if do all this too long after crash recovery runs, the log may have overwritten itself and so you won’t be able to get the info you need, but if you’re taking log backups, you could restore a copy of the database to the point just after crash recovery has finished, and then do the investigation.
Read on for the code, as well as a test.
So the procedure was complicated and it used explicit transactions, but I couldn’t find any TRY/CATCH blocks anywhere! What I needed was a stack trace, but for T-SQL. People don’t talk about T-SQL stack traces very often. Probably because they don’t program like this in T-SQL. We can’t get a T-SQL stack trace from the SQLException (the error given to the client), so we have to get it from the server.
Michael shows how to get stack trace information and provides some advice on the process (mostly, “don’t do what we did”).
Identity columns are a very commonly used feature within tables in SQL Server. Basically when specified as an identity a column will automatically increment by the specified value; so if we have an identity increment of 1 and insert 5 rows they will automatically be numbered 1 to 5.One cautionary measure with identities is that they don’t reset themselves when rows are deleted. If we delete rows 4 and 5 the next row will still be populated as identity 6. That’s fine, but what happens if we rollback an insert.
Read on for the answer.
SQL Server is a highly efficient transaction processing platform and nearly every single operation performed by it, is usually first performed within memory. When operations are performed within memory, the need to touch physical resources (such as physical disk IOPS) are also reduced, and reducing the need to touch physical resources means those physical boundaries (and their limitations) have less impact to the overall system performance. Cool right?!
Click through to read more about how log buffers work and why they help improve SQL Server’s performance.