Press "Enter" to skip to content

Category: Myth-Busting

Clustered Index And Physical Storage

Wayne Sheffield busts a myth:

In my last few blog posts, I’ve shared several methods of getting internal information from a database by using the DBCC PAGE command with the “WITH TABLERESULTS” option, which lets the process be automated for further processing. This post will also do this, but in this case, we’ll be using it to bust a common myth—data in a clustered index is physically stored on disk in the order of the clustered index.

Busting this myth

To bust this myth, we’ll create a database, put a table with a clustered index into this database, and then we’ll add some rows in random order. Next, we will show that the rows are stored on the pages in logical order, and then we’ll take a deeper look at the page internals to see that the rows are not stored in physical order.

Read on for the proof.
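
If you want to poke at this yourself, here is roughly what the technique looks like (the database, table, and page id below are placeholders, not Wayne’s actual script):

USE MythDemo;  -- hypothetical demo database
GO
-- List the pages allocated to the table's clustered index (index id 1)
DBCC IND ('MythDemo', 'dbo.MythTable', 1);
GO
-- Dump one of those data pages as a result set; dump style 3 includes
-- per-row details, so you can compare each row's slot (logical order)
-- with its offset on the page (physical position)
DBCC PAGE ('MythDemo', 1, 304, 3) WITH TABLERESULTS;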

Comments closed

Subqueries And Performance

Grant Fritchey busts a myth:

I’ve written before about the concept of cargo cult data professionals. They see one issue, one time, and consequently extrapolate that to all issues, all the time. It’s the best explanation I have for why someone would suggest that a sub-query is flat out wrong and will hurt performance.

Let me put a caveat up front (which I will reiterate in the conclusion, just so we’re clear), there’s nothing magically good about sub-queries just like there is nothing magically evil about sub-queries. You can absolutely write a sub-query that performs horribly, does horrible things, runs badly, and therefore absolutely screws up your system. Just as you can with any kind of query. I am addressing the bad advice that a sub-query is to be avoided because they will inherently lead to poor performance.

There are times not to use subqueries, but this post is absolutely correct:  understand the reasons why things may or may not perform well, and don’t be afraid to try things out.
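
As a quick illustration of the point (table and column names are made up, and CustomerID is assumed to be the key of Customers), these two forms express the same logic and the optimizer is free to give them the same plan, so compare plans and I/O instead of assuming the sub-query loses:

-- Sub-query form
SELECT o.OrderID, o.OrderDate
FROM dbo.Orders AS o
WHERE o.CustomerID IN (SELECT c.CustomerID
                       FROM dbo.Customers AS c
                       WHERE c.Country = N'Iceland');

-- Join form; logically equivalent when CustomerID is unique in Customers
SELECT o.OrderID, o.OrderDate
FROM dbo.Orders AS o
    INNER JOIN dbo.Customers AS c
        ON c.CustomerID = o.CustomerID
WHERE c.Country = N'Iceland';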

Comments closed

Fallacies of Data Science

Adnan Masood and David Lazar have a list of fallacies in the world of data science:

Extrapolating beyond the range of training data, especially in the case of time series data, is fine providing the data-set is large enough.

Strong Evidence is the same as a Proof! Prediction intervals and confidence intervals are the same thing, just like statistical significance and practical significance.

These are some good things to think about if you’re getting into analytics.

Comments closed

NOCOUNT Performance Differences

Aaron Bertrand looks into whether SET NOCOUNT ON provides a performance improvement:

For years, I had been operating under the impression that SET NOCOUNT ON; was a critical part of any performance strategy. This was based on observations I had made in, arguably, a different era, and that are less likely to manifest today.

Check out the comments as well.  This is an interesting conundrum as there’s a lot of ingrained knowledge that SET NOCOUNT ON is faster (and I admit that I thought I remembered it being the case when going through loops), but people have had limited success in coming up with a scenario in which it makes an appreciable difference.
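
If you want to experiment yourself, a rough sketch of the kind of test involved (this is not Aaron’s harness) is to run a chatty loop with and without the setting and compare durations:

SET NOCOUNT ON;   -- flip to OFF for the comparison run

DECLARE @i int = 1,
        @start datetime2(7) = SYSDATETIME();
DECLARE @t TABLE (id int NOT NULL);

WHILE @i <= 100000
BEGIN
    -- with NOCOUNT OFF, every one of these sends a rows-affected message
    INSERT @t (id) VALUES (@i);
    SET @i += 1;
END;

SELECT DATEDIFF(MILLISECOND, @start, SYSDATETIME()) AS duration_ms;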

Comments closed

Copy-Only Backups

Tibor Karaszi talks about copy-only backups:

If you specify COPY_ONLY for a full backup, it will not affect the following differential backups. I.e., the following differential backups will be based on the last full backup which was not performed with COPY_ONLY. Another way of looking at this is that a full backup using COPY_ONLY will not reset the DIFF (also known as DCM) page, which is page 6 and repeated approximately every 4 GB in each database file.

If you specify COPY_ONLY for a log backup, it will not affect the following log backups. I.e., the log backup chain will be without the one(s) produced using COPY_ONLY. Another way of looking at this is that a log backup using COPY_ONLY does not empty (truncate) the log.

That’s it! COPY_ONLY for a full backup does not, in any way, affect the restoreability for log backups.

Copy-only backups are a great feature, but understand what they do and how they work.
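
For reference, the syntax is just the COPY_ONLY option on a normal backup (the database name and paths below are placeholders):

-- Ad hoc full backup that does not reset the differential base
BACKUP DATABASE SalesDB
    TO DISK = N'X:\adhoc\SalesDB_copyonly.bak'
    WITH COPY_ONLY, INIT;

-- Log backup that stays outside the regular log chain and does not truncate the log
BACKUP LOG SalesDB
    TO DISK = N'X:\adhoc\SalesDB_copyonly.trn'
    WITH COPY_ONLY, INIT;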

Comments closed

Beware ROWLOCK Hints

Kendra Little points out that ROWLOCK hints might make blocking worse:

Note that the logical reads are the exact same and neither query is doing physical reads (the execution plans are the same – the optimizer doesn’t care what locks you are using). The queries were run with SET STATISTICS IO,TIME OFF and Execution Plans turned off, just to reduce influencing factors on duration and CPU.

The database engine is simply having to do more work here. Locking the pages in the clustered index is less work than locking each of the 1,825,433 rows.

Even though our locks are more granular, making queries run longer by taking out individual locks will typically lead to more blocking down the road.

Kendra follows up with several optimization possibilities, so read the whole thing.
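
The pattern under discussion is just a table hint; a hedged example with made-up names looks like this:

-- Default locking: the engine can cover the work with a handful of page locks
UPDATE dbo.VotesStaging
SET    Processed = 1
WHERE  Processed = 0;

-- ROWLOCK hint: one lock per row touched, so a statement that modifies
-- millions of rows has millions of locks to acquire, track, and release
UPDATE dbo.VotesStaging WITH (ROWLOCK)
SET    Processed = 1
WHERE  Processed = 0;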

Comments closed

Bad Fixes

David Alcock looks at a few common “fixes” which end up causing their own problems:

I’m seeing lots of CXPACKETS waits, how do I fix these?

Bad Advice = Set the maximum degree of parallelism to 1, no more CXPACKET waits!

I’m seeing index fragmentation occur quite frequently on some of my indexes, what should I do?

Bad Advice = Set fill factor on the server to 70, no more fragmentation problems!

I’m worried about TempDB contention. What should I do?

Bad Advice = Set the number of files to the number of cores, no more contention issues!

Read the post for better advice.
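
Before reaching for any of these “fixes,” a quick (and deliberately rough) way to see what the instance is actually waiting on is the wait stats DMV:

-- Accumulated waits since the last restart or manual clear.
-- High CXPACKET alone is not evidence that MAXDOP = 1 is the answer.
SELECT TOP (20)
       wait_type,
       waiting_tasks_count,
       wait_time_ms,
       signal_wait_time_ms
FROM sys.dm_os_wait_stats
ORDER BY wait_time_ms DESC;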

Comments closed

Extensions Are Important

Kenneth Fisher on database file extensions:

So what does this mean? Should we start naming our database files whatever we want? No, absolutely not! It means that you need to be extra careful when specifying the name of the files. You really don’t want to use non-standard file names. The confusion! The misunderstandings! Do you really want the operating system to think your database file is a really big picture? Or even worse (and while I realize this sounds like a stretch, I’ve had it happen) you accidentally give your file a .bak extension. Then one day your automated process that deletes old bak files runs at the same time your instance is down. Bye bye database file.

You can use whatever extension you want, but be smart about it.  Also check out Sean McCown’s dirty fun trick.
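
Nothing in the engine enforces the conventions, which is exactly the problem; for reference, a minimal example that sticks to the standard extensions (names and paths are placeholders):

-- .mdf for the primary data file, .ndf for secondary data files, .ldf for the log
CREATE DATABASE ImportantDB
ON PRIMARY
    (NAME = N'ImportantDB_data',  FILENAME = N'D:\Data\ImportantDB.mdf'),
FILEGROUP UserData
    (NAME = N'ImportantDB_data2', FILENAME = N'D:\Data\ImportantDB_1.ndf')
LOG ON
    (NAME = N'ImportantDB_log',   FILENAME = N'L:\Logs\ImportantDB.ldf');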

Comments closed

Error Handling

Continuing my “classics” series, Erland Sommarskog has a three-part series (with three appendices) on error handling that will take up your entire weekend:

Why do we have error handling in our code? There are many reasons. In a forms application we validate the user input and inform the users of their mistakes. These user mistakes are anticipated errors. But we also need to handle unanticipated errors. That is, errors that occur because we overlooked something when we wrote our code. A simple strategy is to abort execution or at least revert to a point where we know that we have full control. Whatever we do, simply ignoring an unanticipated error is something we should never permit ourselves. This can have grave consequences and cause the application to present incorrect information to the user or even worse to persist incorrect data in the database. It is also important to communicate that an error has occurred, lest the user think that the operation went fine, when your code in fact performed nothing at all.

Error handling is a crucial part of development.  And given that SQL Server has…peculiarities…when it comes to error handling, I highly recommend reading this series.
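
As a taste of where the series lands, the basic shape of structured error handling in T-SQL is TRY…CATCH with a rollback and a re-raise. This is only a bare-bones sketch (the procedure and table names are made up); the series covers the many details and edge cases:

CREATE OR ALTER PROCEDURE dbo.DoSomething   -- hypothetical procedure
AS
SET XACT_ABORT, NOCOUNT ON;   -- abort and roll back on most errors
BEGIN TRY
    BEGIN TRANSACTION;

    UPDATE dbo.SomeTable        -- placeholder for the real work
    SET    SomeCol = 1
    WHERE  SomeKey = 42;

    COMMIT TRANSACTION;
END TRY
BEGIN CATCH
    IF @@TRANCOUNT > 0
        ROLLBACK TRANSACTION;
    THROW;   -- re-raise the original error to the caller (SQL Server 2012+)
END CATCH;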

Comments closed