Press "Enter" to skip to content

Category: Myth-Busting

The Myth of the DBA-Free Cloud

Matthew McGiffen lays out an explanation:

I was chatting with a cloud consultant who was advising on a large-scale migration to AWS. He told me that one of the advantages of going with a PaaS (Platform as a Service) offering was that DBAs were no longer required, as backups and restores were handled for you. PaaS services for SQL Server include AWS RDS, Azure SQL Database, and Azure SQL Managed Instance.

I found it quite a funny conversation, partly as I don’t think he realised being a DBA was part of my job role, but also because I don’t know a single DBA who spends a significant amount of their time doing backups and restores.

I still remember (through others, as I wasn’t in this space yet) the advertising campaign claiming that SQL Server 2005 would completely eliminate the need for a DBA because everything would just work on its own, even database tuning via the Database Engine Tuning Advisor. The same thing applies today: even those DBA-free databases eventually need somebody to optimize them along various dimensions, ensure they are running smoothly, and correct issues if they are not. Perhaps we could call this role the Administrator of a Database, or AoD, so as not to scare the DBA-free database vendors. “No, we don’t have DBAs; we just need you to have a few AoDs on staff.”

April Tools Day

Erin Stellato dispels some myths:

Myth #1: Azure Data Studio is the only standalone solution now that SSMS is deprecated.

SQL Server Management Studio (SSMS) is not deprecated.  We thought about writing that in ALL CAPS, but figured bold is sufficient.  SSMS has not been deprecated, and we are not planning on deprecating it.  You will see new functionality being added to Azure Data Studio, but we have a fair number of things lined up for SSMS, including migration to the Visual Studio 2022 shell, which brings 64-bit support.

Bold plus all caps might have been a bit too much, yeah.

Click through to see what’s happening in the world of SQL Server tooling from Microsoft.

CTEs Don’t Control Plan Shape

Erik Darling dispels a myth:

Many times over the years, I’ve heard the incorrect claim that CTEs somehow materialize data.

But a new one to me was that CTEs execute procedurally, and you could use that to influence plan shapes by always doing certain things first.

Unfortunately, that’s not true of them either, even when you use TOP.
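
A quick way to see this for yourself (my own sketch, not from Erik’s post): put a non-deterministic function in a CTE and reference the CTE twice. If CTEs materialized their results, both references would return the same value; instead, each reference is expanded inline into the larger query.

    WITH OneRow AS
    (
        SELECT NEWID() AS id
    )
    SELECT id FROM OneRow
    UNION ALL
    SELECT id FROM OneRow;
    -- Returns two different GUIDs: each reference to the CTE was expanded
    -- inline and evaluated separately, so nothing was materialized.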

Read the whole thing. I do chain common table expressions for readability’s sake, though that’s usually because I’m performing a series of repetitive calculations that I can’t simplify via APPLY.

SELECT * Doesn’t Keep Views up to Date

Reitse Eskens busts a myth:

Last week I read a comment or a blog somewhere (I can’t remember where anymore, but please let me know so I can credit you!) claiming that a SELECT * in a view gets a full definition under the covers and doesn’t get updated when the underlying table is updated.

So, I decided to put that to the test and see what I could reproduce.

Spoilers: it doesn’t get updated. Click through for the proof, as well as for what does update a view’s definition.
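
A minimal sketch of the behavior (object names are mine, purely for illustration):

    CREATE TABLE dbo.MythTest (a int);
    GO
    CREATE VIEW dbo.vMythTest AS SELECT * FROM dbo.MythTest;
    GO
    ALTER TABLE dbo.MythTest ADD b int;
    GO
    SELECT * FROM dbo.vMythTest;   -- still returns only column a
    GO
    EXEC sys.sp_refreshview N'dbo.vMythTest';
    GO
    SELECT * FROM dbo.vMythTest;   -- now returns columns a and b

The view’s column metadata is captured when the view is created; sp_refreshview (or ALTER VIEW) is what brings it back in sync with the table.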

Accidentally Building a Population Graph

Neil Saunders shares an example of a newspaper headline which ultimately just shows us population sizes:

Some poking around in the NSW Transport Open Data portal reveals how many people enter every Sydney train station on a “typical” day in 2016, 2017 and 2018. We could manipulate those numbers in various ways to estimate total, unique passengers for FY 2017-18 but I’m going to argue that the value as-is serves as a proxy variable for “station busyness”.

When working with spatial data, it’s important to differentiate between an effect you see because something is actually unique or interesting and an effect you see because that’s where all of the people are.
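
To make that concrete in database terms, the usual fix is to normalize by population before comparing areas. A hypothetical T-SQL sketch (table and column names are invented for illustration):

    -- Entries per 1,000 residents, rather than raw entry counts
    SELECT s.StationName,
           s.DailyEntries,
           p.Population,
           1000.0 * s.DailyEntries / p.Population AS EntriesPer1000
    FROM dbo.StationEntries AS s
        INNER JOIN dbo.SuburbPopulation AS p
            ON p.SuburbId = s.SuburbId
    ORDER BY EntriesPer1000 DESC;

Ranking by the normalized figure surfaces stations that are unusually busy for their catchment, rather than simply the ones with the most people nearby.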

When Differential Backups Grow Larger Than Fulls

Kenneth Fisher notes that differential backups can end up being larger than full backups of the same database:

The thing about DBA Myths is that they are generally widespread and widely believed. At least I believed this one until I posted What’s a Differential Backup? and Randolph West (b/t) pointed out that my belief that differential backups can’t get larger than full backups was incorrect. In fact, differential backups (like FULL backups) contain enough transaction log information to cover transactions that occur while the backup is taking place. So if the amount of data that needs to be backed up, combined with those transactions, requires more space than just the data…

Read on for a demonstration.
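
If you want to check your own environment, msdb records the size of every backup. A quick sketch (database name and paths are placeholders):

    -- Take a full backup, modify a lot of data, then take a differential
    BACKUP DATABASE MyDb TO DISK = N'C:\Backups\MyDb_full.bak';
    -- ... heavy data churn happens here ...
    BACKUP DATABASE MyDb TO DISK = N'C:\Backups\MyDb_diff.bak' WITH DIFFERENTIAL;

    -- Compare backup sizes: type D = full, I = differential
    SELECT database_name,
           type,
           backup_finish_date,
           backup_size / 1048576.0 AS backup_size_mb
    FROM msdb.dbo.backupset
    WHERE database_name = N'MyDb'
    ORDER BY backup_finish_date DESC;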

Area Under The ROC Is Not Accuracy

Stephen Chen debunks bad journalistic summaries of a Google research paper:

Journalists latched onto Google’s NN 0.95 score vs. the comparison 0.86 (see EWS Strawman below) as the accuracy of determining mortality. However, the actual metric the researchers used is AUROC (Area Under the Receiver Operating Characteristic curve), which is not a measure of predictive accuracy that indexes the difference between predicted and actual values, like RMSE (Root Mean Squared Error) or MAPE (Mean Absolute Percentage Error). Some articles even erroneously try to explain the 0.95 as an odds ratio.

Just as the concept of significance has different meanings to statisticians and laypersons, AUROC as a measure of model accuracy does not mean the probability of Google’s NN predicting mortality accurately, as journalists/laypersons have taken it to mean. The ROC (see sample above) is a plot of a model’s False Positive Rate (i.e., predicting mortality where there is none) vs. the True Positive Rate (i.e., correctly predicting mortality). A larger area under the curve (AUROC) means the model produces fewer False Positives, not the certainty of mortality as journalists erroneously suggest.
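
For reference, the standard definitions (mine, not from the linked article): with TP, FP, TN, and FN counting true/false positives and negatives at a given threshold,

    TPR = TP / (TP + FN)        (true positive rate, or sensitivity)
    FPR = FP / (FP + TN)        (false positive rate)

The ROC curve plots TPR against FPR as the decision threshold varies, and the area under it equals the probability that a randomly chosen positive case is scored higher than a randomly chosen negative one. An AUROC of 0.95 therefore says the model ranks cases well; it does not mean the model is “95% accurate.”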

The researchers themselves made no claim to soothsayer abilities; what they said in the paper was:

… (their) deep learning model would fire half the number of alerts of a traditional predictive model, resulting in many fewer false positives.

It’s an interesting article and a reminder of the importance of terminological precision (something I personally am not particularly good at).

There Is No Easy Button With Predictive Analytics

Scott Mutchler dispels some myths:

There are a couple of myths that I see more and more these days.  Like many myths, they seem plausible on the surface, but experienced data scientists know that the reality is more nuanced (and sadly requires more work).

Myths:

  • Deep learning (or Cognitive Analytics) is an easy button.  You can throw massive amounts of data at it and the algorithm will deliver a near-optimal model.
  • Big data is always better than small data.  More rows of data always result in a significantly better model than fewer rows of data.

Both of these myths lead some (lately, it seems, many) people to conclude that data scientists will eventually become superfluous.  With enough data and advanced algorithms, maybe we don’t need these expensive data scientists…

Read on for a dismantling of these myths.  There’s a lot more to it than “collect all of the data and throw it at an algorithm” (and even then, “all” the data rarely really means all, which I think deserves to be a third myth).  H/T R-bloggers

SQL Server Memory Usage Myths

Eric Blinn takes on a few myths regarding memory usage in SQL Server:

My VM administrator says that I’m not using all the memory I asked for.  In fact, 70% of it is idle at any given time.  We’re going to return that memory to the resource pool to better utilize it on other VMs.

The VM administrators are not lying or misinterpreting what is on their screen.  The metrics displayed in their management tool (Microsoft Hyper-V Manager or VMware vSphere Client) are lying to them.  When the VM management tool checks on memory activity, it asks the OS.  The OS only knows that this memory was allocated to SQL Server a long time ago and that it hasn’t moved since.  It appears to the OS to be stagnant and unused, and this is what it reports to the VM management tool.  However, this memory, once allocated to SQL Server, is in the domain of SQLOS, which is likely very actively using it in a way that is largely invisible to the OS and, by extension, to the VM management tool and eventually the VM administrators themselves.  The VM tools are not yet smart enough to ask SQLOS what is going on with the memory, and so they falsely report that the memory is not being effectively utilized.

Worth reading, particularly if your sysadmins are trying to free up some of that “unused” memory.
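
If you need to show a skeptical VM administrator what SQL Server is actually doing with that memory, SQLOS exposes the details through DMVs. A starting-point sketch (which counters matter most will vary by workload):

    -- How much memory the SQL Server process holds and whether it's under pressure
    SELECT physical_memory_in_use_kb,
           memory_utilization_percentage,
           process_physical_memory_low
    FROM sys.dm_os_process_memory;

    -- Page life expectancy: how long pages survive in the buffer pool,
    -- a rough signal of how actively that "idle" memory is really used
    SELECT object_name, counter_name, cntr_value
    FROM sys.dm_os_performance_counters
    WHERE counter_name = N'Page life expectancy';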

Separating Data And Log Files

Brent Ozar looks at an old chestnut:

So it’s time for a quiz:

  1. If you put all of a SQL Server’s data files & logs on a single volume, how many failures will that server experience per year?
    • Bonus question: what kinds of data loss and downtime will each of those failure(s) have?
  2. If you split a SQL Server’s data files onto one volume, and log files onto another volume, how many failures will that server experience per year?
    • Bonus question: what kinds of data loss and downtime will each of those failures have?

Think carefully about the answers – or read the comments to see someone else’s homework, hahaha – before you move on.
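
To put rough numbers on the reliability question (my own illustration, not Brent’s): if a single volume has a 1% chance of failing in a given year, one volume means about a 1% annual failure rate, while two volumes mean a 1 - 0.99^2, or roughly 2%, chance that at least one fails. Splitting data and log files across volumes doubles how often something breaks; what changes is what you can recover from each kind of failure, which is what the bonus questions are driving at.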

With SANs, this advice is not even that good on the performance side, especially with modern SANs, which don’t let you dedicate spindles.  It definitely doesn’t fly on the reliability side.
