An Example Of Error Message Ambiguity

Jason Brimhall has an interesting story around a common error message:

Now is the time to start pulling out your hair. There is no syntax error in the query. Go ahead and look it over 10-15 times. I know I did.

Ok, if there is no syntax error, then what could possibly be the real problem? Is the database corrupt? Maybe a system table is corrupt? Grasping at straws here, but could it possibly even be some sort of royally screwed up permissions?

Everything seems to be checking out properly. There is no corruption whatsoever. Laptop is soon to be launched at this point right? Ok, maybe not launched because this is a simple query. But, had this been a production related query that was rather intense and complicated, there really may be something getting launched as the frustration mounts.

Click through for the answer.  Sometimes the error message is technically correct but utterly confounding.

The Magic Of Sampling

Nathan LeClaire reminds us of an important story that statisticians have been telling us for a couple centuries:

It starts slowly. Maybe your home-grown centralized logging cluster becomes more difficult to operate, demanding unholy amounts of engineer time every week. Maybe engineers start to find that making a query about production is a “go get a coffee and come back later” activity. Or maybe monitoring vendors offer you a quote that elicits a response ranging anywhere from curses under the breath to blood-curdling screams of terror.

The multi-headed beast we know as Scale has reared its ugly visage.

As some of you may have already guessed from the title, I’m going to discuss one way to solve this problem, and why it might not be as bad as you might think.

Take some of your precious information and throw it in the garbage. In lots of cases, you can just drop those writes on the floor as long as your observability stack is equipped to handle it.

In other words, sample.

Read on for a couple of methods.  One thing I’ve taken a fancy to is collecting the first N of a particular type of message and keeping track of how often that message appears.  If you get the same error for every row in a file, then you might really only need to see that one time and the number of times it happened.  Or maybe you want to see a few of them to ensure that they’re really the same error and not two separate errors which are getting reported together due to insufficient error separation.
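
As a rough illustration of the "first N plus a count" idea described above, here is a minimal Python sketch. The class name, the `limit` parameter, and the sample messages are all made up for the example and do not come from Nathan's post.

```python
from collections import defaultdict


class FirstNSampler:
    """Keep the first `limit` examples of each message type, plus a total count."""

    def __init__(self, limit=3):
        self.limit = limit
        self.counts = defaultdict(int)      # message type -> times seen
        self.examples = defaultdict(list)   # message type -> first few details

    def record(self, message_type, detail):
        # Always count the message, but only store it while under the limit.
        self.counts[message_type] += 1
        if len(self.examples[message_type]) < self.limit:
            self.examples[message_type].append(detail)

    def summary(self):
        return {
            t: {"count": self.counts[t], "examples": list(self.examples[t])}
            for t in self.counts
        }


# Hypothetical usage: the same parse error on every row of a 1,000-row file.
sampler = FirstNSampler(limit=2)
for row in range(1, 1001):
    sampler.record("bad_date", f"row {row}: could not parse date")
sampler.record("missing_key", "row 17: no customer id")
report = sampler.summary()
```

Keeping a handful of examples rather than exactly one makes it easier to spot two distinct errors hiding under the same message type.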

When “Invalid Column Name” Isn’t A Permissions Issue

Kenneth Fisher shares a story of hunting down the cause of an error message:

This time we had a vendor reporting the following error:

Msg 207, Level 16, State 1, Line 7
Invalid column name ‘Name’

Now the vendor was certain this was a permissions issue. It worked fine on their systems, it worked fine on some of ours. So why didn’t it always work? Well, the easy answer is permissions! Particularly since we had denied them db_owner just recently.

So why do I sound so dismissive about permissions as a possibility? I mean it COULD be permissions. It certainly is possible. But first of all, we don’t use column level permissions very often (no one uses them all that often from what I can tell) and secondly it worked on several other systems where they had exactly the same permissions as this system.

Ok, so what is the problem? You guessed it! (I really have to stop asking for guesses after I’ve put the answer in the title.)

I went out of my way not to give the answer here, so you’ll have to look at Kenneth’s title.  And then read the whole thing.

Unresolved Reference In Database Project

Ed Elliott explains something fairly straightforward and gives us a primer on using Stack Overflow too:

If you build an SSDT project you can get an error which says:

“SQL71502: Function: [XXX].[XXX] has an unresolved reference to object [XXX].[XXX].”

If the code that is failing is trying to use something in the “sys” schema or the “INFORMATION_SCHEMA” schema then you need to add a database reference to the master dacpac:

Click through for the answer and a comment-by-comment walkthrough from Ed.

Linked Servers And Inaccessible Statistics

Jason Brimhall troubleshoots an error message involving linked servers and statistics:

On more than one occasion I have had an emergency request because everything was broken. The everything in almost every incident is an SSIS package that is failing with error messages. The error message will typically have text similar to the following:

Could not locate statistics ‘_WA_Sys_00000015_346C780E’ in the system catalogs.

Due to the error, the package fails processing and grinds to a halt. When diving into the package it is discovered that the missing stats happen to be coming from a linked server query. This raises a big bright blaring alarm for me. Why is the SSIS package accessing the data via a linked server? This is rather counter-productive and definitely contrary to what is desired from a performance perspective.

Jason methodically walks us through the troubleshooting process and provides the solution at the end.

Errors Using Native Prediction In SQL Server

Sacha Tomey walks us through a few potential issues when converting code which uses SQL Server Machine Learning Services’ sp_execute_external_script procedure to native PREDICT calls:

Stumble One:

Error occurred during execution of the builtin function 'PREDICT' with HRESULT 0x80004001.
Model type is unsupported.

Reason:

Not all models are supported. At the time of writing, only the following models are supported:

  • rxLinMod
  • rxLogit
  • rxBTrees
  • rxDTree
  • rxDForest

sp_rxPredict supports additional models including those available in the MicrosoftML package for R (I was attempting to use rxFastTrees). I presume this limitation will reduce over time. The list of supported models is referenced in the PREDICT function (Documentation).

sp_rxPredict does require CLR, but it’s a viable alternative if you need to use a model not currently supported—like rxNeuralNet.

Automatic Retry With Optimistic Concurrency

Vladimir Khorikov explains an anti-pattern when dealing with a model using optimistic concurrency (for example, memory-optimized tables):

Alright, back to the original question. So, how to combine optimistic locking and automatic retry? In other words, when the application gets an error from the database saying that the versions of a Product don’t match, how to retry the same operation again?

The short answer is: nohow. You don’t want to do that because it defeats the very purpose of having an optimistic lock in the first place.

Remember that the locking mechanism is a way to ensure that all changes are taken into consideration when changing a record in the database. In other words, someone should review the new version of the record and make an informed decision as to whether they still want to submit the update. And that should be the same client who originated the initial request; you can’t make that decision for them.

Plenty of systems do this sort of data merging automatically, but I get Vladimir’s point:  if someone else pulled the rug out from under you, it might change your decision on what that data should look like.
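
To make the point concrete, here is a small Python sketch of a version check that surfaces the conflict to the caller instead of retrying. It uses an in-memory dictionary in place of a database, and every name in it (`Product`, `ProductStore`, `ConcurrencyConflict`) is hypothetical rather than taken from Vladimir's article.

```python
from dataclasses import dataclass


class ConcurrencyConflict(Exception):
    """Raised when the stored version no longer matches the one the client read."""


@dataclass
class Product:
    name: str
    price: float
    version: int = 1


class ProductStore:
    def __init__(self):
        self._rows = {}

    def add(self, product_id, product):
        self._rows[product_id] = product

    def read(self, product_id):
        # Hand back a copy so each client works on its own snapshot.
        row = self._rows[product_id]
        return Product(row.name, row.price, row.version)

    def update(self, product_id, changed, expected_version):
        current = self._rows[product_id]
        if current.version != expected_version:
            # Someone else saved first: report it, do not silently retry.
            raise ConcurrencyConflict(
                f"expected version {expected_version}, found {current.version}"
            )
        self._rows[product_id] = Product(
            changed.name, changed.price, expected_version + 1
        )


store = ProductStore()
store.add(1, Product("Widget", 10.0))

alice = store.read(1)
bob = store.read(1)

alice.price = 12.0
store.update(1, alice, expected_version=alice.version)

bob.name = "Widget (sale)"
try:
    store.update(1, bob, expected_version=bob.version)
    outcome = "saved"
except ConcurrencyConflict:
    # Bob never saw Alice's price change, so the decision goes back to him.
    outcome = "conflict: reload and let the user decide"
```

An automatic retry here would amount to re-reading the row and saving Bob's edit on top of a price he never saw, which is the outcome the version check exists to prevent.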

Invalid Class Error Trying To Access WMI Class

Claudio Silva troubleshoots an error which gives the user a red herring:

This can return more than one line with different ComputerManagement (like ComputerManagement10). It depends on the versions you have installed on the host. The number “10” refers to SQL Server 2008.
Now I can uncomment the last command and run it. The result is:

Get-CimInstance : Invalid class
At line:1 char:1
+ Get-CimInstance -CimSession $CIMsession -Namespace $(“root\Microsoft\SQLServer\C …
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : MetadataError: (:) [Get-CimInstance], CimException
+ FullyQualifiedErrorId : HRESULT 0x80041010,Microsoft.Management.Infrastructure.CimCmdlets.GetCimInstanceCommand
+ PSComputerName : HOST001

Ok, a different error message. Let’s dig into it. I logged in on the host and confirmed that I have a SQL Server 2008 R2 instance installed. This means that I’m not accessing a lower version than 2005 like the initial warning message was suggesting.

Read the whole thing.

Errors With Invalid Backup Location

Adrian Buckman shows the types of errors you should expect when your default backup location is invalid:

Recently I was looking through the error log on one of my test machines and I spotted some unusual errors:

SQL ERROR: 3634 – The operating system returned the error ‘3(The system cannot find the path specified.)’ while attempting ‘DeleteFile’
SQL ERROR: 18272 – During restore restart, an I/O error occurred on checkpoint file (operating system error (null)). The statement is proceeding but cannot be restarted. Ensure that a valid storage location exists for the checkpoint file.

At first I assumed that I may have tried restoring a database to a location that did not exist, but this was not the case; the actual issue was with SQL Server’s Default Backup Location.

Read on for the full explanation.

Failure Modes In Event-Based Systems

Dave Copeland has an interesting article on understanding how message- and event-based systems can fail:

The system trigger (1) initiates everything. Common failures here are timeouts inside Application. This is particularly insidious because when this happens, the System Trigger may retry the operation. Think about a user on a webpage getting a 500 error. They will likely retry what they were doing until it succeeds.

This means that the entire workflow could be triggered multiple times, and it could be done in a way that is not programmatically obvious. Imagine our Merchandise buyer marking down an item’s price, and the entire operation succeeds but at the last minute their Internet connection dies and they get an error. They will repeat the markdown action and now there will be two messages about the inventory price being sent.

This is an interesting read.  Also, definitely check out Dave’s earlier post on how there is no happy path; it seems that most developers only code for a chimera, as there is so much code that assumes everything will work perfectly.
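
One common defense against the duplicate-trigger scenario above is to make the handler idempotent by remembering a request identifier. The Python sketch below is only an illustration under assumptions: `MarkdownHandler`, the `request_id` argument, and the in-memory `set` are invented for the example, the client is assumed to resend the same identifier when it retries, and a real system would keep the seen identifiers in durable storage.

```python
class MarkdownHandler:
    """Publish one price-change message per request id, however often it is retried."""

    def __init__(self):
        self.seen = set()       # request ids already processed (assumed durable in practice)
        self.published = []     # stand-in for messages sent downstream

    def handle(self, request_id, item, new_price):
        if request_id in self.seen:
            # A retry of work we already did: acknowledge without publishing again.
            return False
        self.seen.add(request_id)
        self.published.append({"item": item, "price": new_price})
        return True


handler = MarkdownHandler()
first = handler.handle("req-42", "sweater", 19.99)
# The buyer's connection dropped, so the same request is submitted a second time.
second = handler.handle("req-42", "sweater", 19.99)
```

The retry still succeeds from the buyer's point of view, but only one message about the price change goes out.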
