How often have you found yourself contemplating some harebrained regex scheme in order to extract an inkling of value from a string and wishing the data had just arrived in a well-structured package without all the textual fluff?
So why do we insist on writing prose in our logs? Take “Exception while processing order 1234 for customer abc123” for example. There are at least four important pieces of information drowning in that one sentence alone:
- An exception was raised!
- During order processing
- Order number 1234
- Customer abc123
Being an exception log message, it’s more than likely followed by a stack trace, too. And stack traces certainly don’t conform to carefully crafted log layout patterns.
Logging is something we tend to forget about and slap in at the last minute. We also think about it from the viewpoint of a developer looking at a single error message. Those are both mistakes that lead to a huge amount of extra work later.
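To make the contrast concrete, here is a minimal sketch of the same order failure logged as structured fields rather than prose, using only Python's standard `logging` and `json` modules. The `JsonFormatter` class, the `orders` logger name, and the field names `event`, `order_id` and `customer_id` are all hypothetical choices for illustration, not part of the quoted article.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Hypothetical formatter: renders each record as one JSON object per line."""

    # Illustrative field names, not a standard schema.
    EXTRA_FIELDS = ("event", "order_id", "customer_id")

    def format(self, record):
        payload = {"level": record.levelname, "message": record.getMessage()}
        for field in self.EXTRA_FIELDS:
            if hasattr(record, field):
                payload[field] = getattr(record, field)
        if record.exc_info:
            # Keep the stack trace inside the same JSON object so it does
            # not spill onto following lines and break the layout.
            payload["stack_trace"] = self.formatException(record.exc_info)
        return json.dumps(payload)


def build_logger(stream):
    """Return a logger that writes JSON lines to the given stream."""
    logger = logging.getLogger("orders")
    logger.handlers.clear()
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger


def log_order_failure(logger, order_id, customer_id):
    """Log a failed order with each piece of information as its own field."""
    try:
        raise ValueError("payment declined")  # stand-in for real processing
    except ValueError:
        logger.error(
            "Exception while processing order",
            exc_info=True,
            extra={
                "event": "order_processing_failed",
                "order_id": order_id,
                "customer_id": customer_id,
            },
        )
```

With this shape, finding every failure for customer abc123 becomes a field lookup instead of a regex, and the stack trace travels with the event it belongs to.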
Try[T] is another construct for capturing success or failure scenarios. It returns a value in both cases. Put any expression in Try and it will return Success[T] if the expression evaluates successfully and Failure[T] otherwise, meaning you are allowed to return the exception as a value. There is one restriction: in case of failure it will only return Throwable types:
def validateZipCode(zipCode: String): Try[Int] = Try(zipCode.toInt)
But throwing an exception doesn’t make much sense here, since it is not much of a calculation, although we can use this example to understand the use case. If the given string is not a number, the result will be a failure. The value can be extracted from the Try in the same way as from an Option. It can be matched
As you write more complicated Spark operations, handling errors becomes critical.
Now is the time to start pulling out your hair. There is no syntax error in the query. Go ahead and look it over 10-15 times. I know I did.
Ok, if there is no syntax error, then what could possibly be the real problem? Is the database corrupt? Maybe a system table is corrupt? Grasping at straws here, but could it possibly even be some sort of royally screwed up permissions?
Everything seems to be checking out properly. There is no corruption whatsoever. Laptop is soon to be launched at this point right? Ok, maybe not launched because this is a simple query. But, had this been a production related query that was rather intense and complicated, there really may be something getting launched as the frustration mounts.
Click through for the answer. Sometimes the error message is technically correct but utterly confounding.
It starts slowly. Maybe your home-grown centralized logging cluster becomes more difficult to operate, demanding unholy amounts of engineer time every week. Maybe engineers start to find that making a query about production is a “go get a coffee and come back later” activity. Or maybe monitoring vendors offer you a quote that elicits a response ranging anywhere from curses under the breath to blood-curdling screams of terror.
The multi-headed beast we know as Scale has reared its ugly visage.
As some of you may have already guessed from the title, I’m going to discuss one way to solve this problem, and why it might not be as bad as you might think.
Take some of your precious information and throw it in the garbage. In lots of cases, you can just drop those writes on the floor as long as your observability stack is equipped to handle it.
In other words, sample.
Read on for a couple of methods. One thing I’ve taken a fancy to is collecting the first N of a particular type of message and keeping track of how often that message appears. If you get the same error for every row in a file, then you might really only need to see that one time and the number of times it happened. Or maybe you want to see a few of them to ensure that they’re really the same error and not two separate errors which are getting reported together due to insufficient error separation.
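A minimal sketch of that "first N plus a count" idea follows. The `FirstNSampler` class, its method names, and the message-type keys are hypothetical; how you decide which messages share a type is an assumption left to the caller.

```python
from collections import defaultdict


class FirstNSampler:
    """Keep the first N messages of each type and count every occurrence."""

    def __init__(self, keep_first=3):
        self.keep_first = keep_first
        self.counts = defaultdict(int)
        self.samples = defaultdict(list)

    def record(self, message_type, message):
        """Count the message; return True if it was also kept as a sample."""
        self.counts[message_type] += 1
        if len(self.samples[message_type]) < self.keep_first:
            self.samples[message_type].append(message)
            return True
        return False

    def summary(self):
        """Return, per message type, the total count and the kept samples."""
        return {
            message_type: {
                "count": self.counts[message_type],
                "samples": list(self.samples[message_type]),
            }
            for message_type in self.counts
        }
```

Keeping more than one sample per type is a deliberate choice: it leaves enough evidence to notice when two different errors have been lumped under the same type.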
This time we had a vendor reporting the following error:
Msg 207, Level 16, State 1, Line 7
Invalid column name ‘Name’
Now the vendor was certain this was a permissions issue. It worked fine on their systems, it worked fine on some of ours. So why didn’t it always work? Well, the easy answer is permissions! Particularly since we had denied them db_owner just recently.
So why do I sound so dismissive about permissions as a possibility? I mean it COULD be permissions. It certainly is possible. But first of all, we don’t use column level permissions very often (no one uses them all that often from what I can tell) and secondly it worked on several other systems where they had exactly the same permissions as this system.
Ok, so what is the problem? You guessed it! (I really have to stop asking for guesses after I’ve put the answer in the title.)
I went out of my way not to give the answer here, so you’ll have to look at Kenneth’s title. And then read the whole thing.
If you build an SSDT project you can get an error which says:
“SQL71502: Function: [XXX].[XXX] has an unresolved reference to object [XXX].[XXX].”
If the code that is failing is trying to use something in the “sys” schema or the “INFORMATION_SCHEMA” schema then you need to add a database reference to the master dacpac:
Click through for the answer and a comment-by-comment walkthrough from Ed.
On more than one occasion I have had an emergency request because everything was broken. The everything in almost every incident is an SSIS package that is failing with error messages. The error message will typically have text similar to the following:
Could not locate statistics ‘_WA_Sys_00000015_346C780E’ in the system catalogs.
Due to the error, the package fails processing and grinds to a halt. When diving into the package it is discovered that the missing stats happen to be coming from a linked server query. This raises a big bright blaring alarm for me. Why is the SSIS package accessing the data via a linked server? This is rather counter-productive and definitely contrary to what is desired from a performance perspective.
Jason methodically walks us through the troubleshooting process and provides the solution at the end.
Stumble One: Error occurred during execution of the builtin function 'PREDICT' with HRESULT 0x80004001. Model type is unsupported.
Not all models are supported. At the time of writing, only the following models are supported:
sp_rxPredict supports additional models including those available in the MicrosoftML package for R (I was attempting to use rxFastTrees). I presume this limitation will reduce over time. The list of supported models is referenced in the PREDICT function (Documentation).
sp_rxPredict does require CLR, but it’s a viable alternative if you need to use a model not currently supported—like rxNeuralNet.
Alright, back to the original question. So, how to combine optimistic locking and automatic retry? In other words, when the application gets an error from the database saying that the versions of a Product don’t match, how to retry the same operation again?
The short answer is: nohow. You don’t want to do that because it defeats the very purpose of having an optimistic lock in the first place.
Remember that the locking mechanism is a way to ensure that all changes are taken into consideration when changing a record in the database. In other words, someone should review the new version of the record and make an informed decision as to whether they still want to submit the update. And that should be the same client who originated the initial request; you can’t make that decision for them.
Plenty of systems do this sort of data merging automatically, but I get Vladimir’s point: if someone else pulled the rug out from under you, it might change your decision on what that data should look like.
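As a rough illustration of the point about not retrying, here is a minimal sketch of optimistic locking with a version column, using Python's built-in `sqlite3` module against an in-memory database. The `product` table layout, the `StaleVersionError` exception, and the helper names are assumptions made for the example, not code from Vladimir's post.

```python
import sqlite3


class StaleVersionError(Exception):
    """Raised when the row changed after the caller read it."""


def create_store():
    """Create an in-memory database with one versioned product row."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE product ("
        "id INTEGER PRIMARY KEY, name TEXT, version INTEGER NOT NULL)"
    )
    conn.execute("INSERT INTO product (id, name, version) VALUES (1, 'Widget', 1)")
    conn.commit()
    return conn


def read_product(conn, product_id):
    """Return (name, version) so the caller can send the version back later."""
    return conn.execute(
        "SELECT name, version FROM product WHERE id = ?", (product_id,)
    ).fetchone()


def update_name(conn, product_id, new_name, expected_version):
    """Apply the update only if the row still has the version the caller saw."""
    cursor = conn.execute(
        "UPDATE product SET name = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (new_name, product_id, expected_version),
    )
    conn.commit()
    if cursor.rowcount == 0:
        # No automatic retry: surface the conflict so the original client
        # can review the current record and decide what to do.
        raise StaleVersionError(
            f"product {product_id} is no longer at version {expected_version}"
        )
```

If two clients both read version 1, the first update succeeds and bumps the row to version 2, and the second raises `StaleVersionError`. Catching that error and blindly re-running the update with the fresh version would overwrite the first change unseen, which is exactly what the lock exists to prevent.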
This can return more than one line, each with a different ComputerManagement namespace (like ComputerManagement10). It depends on the versions you have installed on the host. The number “10” refers to SQL Server 2008.
Now I can uncomment the last command and run it. The result is:
Get-CimInstance : Invalid class
At line:1 char:1
+ Get-CimInstance -CimSession $CIMsession -Namespace $("root\Microsoft\SQLServer\C …
+ CategoryInfo : MetadataError: (:) [Get-CimInstance], CimException
+ FullyQualifiedErrorId : HRESULT 0x80041010,Microsoft.Management.Infrastructure.CimCmdlets.GetCimInstanceCommand
+ PSComputerName : HOST001
Ok, a different error message. Let’s dig into it. I logged in on the host and confirmed that I have a SQL Server 2008 R2 instance installed. This means that I’m not accessing a version lower than 2005, as the initial warning message was suggesting.
Read the whole thing.