I needed to update some of our long running job monitoring code to improve it from the version that we have right now. I like this version because it uses msdb.dbo.syssessions (https://msdn.microsoft.com/en-us/library/ms175016.aspx) to validate that a job is actually running. I also wanted to know the percent difference between the current run duration versus an average duration per job from the past 30 days. I decided to place the calculated average into a table variable and then join on it to get my results. I also used the IIF function (https://msdn.microsoft.com/en-us/library/hh213574.aspx) to help me avoid a divide by zero error that comes up when the average duration equals 0.
One thing which could cut down on false positives would be to calculate the standard deviation as well. I wouldn’t automatically assume that job executions were normally distributed, but if you look at things more than one standard deviation away from the mean, it should remove noise of jobs which are just a little over the average but not in dangerous territory.
These alerts cover a range of errors from potential IO subsystem problems to failed logins, all of which are things a DBA needs to know about, and quickly too.As well as error notifications you can set up alerts to cover performance conditions. The final statement in the script below sets up an alert that triggers when Page Life Expectancy drops below 1000. In all honesty I don’t set up these performance alerts that often but I wanted to show you the kind of thing that is possible and would be handy if you don’t have any third party monitoring.
But what do I mean by sensible? Typically I see a number of problems with alerting setups; either alerts are inadequate and don’t cover the necessary errors (or there are none at all) but I also see the notifications to alerts not being set up correctly meaning problems go backwards and forwards delaying any fixes.The other problem I see is an over provision of alerts. This usually is because one or more other monitoring systems have been deployed and error notifications have been duplicated as a result. Imagine having an operational tool like System Centre, some SQL monitoring software and native alerting all pinging the same message to the one recipient mailbox. Now on top of that let’s say the alerts have not been configured correctly so information emails are being issued every second. It’s a scary thought but it is easy to see how a critical error might be missed in this scenario.
If you don’t have automatic alerts for high-severity errors, this is an easy way of gaining insight into the problems your server is experiencing.
If all you have is a hammer, everything will eventually start looking like a nail. This is generally known as Maslow’s hammer and refers to the fact that you use the tools you know to solve any problem, regardless if that’s what the problem actually needs. With that said, I frequently need a way to visualize the load distribution of scheduled jobs over a day or week, but I could never be bothered to set up a web server, learn a procedural programming language or build custom visualizations in PowerBI.
So here’s how to do that without leaving Management Studio.
Click through for discussion and link to the code.
The first two parts of this series addressed the general approach that I use in an SSIS script task to discover and alert on missed SQL Agent jobs. With apologies for the delay in producing this final post in the series, here I bring these approaches together and present the complete package.
To create the SSIS, start with an empty SSIS package and add a data flow task. In the task, add the following transformations.
Regardless of how you do it, knowing when jobs fail is important enough to build some infrastructure around answering this question.
The first job took 15 hours 41 minutes 53 seconds, the second 1 minute 25 seconds, the third 21 seconds. This makes it quite tricky to calculate the duration in a suitable datatype. In T-SQL people use scripts like the following from MSSQLTips.com
((run_duration/10000*3600 + (run_duration/100)%100*60 + run_duration%100 + 31 ) / 60) as ‘RunDurationMinutes’
I wish that some version of SQL Server would fix this “clever” duration. We’ve had the time datatype since 2008; at least add a new column with run duration as a time value if you’re that concerned with backwards compatibility.
I’ve always found fork bombs funny because of their elegant simplicity, so I figured, why not build one in SQL Server?
In order to do it, I needed a way to spawn a self-replicating asynchronous process, so I built:
A stored procedure
That creates an Agent job
That runs the stored procedure
I didn’t think it was possible. I certainly didn’t think it would take a half-dozen lines of code.
Fortunately, I had set up a SQL Agent Alert for errors with Severity Level 17, which emailed me and several coworkers to alert us to the problem. But this was unfortunate too. Every one of those alert occurrences sent an email. Sure, it’s nice to know when there’s a problem, but a thousand or more emails is most certainly overkill. After addressing the transaction log issue, the alert emails kept coming. This query told me there were still a few thousand unsent emails:
Getting a message that something is down is important. Getting several thousand messages is counterproductive.
The other day I was asked How to notify multiple operators using database mail for failed jobs and a different operators for successes.
First I looked at the operator email addresses field firstname.lastname@example.org;email@example.com….etc is not helping as there is a limit on the characters in the email name entry of operator and we still have the problem we need different groups for success and failures.
It makes perfect sense, but is non-trivial. I like it.
There are three tokens within the T-SQL (highlighted in yellow above): A-MSG, DATE, and TIME. SQL server replaces these three tokens as follows:
- A-MSG: Message text. If the job is run by an alert, the message text value automatically replaces this token in the job step.
- DATE: Current date (in YYYYMMDD format).
- TIME: Current time (in HHMMSS format).
See the MSDN documentation for a list of tokens and their descriptions.
This is a great way of being smarter with alerts. Your SQL Server instance has a lot of information at the ready, so get familiar with what’s up for offer.
There are purposes for credentials other than a proxy, but for our purposes you are just going to enter an AD username and password. Just to be even more clear, this is an AD/Windows user. Not a sql server login.
In Object Explorer: ServerName -> Security -> Right click on Credentials and select New Credential -> Fill in the Name, Identity and Password fields.
Kenneth’s getting fancy with animated GIFs, and gives us a good walkthrough of this aspect of SQL Agent security.