I needed to update some of our long running job monitoring code to improve it from the version that we have right now. I like this version because it uses msdb.dbo.syssessions (https://msdn.microsoft.com/en-us/library/ms175016.aspx) to validate that a job is actually running. I also wanted to know the percent difference between the current run duration versus an average duration per job from the past 30 days. I decided to place the calculated average into a table variable and then join on it to get my results. I also used the IIF function (https://msdn.microsoft.com/en-us/library/hh213574.aspx) to help me avoid a divide by zero error that comes up when the average duration equals 0.
One thing which could cut down on false positives would be to calculate the standard deviation as well. I wouldn’t automatically assume that job executions were normally distributed, but if you look at things more than one standard deviation away from the mean, it should remove noise of jobs which are just a little over the average but not in dangerous territory.