Press "Enter" to skip to content

Category: SQL Agent

Messing With The SQL Agent Job System

Adrian Buckman shows what happens if you start fiddling with SQL Agent tables:

Some time ago I came across a strange issue where I found a number of duplicated SQL Agent jobs, the odd thing is SQL will not allow you to have more than one agent job with the same name – they need to be unique.

This got me scratching my head a little at first, so I started out with some basic checks of the msdb tables.

This is example #5008 of just how poor the SQL Agent database design is.  Example #1 is the absurd date-time notation.

Comments closed

Monitoring SQL Agent Job Failures

Mark Wilkinson shows how to set up a SQL Agent job failure monitoring solution:

Since we are storing the date the records are added to the table, this query will always return the latest set of failures. This is a simple example, but the possibilities are endless:

  • Send the results of this query via database mail

  • Join with dbo.sysjobs and dbo.syscategories, alerting on different thresholds per job category

  • Extend the TOP (1) to include multiple capture periods and alert on average failures per capture

Check it out.  This is particularly helpful if you get blasted with thousands of error messages per minute because somebody made a bunch of untested changes and broke every job in your environment and caused the mail server to throttle your account for a multi-day period.  Not that this has ever happened to me, of course…

Comments closed

Synchronizing Logins And Jobs

Ryan Adams shows five methods for synchronizing SQL logins and a couple ways of synchronizing SQL Agent jobs between instances of SQL Server:

Robert Davis wrote a great script back when he published his Mirroring book.  I started to write my own and was almost done when I contacted Robert and asked if he had dealt with SQL logins since the script only handled Windows logins.  His reply was something along the lines of, “What are you talking about? Of course it handles SQL logins”.  It turns out that the publisher didn’t get the right script version published with the book.  That’s when this post from Robert with the full script was born…

Transferring Logins

I also wrote about it HERE.

This script creates a stored procedure to handle the move and also uses Linked Servers.  If you can’t have linked servers in your environment this is not a good choice for you.  However, you can create the linked server in a SQL Agent job step prior to the step for transfer and then remove it in a job step after the transfer.  It breaks the rule but it does it fast enough maybe no one will notice.

Read the whole thing.

Comments closed

Finding Running Agent Jobs

Adrian Buckman has a script to find all running SQL Agent jobs:

For our Procedure we wanted to show currently running jobs regardless of run time or run time vs historical run time, we also wanted to be able to see if the job was started by the Agent itself or a User, and to see which step the job is currently running on including that steps Elapsed time and last but not least the Total job elapsed time.

Now I will be the first to admit , this is not the prettiest code I have ever produced but getting some of this information out is quite tricky 🙂

This Procedure is a great alternative to the SQL Agent Activity Monitor, it doesn’t have all the information that the Monitor has but it probably has everything that you need to see at a glance – and whats more being that it is a Stored Procedure you could run this across multiple servers via registered servers for example and get results within seconds.

Click through for the script.

Comments closed

Conditional Job Retry

Chris Bell has a procedure which conditionally retries a failed SQL Agent job from a pre-determined step:

When the job fails, and the alert message compiled, this procedure gets called and the job name, step name, a delay value are passed to it. There is also a retry flag that comes back fro this procedure.

The first thing this procedure does is go and find the last failed step for the particular job. It then counts and based on the @retry value verifies if a retry job has already been created. This is in case some other process tries to do this same thing and should help prevent too many retries from firing off.
If a retry job does not exist, this process creates a new disposable job that will rerun the original from the beginning or the step that failed based on the checking for “Level 1” or “Level 2” in the job name. The job is prefixed with ‘Retry -‘ so it can be found easily in your server’s job list.
If a delay is specified, 2 minutes in this example, then it calculates a new run time for the retry job and finally creates the job.

This helps make SQL Agent jobs a little more robust.

Comments closed

Finding Failed Agent Jobs

Adrian Buckman has a stored procedure which retrieves failed SQL Agent jobs over a given timeframe:

So here is what it does:

  1. Check for failed agent jobs within the dates you specify (provided the agent log has data for this period) @FromDate will default to the past 12 hours if  no value is passed in, @ToDate will default to GetDate() if no value is passed in.

  2. Check that any failed jobs that have occurred within the date range have not subsequently succeeded, if the job has since succeeded since its last failure then this job will not be included in the results.

  3. Check that the job is not currently running, if the job is still running at the time of executing the procedure then this job will be excluded from the results.

  4. If a failed agent job exists that passes through the above checks then the Last Failure message will be obtained for the job and shown in the results along with the Job name, Failed Step and Failed Datetime.

Read on for the script.

Comments closed

Availability Group Agent Alerts

Tracy Boggiano builds a set of SQL Server alerts related to Availability Group happenings and issues:

For Availability Groups we have a few extra error numbers we care about. Error number 1480 tells when a server changes roles, so we can know when a server flips from a secondary to a primary, or from a primary to a secondary. Error number 35264 tells when data movement has suspended on any database. This can occur for many reasons. One I have seen is when you have expanded your mount point on your primary and the data or log file runs out of space on the secondary the data or log file can not expand on the secondary because you forgot to expand the secondary. Error number 35265 tells you when the data movement has resumed on any database.  Error number 41404 let’s you know if your AG is offline which can be bad if you expected an automatic failover.  Error number 41405 let’s you know if an Availability Group can’t automatically failover for any reason.  In the later to cases you will want to look at your SQL Error logs and AlwaysOn Extended Events Health session.

Click through for the alert scripts.

Comments closed

Monthly Job Run Time Averages

Tywan Terrell has a script to see how his monthly SQL Agent jobs are performing in terms of average run time:

Sometime as a ETL developer or Database Administrator you will need to gain insight into SQL Agent job executions times. This  insight can be used to proactively monitor the processing times of the various jobs running within your data environment.

Information about jobs execution times is stored in the MSDB database in table sysjobhistory. This table has the start time and the run duration times which I have used to create a report that will show the average job start and end times by month for all jobs running on a instance of SQL Server.

This is a very useful start.  If I start counting on this data, I’d do two things:  first, save it somewhere else permanently (because you want to clear out SQL Agent job history occasionally so the GUI doesn’t choke when you try to view job history); and second, look more at percentiles, particularly 95th and 99th percentiles for frequently-running jobs.

Comments closed

SQL Agent’s 5 Second Rule

Ewald Cress uncovers a change in the way the SQL Agent scheduler works in SQL Server 2016 compared to prior versions:

Upon completion of a job, the next run time is calculated based on the last scheduled time plus the schedule interval. However, allowance is made for the edge cases where the completed invocation overruns into the next start time. In such a situation, there isn’t a “catch-up” run; instead, the schedule is advanced iteratively until it reaches a future point in time.

However, 2016 introduces a new twist. When applying the “is the proposed next schedule time after Now()?” check, it adds five seconds to Now(). In other words, the question becomes “Is the proposed next schedule time more than five seconds in the future?”

Ewald jumps into the debugger to understand this better, so click through for that.

Comments closed