For our procedure, we wanted to show currently running jobs regardless of run time (or run time versus historical run time). We also wanted to see whether the job was started by the Agent itself or by a user, which step the job is currently running on (including that step's elapsed time), and, last but not least, the total job elapsed time.
Now, I will be the first to admit that this is not the prettiest code I have ever produced, but getting some of this information out is quite tricky 🙂
This procedure is a great alternative to the SQL Agent Activity Monitor. It doesn't have all the information the Monitor has, but it probably has everything you need to see at a glance. What's more, because it is a stored procedure, you could run it across multiple servers (via registered servers, for example) and get results within seconds.
Click through for the script.
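The core of such a procedure is typically a query against msdb's job activity tables. A minimal sketch (not the author's actual script — the running-step logic here is a common approximation) might look like:

```sql
-- Sketch: jobs the Agent currently has running, with elapsed time so far.
-- Identifying the in-flight step from last_executed_step_id + 1 is an
-- approximation; the author's script may use other means.
SELECT j.name AS JobName,
       ja.start_execution_date AS StartTime,
       DATEDIFF(SECOND, ja.start_execution_date, GETDATE()) AS ElapsedSeconds,
       js.step_name AS CurrentStep
FROM msdb.dbo.sysjobactivity AS ja
JOIN msdb.dbo.sysjobs AS j
    ON j.job_id = ja.job_id
LEFT JOIN msdb.dbo.sysjobsteps AS js
    ON js.job_id = ja.job_id
   AND js.step_id = ISNULL(ja.last_executed_step_id, 0) + 1
WHERE ja.session_id = (SELECT MAX(session_id) FROM msdb.dbo.syssessions)
  AND ja.start_execution_date IS NOT NULL
  AND ja.stop_execution_date IS NULL;
```

Filtering on the latest `session_id` matters because sysjobactivity keeps rows from previous Agent sessions.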
When the job fails and the alert message is compiled, this procedure gets called, and the job name, step name, and a delay value are passed to it. There is also a retry flag that comes back from this procedure.
The first thing this procedure does is find the last failed step for the particular job. It then checks, based on the @retry value, whether a retry job has already been created. This is in case some other process tries to do the same thing, and it should help prevent too many retries from firing off.
If a retry job does not exist, this process creates a new disposable job that will rerun the original from the beginning or from the step that failed, based on checking for "Level 1" or "Level 2" in the job name. The job is prefixed with 'Retry -' so it can be found easily in your server's job list.
If a delay is specified (two minutes in this example), it calculates a new run time for the retry job and finally creates the job.
This helps make SQL Agent jobs a little more robust.
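The general pattern can be sketched with the standard msdb job procedures. This is a hedged outline, not the author's code; the job and step names are hypothetical:

```sql
-- Sketch: create a one-off 'Retry -' job that reruns the failed job
-- and deletes itself when it finishes. Names here are placeholders.
DECLARE @retry_job sysname = N'Retry - MyJob';

EXEC msdb.dbo.sp_add_job
     @job_name = @retry_job,
     @delete_level = 3;   -- 3 = delete the job whenever it completes

EXEC msdb.dbo.sp_add_jobstep
     @job_name = @retry_job,
     @step_name = N'Rerun original',
     @subsystem = N'TSQL',
     @command = N'EXEC msdb.dbo.sp_start_job
                      @job_name = N''MyJob'',
                      @step_name = N''Step that failed'';';

EXEC msdb.dbo.sp_add_jobserver @job_name = @retry_job;

-- With a delay, schedule the retry a couple of minutes out
-- instead of starting it immediately.
DECLARE @runtime int = CONVERT(int,
    REPLACE(CONVERT(char(8), DATEADD(MINUTE, 2, GETDATE()), 108), ':', ''));
EXEC msdb.dbo.sp_add_jobschedule
     @job_name = @retry_job,
     @name = N'Run once',
     @freq_type = 1,                 -- one-time schedule
     @active_start_time = @runtime;  -- int encoded as HHMMSS
```

The `@delete_level = 3` option is what makes the job "disposable": the Agent removes it after it runs, so retries don't accumulate in the job list.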
So here is what it does:
Check for failed agent jobs within the dates you specify (provided the agent log has data for this period). @FromDate defaults to the past 12 hours if no value is passed in; @ToDate defaults to GETDATE() if no value is passed in.
Check that any failed jobs within the date range have not subsequently succeeded; if a job has succeeded since its last failure, it will not be included in the results.
Check that the job is not currently running; if the job is still running when the procedure executes, it will be excluded from the results.
If a failed agent job passes the above checks, the last failure message will be obtained for the job and shown in the results, along with the job name, failed step, and failure datetime.
Read on for the script.
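The first of those checks usually boils down to a query against msdb's job history. A sketch under that assumption (this is not the actual script):

```sql
-- Sketch: job-outcome rows (step_id = 0) that failed (run_status = 0)
-- within the requested window. agent_datetime() converts the int-encoded
-- run_date/run_time columns into a real datetime.
DECLARE @FromDate datetime = DATEADD(HOUR, -12, GETDATE()),
        @ToDate   datetime = GETDATE();

SELECT j.name AS JobName,
       msdb.dbo.agent_datetime(h.run_date, h.run_time) AS FailedAt,
       h.message AS LastFailureMessage
FROM msdb.dbo.sysjobhistory AS h
JOIN msdb.dbo.sysjobs AS j
    ON j.job_id = h.job_id
WHERE h.step_id = 0        -- job outcome row, not an individual step
  AND h.run_status = 0     -- 0 = failed
  AND msdb.dbo.agent_datetime(h.run_date, h.run_time)
      BETWEEN @FromDate AND @ToDate;
```

The "has it since succeeded" and "is it still running" checks would then filter this result set further.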
For Availability Groups, we have a few extra error numbers we care about. Error number 1480 tells us when a server changes roles, so we can know when a server flips from secondary to primary or from primary to secondary. Error number 35264 tells us when data movement has been suspended on any database. This can occur for many reasons; one I have seen is when you have expanded a mount point on the primary but forgot to expand it on the secondary, so the data or log file runs out of space and cannot grow on the secondary. Error number 35265 tells you when data movement has resumed on any database. Error number 41404 lets you know if your AG is offline, which can be bad if you expected an automatic failover. Error number 41405 lets you know if an Availability Group cannot automatically fail over for any reason. In the latter two cases, you will want to look at your SQL Server error logs and the AlwaysOn Extended Events health session.
Click through for the alert scripts.
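Each of those error numbers maps to one SQL Agent alert. A sketch of one such alert (the alert name and notification setup are assumptions, not the linked scripts):

```sql
-- Sketch: alert on AG role change (error 1480). Repeat the pattern
-- for 35264, 35265, 41404, and 41405 with appropriate names.
EXEC msdb.dbo.sp_add_alert
     @name = N'AG Role Change (1480)',   -- hypothetical name
     @message_id = 1480,
     @severity = 0,                      -- 0 because we key on message_id
     @enabled = 1,
     @include_event_description_in = 1;  -- put the message text in the notification

-- Then attach an operator so someone actually hears about it, e.g.:
-- EXEC msdb.dbo.sp_add_notification
--      @alert_name = N'AG Role Change (1480)',
--      @operator_name = N'DBA Team',
--      @notification_method = 1;  -- email
```

Note that alerts only fire on messages written to the error log, which is why these particular error numbers work well for this.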
Sometimes, as an ETL developer or Database Administrator, you will need to gain insight into SQL Agent job execution times. This insight can be used to proactively monitor the processing times of the various jobs running within your data environment.
Information about job execution times is stored in the msdb database, in the sysjobhistory table. This table has the start time and run duration, which I have used to create a report showing the average job start and end times by month for all jobs running on an instance of SQL Server.
This is a very useful start. If I start counting on this data, I’d do two things: first, save it somewhere else permanently (because you want to clear out SQL Agent job history occasionally so the GUI doesn’t choke when you try to view job history); and second, look more at percentiles, particularly 95th and 99th percentiles for frequently-running jobs.
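One trap for anyone querying sysjobhistory directly: run_duration is an int encoded as HHMMSS, so it has to be decoded before you can average it. A sketch of the decoding:

```sql
-- Sketch: average job duration in seconds, decoding the HHMMSS-encoded
-- run_duration column. step_id = 0 restricts to whole-job outcome rows.
SELECT j.name AS JobName,
       AVG(  (h.run_duration / 10000) * 3600      -- hours
           + (h.run_duration / 100 % 100) * 60    -- minutes
           +  (h.run_duration % 100)        )     -- seconds
           AS AvgDurationSeconds
FROM msdb.dbo.sysjobhistory AS h
JOIN msdb.dbo.sysjobs AS j
    ON j.job_id = h.job_id
WHERE h.step_id = 0
GROUP BY j.name;
```

Averaging the raw column without decoding it gives nonsense, since 130 (1 minute 30 seconds) and 90 (90 seconds... which the encoding can't even represent) are not comparable values.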
Upon completion of a job, the next run time is calculated based on the last scheduled time plus the schedule interval. However, allowance is made for the edge cases where the completed invocation overruns into the next start time. In such a situation, there isn’t a “catch-up” run; instead, the schedule is advanced iteratively until it reaches a future point in time.
However, 2016 introduces a new twist. When applying the “is the proposed next schedule time after Now()?” check, it adds five seconds to Now(). In other words, the question becomes “Is the proposed next schedule time more than five seconds in the future?”
Ewald jumps into the debugger to understand this better, so click through for that.
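The behavior described above can be sketched in T-SQL terms, though the real logic lives inside the Agent binary, not in T-SQL; the values here are illustrative:

```sql
-- Sketch: advance the proposed next run time by the schedule interval
-- until it is more than five seconds in the future (the 2016 twist).
DECLARE @last_scheduled datetime = '2016-06-01 09:00',
        @interval_min   int      = 10,
        @next_run       datetime;

SET @next_run = DATEADD(MINUTE, @interval_min, @last_scheduled);

-- No catch-up runs: skip forward iteratively past Now() + 5 seconds.
WHILE @next_run <= DATEADD(SECOND, 5, GETDATE())
    SET @next_run = DATEADD(MINUTE, @interval_min, @next_run);

SELECT @next_run AS NextRunTime;
```

The five-second fudge factor means a schedule slot that is only marginally in the future gets skipped rather than fired immediately.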
Q: Is there any way to handle the execution of SSIS packages stored locally?
A: Azure Automation works on Azure resources. It cannot be used for executing local SSIS packages.
In some cases, you may still need a scheduling tool (which might be a VM with SQL Agent).
From this point, we will work directly with SQL Server. In order to establish a connection, you have to run the following script. The most important are the second and third lines:
– In the second line, you have to provide your SQL Server instance address by replacing “<your_server_instance>” with something like “192.168.58.11” or “192.168.58.11\MSSQLSERVER,1433”.
– When the second line runs, it will ask you for SQL Server credentials, so you have to enter a SQL user name and its password.
Slava does note some limitations at present, but a lot of the functionality seems to be there.
While it is easy to build and test policies by executing them on demand (especially powerful when run through Central Management Server) I had some issues getting my policies to run in “on schedule” mode.
To be more specific, my policies that use the ExecuteSQL function have been an issue. What I was finding was:
- The policy would run fine “on demand” but…
- When I ran the policy through the PBM scheduler, the policy would fail.
Dealing with false positives is not a good start for any monitoring service, so getting to the root of the issue was critical.
Read on for the solution.
Now I’m angry too since I count these failures as personal and I don’t like failing, so I get cracking on the investigation.
Straight away, that error message doesn’t help my mood.
I’m not indexing a view!
I’m not including computed columns!
It’s not a filtered index!
The columns are not xml data types, or spatial operations!
And nowhere, nowhere am I using double quotes to justify needing to set QUOTED_IDENTIFIER on!
SO WTF SQL SERVER, WHY ARE YOU GIVING ME THESE ERRORS???
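If you run into this error yourself, one common hidden culprit is a module that was created while QUOTED_IDENTIFIER was OFF. A quick way to hunt for those (a general diagnostic, not necessarily the cause in this post):

```sql
-- Sketch: find modules created with QUOTED_IDENTIFIER OFF, which can
-- trigger these errors even when none of the obvious conditions apply.
SELECT OBJECT_NAME(object_id) AS module_name,
       uses_quoted_identifier
FROM sys.sql_modules
WHERE uses_quoted_identifier = 0;
```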
Read the whole thing.