While I can’t be 100% sure of the trigger, I’m 99.9% sure, because the job was running before the outage, so the timing is right. After looking through our monitoring logs, everything pointed to the job being the cause, so yes, I’m confident it caused it.
We don’t have regular maintenance windows for any of our servers, so we run jobs throughout the week, and if possible, try to schedule them during low-usage times. In this case, the job was an index maintenance job.
Now, before you scream at me about running an index maintenance job, I’m not going to argue the pros and cons of using it or whether or not we should run it — we can do that at another time. For this post, just accept the fact that we were running a job to rebuild/reorganize indexes
This is an interesting after-analysis of an outage. I have a lot of respect for people who can put these together and make them public—I would have a lot of trouble doing that myself.