This solution runs sp_WhoIsActive every minute and saves the output into a global temp table. From there, I look for any processes that have been running longer than the low threshold setting. Any of those processes that have not already been identified and stored get logged, written to an HTML table, and sent out in an email alert.
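To make the first pass concrete, here is a rough sketch of the filtering and HTML step. It is written in Python purely for illustration; the real script does this in T-SQL against the temp table. The function names, the 5-minute default, and the `elapsed_minutes` field are all assumptions of mine (sp_WhoIsActive reports elapsed time as a formatted duration string, not as minutes), and identifying a process by session ID plus start time is just one plausible choice of key.

```python
from html import escape

LOW_THRESHOLD_MINUTES = 5  # hypothetical value; use whatever your low threshold is


def new_long_running(snapshot, already_logged, threshold_minutes=LOW_THRESHOLD_MINUTES):
    """Return snapshot rows over the threshold that have not been logged yet.

    A process is identified by (session_id, start_time), which is an
    assumption made for this sketch.
    """
    fresh = []
    for row in snapshot:
        key = (row["session_id"], row["start_time"])
        if row["elapsed_minutes"] > threshold_minutes and key not in already_logged:
            fresh.append(row)
    return fresh


def to_html_table(rows, columns=("session_id", "start_time", "elapsed_minutes", "sql_text")):
    """Render rows as a bare HTML table, escaping every cell value."""
    header = "".join(f"<th>{escape(c)}</th>" for c in columns)
    body = "".join(
        "<tr>" + "".join(f"<td>{escape(str(r[c]))}</td>" for c in columns) + "</tr>"
        for r in rows
    )
    return f"<table><tr>{header}</tr>{body}</table>"
```

Escaping the cell values matters more than it looks: query text routinely contains `<` and `>`, which would otherwise break the table in the email body.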
Next, I take a second look at the table for anything that’s been running longer than the high threshold. If a second email alert has not been sent for these processes, I output the same data and send the email again. If two alerts have already been sent for a process, I don’t do anything else at the moment. One of the next updates to this script will send an alert to our DevOps notification system for anything running longer than some final threshold (or maybe just the high threshold).
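The two-alert bookkeeping boils down to remembering how many alerts each process has already received. Below is a minimal sketch of that idea in Python rather than the T-SQL the script actually uses. The function name `alerts_due`, the 5- and 30-minute defaults, the `elapsed_minutes` field, and the `(session_id, start_time)` key are all hypothetical. One behavior is a guess on my part: in this sketch, a process first seen when it is already past the high threshold gets both alerts in the same run, which may not match the real script.

```python
def alerts_due(snapshot, alerts_sent, low_minutes=5, high_minutes=30):
    """Return (level, row) pairs that need an email on this run.

    alerts_sent maps (session_id, start_time) -> number of alerts already
    sent (0, 1, or 2) and is updated in place. Thresholds are hypothetical.
    """
    due = []

    # Pass 1: first alert for anything over the low threshold not yet alerted.
    for row in snapshot:
        key = (row["session_id"], row["start_time"])
        if alerts_sent.get(key, 0) == 0 and row["elapsed_minutes"] > low_minutes:
            alerts_sent[key] = 1
            due.append((1, row))

    # Pass 2: second alert for anything over the high threshold that has
    # received exactly one alert so far.
    for row in snapshot:
        key = (row["session_id"], row["start_time"])
        if alerts_sent.get(key, 0) == 1 and row["elapsed_minutes"] > high_minutes:
            alerts_sent[key] = 2
            due.append((2, row))

    # Processes with two alerts already recorded fall through both passes,
    # so nothing further is sent for them.
    return due
```

Calling this once per polling interval with the same `alerts_sent` dictionary yields at most two alerts per process, no matter how long the process keeps running. A third tier, such as the DevOps notification mentioned above, would be one more pass keyed on a count of 2.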
I particularly like the part about not re-alerting over and over for a long-running query. It’s a relatively minor piece of the whole solution, but it gets annoying watching the same email come in every 5 minutes, especially if there’s nothing you can (or at least want to) do about the cause.