Press "Enter" to skip to content

Filtering Alert Noise With A Leaky Bucket Algorithm

I have a post implementing a leaky bucket algorithm in T-SQL:

Now that we have a table, we want to do something with it.  The most naive solution would be to fire off an alert every time a row gets added to this table.  The problem with this solution is that it can flood our inbox with useless information.  Suppose the developers push out an API change that breaks everything.  The first 500 response will be important information.  The second one might be important because it’s confirmatory.  The six hundredth alert will not help.  And heaven help you if you’ve got this alert tied to your PagerDuty account…

So let’s only alert if we get to a certain threshold—we’ll call it 5 messages.  Fewer than 5 and we can assume it’s a blip or just somebody doing something weird.  The next part of the problem is, 5 messages per what?  We don’t want to alert for every fifth error message.  Let’s say we get one error message a year, so there was one in 2014, one in 2015, one in 2016, one in 2017, and now one in 2018.  If we simply set our threshold to report on every fifth error, we’ll get an alert in 2018, but most likely, the prior years’ errors are long gone by now, so that’s not helpful.

Read on for the solution.  I’ve been quite happy with the solution in practice, as it has cut down the number of spurious alert e-mails to practically nil.