We are checking for failure logs, currently with the rule “count > 25 within the last 30m”.
But if 25 failures occurred in the first 5 minutes and the matter was then resolved by other means, we don't need to still be alerted 25 minutes later.
So I want to tweak it so that if, say, 80% of all logs within the last 30m are failure logs, it alerts.
Edit: actually, a percentage-based rule suffers from the same problem. It should work on a rate instead, e.g. failures per minute (see also Alert based on the Rate of certain status)
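
To make the difference concrete, here is a rough Python sketch (not tied to any particular monitoring tool) comparing the current count rule with a rate rule over a short recent window. The function names, the 5-minute rate window, and the 1 failure/minute threshold are all assumptions picked for illustration only.

```python
from datetime import datetime, timedelta

def count_alert(failure_times, now, window=timedelta(minutes=30), threshold=25):
    """Current rule: more than `threshold` failures within the last `window`."""
    recent = [t for t in failure_times if now - window < t <= now]
    return len(recent) > threshold

def rate_alert(failure_times, now, window=timedelta(minutes=5), per_minute=1.0):
    """Hypothetical rate rule: failures per minute over a short recent window."""
    recent = [t for t in failure_times if now - window < t <= now]
    rate = len(recent) / (window.total_seconds() / 60)
    return rate > per_minute

start = datetime(2024, 1, 1, 12, 0)
# 26 failures, one every 10 seconds, all within the first 5 minutes.
burst = [start + timedelta(seconds=10 * (i + 1)) for i in range(26)]

# Right after the burst, both rules fire.
during = start + timedelta(minutes=5)
print(count_alert(burst, during), rate_alert(burst, during))

# 25 minutes in, the count rule still fires; the rate rule has gone quiet.
later = start + timedelta(minutes=25)
print(count_alert(burst, later), rate_alert(burst, later))
```

The shorter the rate window, the sooner the alert clears once the failures stop, at the cost of being noisier on brief spikes.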