Question regarding alerts and monitoring

Hi there!

I have a question regarding alerts. I’ll explain what I’m trying to do so you can tell me if it’s even possible.

  • I specified a string in the alert tab. Let’s say it’s “Got error.”
  • I want Graylog to alert me whenever this string occurs.

So far, so good.

  • I wrote a Python script which checks for recent alerts to use with our monitoring.
  • My problem now is that even after the problem has been resolved, Graylog still finds the string within the last 24 hours and thus still gives me an alert.

My question now is: Is there any way to achieve what I want? If not, any alternatives?

Graylog can’t resolve events. If your search window contains messages with that string, it will keep sending notifications until the sliding window no longer contains the string.

You might want to search every 5 minutes for the last 5 minutes, and you will get more granular results.

Does that make sense to you?
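
A minimal sketch of the "search every 5 minutes for the last 5 minutes" idea, assuming the script remembers when its previous run ended. The names `INTERVAL` and `next_window` are made up for illustration; they are not part of Graylog.

```python
from datetime import datetime, timedelta, timezone

INTERVAL = timedelta(minutes=5)  # assumed polling interval


def next_window(last_end, now):
    """Return the (start, end) time range the next search should cover.

    If a previous run is known, start exactly where it ended, so that
    consecutive windows neither overlap nor leave gaps. On the first run,
    fall back to one interval before `now`.
    """
    start = last_end if last_end is not None else now - INTERVAL
    return start, now


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
first = next_window(None, now)                            # first run
second = next_window(first[1], now + timedelta(minutes=5))  # following run
```

Starting each window where the previous one ended means a log entry is found by exactly one run, even if the scheduler drifts by a few seconds.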

Thank you for your answer. I understand that Graylog can’t resolve events, and I already thought about reducing the time window. But:

If the host producing the log only sends one log message with the error and still has the problem 5 minutes later, the query would no longer find the message and thus there would not be another alert.

Any ideas on that?

Do not use logs for stateful alerting … you really want some kind of monitoring system for that. How should a log tool know whether something is “good” or “bad”? It just searches your logs and returns data.

You are totally right. We do have stateful alerting in the form of Icinga2. The problem is, though, that we want the alert for this one log entry to be acknowledgeable and resolvable.

I guess we have to come up with something ourselves, then. Thanks nonetheless!

If you already have Icinga2, why not create a stream for this single (possible) alert and search that stream from Icinga? If the search returns a result, use Icinga to alert …

… or use the event search API to search for the events you want to alert on.
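
A rough sketch of the first suggestion (an Icinga check that runs a Graylog search), not a finished plugin. It assumes Graylog's legacy relative search endpoint `/api/search/universal/relative` and a `total_results` field in the response; both should be verified against the API browser of your Graylog version. The function names and the `auth_header` parameter are made up for illustration. The exit codes are the standard Icinga/Nagios plugin codes.

```python
import json
import urllib.parse
import urllib.request

OK, CRITICAL, UNKNOWN = 0, 2, 3  # standard Icinga/Nagios plugin exit codes


def build_search_url(base_url, query, range_seconds):
    """Build the search URL (assumed legacy endpoint, verify for your version)."""
    params = urllib.parse.urlencode(
        {"query": query, "range": range_seconds, "limit": 1}
    )
    return f"{base_url.rstrip('/')}/api/search/universal/relative?{params}"


def fetch_total(url, auth_header):
    """Run the search and return the number of matching messages.

    `auth_header` is a ready-made Authorization header value (hypothetical
    parameter); `total_results` is the assumed name of the response field.
    """
    req = urllib.request.Request(
        url, headers={"Accept": "application/json", "Authorization": auth_header}
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)["total_results"]


def to_check_result(total):
    """Map a match count to an (exit code, plugin output) pair."""
    if total is None:
        return UNKNOWN, "UNKNOWN - could not query Graylog"
    if total > 0:
        return CRITICAL, f"CRITICAL - {total} matching message(s)"
    return OK, "OK - no matching messages"


url = build_search_url("https://graylog.example.org/", 'message:"Got error"', 300)
```

A real check would print the message from `to_check_result` and exit with the returned code, so that Icinga can pick up the state.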

I’ve already written something like that in Python.

The problem is still that I cannot resolve this. If I search the last 24 hours, the alert will still come up even after I have actually resolved the issue, because the log entry that triggered it can still be found.

You’re holding it wrong … Let’s say you search every 5 minutes: then you should only search the last 5 minutes to avoid duplicate finds.

This way you will not get the same entry multiple times …

But what if the entry indicating the problem only comes up once? After 5 minutes the alert will be ‘resolved’ although in reality it is not.

But the alert will not be resolved if you leave resolving as a manual step … something like that should be possible in Icinga.

Could you elaborate on that? I’m afraid I do not understand what you’re trying to say.
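
One possible reading of the manual-resolve idea, sketched under assumptions: the check script keeps a small state file that latches to "problem" as soon as the error is seen in a short search window, and only a manual step clears it. The state file layout and the names `update_state` and `resolve` are made up for illustration. This is not a built-in Graylog or Icinga feature.

```python
import json
from pathlib import Path


def update_state(path, error_seen):
    """Latch to 'problem' when an error is seen; never clear automatically.

    Returns True while the problem is considered unresolved.
    """
    path = Path(path)
    state = json.loads(path.read_text()) if path.exists() else {"problem": False}
    if error_seen:
        state["problem"] = True
    path.write_text(json.dumps(state))
    return state["problem"]


def resolve(path):
    """Manual step: an operator clears the latch once the issue is fixed."""
    Path(path).write_text(json.dumps({"problem": False}))
```

With this, a single log entry keeps the check in a problem state across later runs that find nothing new, and the 24-hour search window is no longer needed to remember it.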