I’m having trouble understanding how the alert system works with its resolved / unresolved states.
I’m running Graylog 2.4.6, configured to receive syslog messages from different servers.
I’ve created 2 alerts to match SSH authentications:
the field is “message” and the value is either “Accepted” or “failure” (an easy match to trigger).
Grace period 0, backlog 1 line, and repeat notifications enabled.
For some hosts, I receive a mail, for the others, nothing.
A quick search confirms that Graylog received the log containing my SSH login, so this part is working.
I’ve read in this topic that you need to enable repeat notifications AND have a non-matching alert to make it work. What is a “non-matching” alert?
Say you search for the content 1 in the field number: the alert stays active as long as 1 is found in that field within the time range you specify for the alert. If 1 is not found, the alert is resolved.
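A minimal sketch of the behaviour described above, assuming a simplified in-memory list of messages. `Message` and `condition_state` are hypothetical names; this models the idea, not Graylog’s actual alert code:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Message:
    timestamp: datetime
    fields: dict


def condition_state(messages, field, value, now, window):
    """Return 'UNRESOLVED' if any message in (now - window, now] has
    fields[field] == value, otherwise 'RESOLVED' (simplified model)."""
    start = now - window
    for m in messages:
        if start < m.timestamp <= now and m.fields.get(field) == value:
            return "UNRESOLVED"
    return "RESOLVED"


now = datetime(2018, 1, 1, 12, 0)
msgs = [Message(now - timedelta(seconds=30), {"number": 1})]
print(condition_state(msgs, "number", 1, now, timedelta(minutes=1)))
print(condition_state(msgs, "number", 1, now + timedelta(minutes=2), timedelta(minutes=1)))
```

In this model, resolution depends only on whether a match exists inside the time window.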
Oh, yes, that’s what I understood. But since I received no mails for some hosts, I thought the problem came from there.
So yes, I’ve configured the alert to match the text “Accepted” on the field message. When it matches, the alert goes to Unresolved; then the next syslog line received (not matching, of course) sets it back to Resolved.
I think you’re missing one little detail that you need to know to understand Graylog alerts.
Alerts trigger when at least one match is found. They do not trigger for each match individually.
This means: the alert conditions are checked every 60 seconds. If you have a search interval of 1 minute and 4 hosts matched your search criteria in that timespan, Graylog will generate ONLY ONE alert for all 4 matches.
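To make the “one alert per check” point concrete, here is a simplified model of a single check interval. `evaluate_interval` is a hypothetical helper and the message dicts are an assumed shape, not Graylog’s internal representation:

```python
def evaluate_interval(messages, predicate):
    """Simplified model of one alert check: collect all matching messages,
    but emit at most ONE alert for the whole interval."""
    matches = [m for m in messages if predicate(m)]
    if not matches:
        return []
    # One alert, no matter how many messages or hosts matched.
    return [{"matches": len(matches)}]


# Four different hosts log a successful SSH login within the same minute.
msgs = [{"source": f"host{i}", "message": "Accepted password"} for i in range(4)]
alerts = evaluate_interval(msgs, lambda m: "Accepted" in m["message"])
print(alerts)  # a single alert covering all 4 matches
```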
If you want an alert for each host, you’ll either have to define a stream, alert and notification for each host, or increase the backlog size to accommodate all hosts and parse the hosts from that list.
I think this is the little detail that is missing here.
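A minimal sketch of the backlog option, assuming each backlog entry is a dict carrying the standard syslog `source` field and that the backlog holds the first N matching messages (which N messages Graylog actually keeps is an assumption here). `hosts_from_backlog` is a hypothetical helper:

```python
def hosts_from_backlog(matching_messages, backlog_size, field="source"):
    """Return the distinct hosts found in the backlog, in order of first
    appearance. Only the first `backlog_size` matches are considered,
    mimicking a notification that carries a limited backlog."""
    backlog = matching_messages[:backlog_size]
    hosts = []
    for msg in backlog:
        host = msg.get(field)
        if host is not None and host not in hosts:
            hosts.append(host)
    return hosts


msgs = [{"source": f"host{i}", "message": "Accepted password"} for i in range(4)]
print(hosts_from_backlog(msgs, 1))   # backlog of 1: only one host is visible
print(hosts_from_backlog(msgs, 10))  # large enough backlog: all four hosts
```

This shows why a backlog of 1 line can hide hosts: the alert still fires, but the notification only carries one of the matching messages.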
Oh yes, indeed, I was missing this information!
Thank you Jan and Phillip for your (very quick) support.
I don’t think that increasing the backlog size is a good idea for my situation
Creating as many streams as I have servers looks like a long job; any other suggestions?
Won’t it make the solution slower?
I’ve browsed the marketplace for plugins about this and found nothing interesting (or I missed it).
Again, thanks. I’ve spent the last 2 days digging through docs and forums, but I completely missed that “at least one match” thing.
Well, you could do that, but it will probably tax the machine so hard that it crashes.
Every alert condition would be checked every second, while each check will likely take more than a second to finish.
Doing that is definitely not recommended.
@Krash, in theory that would work, but only if the search returns before the next search runs…
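To put rough numbers on this, here is a simple model, assuming each condition’s check is started on a fixed schedule whether or not the previous run has finished (Graylog’s real scheduler may queue or skip runs instead). `searches_per_minute` and `peak_overlap` are hypothetical helpers:

```python
import math


def searches_per_minute(conditions: int, interval_s: float) -> float:
    """Searches sent to Elasticsearch per minute if every condition
    is checked once per interval."""
    return conditions * 60 / interval_s


def peak_overlap(interval_s: float, duration_s: float) -> int:
    """Peak number of runs of ONE condition that are in flight at the
    same time when a new run starts every interval_s seconds and each
    run takes duration_s seconds."""
    return math.ceil(duration_s / interval_s)


# 28 conditions (one per host), searches taking 1.5 s each.
print(searches_per_minute(28, 1), peak_overlap(1, 1.5))
print(searches_per_minute(28, 60), peak_overlap(60, 1.5))
```

For example, 28 conditions checked every second would mean 1680 searches per minute, with runs of the same condition overlapping, compared with 28 searches per minute at the 60-second interval mentioned above.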
Currently the alerting is very limited; near-future versions will improve that part of Graylog (but not 2.5 or 3.0). For now, my advice would be to use your monitoring tool to run a check on a stream and alert with that tool on every message.
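A minimal sketch of such a check, written as a Nagios-style plugin (exit code 0 = OK, 2 = CRITICAL). The search endpoint path and its `query`/`range`/`filter` parameters are an assumption about the Graylog 2.x REST API and should be verified in your instance’s API browser; the host name and stream ID are hypothetical, and the HTTP call itself is left out:

```python
from urllib.parse import urlencode

OK, CRITICAL = 0, 2  # Nagios-style plugin exit codes


def search_url(base, stream_id, query, range_s):
    """Build a relative-time search URL restricted to one stream.
    ASSUMPTION: endpoint and parameter names as in the Graylog 2.x API."""
    params = {"query": query, "range": range_s, "filter": f"streams:{stream_id}"}
    return f"{base.rstrip('/')}/api/search/universal/relative?{urlencode(params)}"


def evaluate(total_results, stream_name):
    """Turn the number of matching messages into an exit code and status line."""
    if total_results > 0:
        return CRITICAL, f"CRITICAL - {total_results} matching message(s) in stream {stream_name}"
    return OK, f"OK - no matching messages in stream {stream_name}"


print(search_url("http://graylog.example.com:9000/", "abc123", "message:Accepted", 300))
print(evaluate(3, "ssh-esxi01"))
```

The monitoring tool would call the URL with its own credentials, read the number of matching messages from the response, and pass it to `evaluate`. Because it runs one check per stream, each stream can alert independently of the others.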
@derPhlipsi not only might this crash Elasticsearch; in the end it could also result in notifications that are never triggered.
@Krash What I’ve started doing is using some Pipelines to figure out what box a log came from, and then sort them into named Streams which then scan for the error and send the alert.
So, for instance, if a message comes in from e.g. qa5.example.com, I have a pipeline which finds the number of the QA box and saves it into a field; the next step then looks at that field and sorts the message into the QA5 Stream. In the QA5 Stream I have an alert which just looks for Level 3 errors and alerts on them.
This means if QA5, QA6, and QA7 all send errors at the same time, they all get individually alerted on.
I also have a Step 0 in this pipeline which is just a big list of messages I don’t want it to alert me about.
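The routing described above can be modelled roughly as follows. This is a Python sketch of the idea, not Graylog pipeline rule syntax; the `qa_box` field, the ignore list and the host name pattern are hypothetical, while `source` and `level` are assumed to be the usual syslog fields:

```python
import re

IGNORED = ["known harmless message"]  # hypothetical Step 0 ignore list
QA_HOST = re.compile(r"^qa(\d+)\.example\.com$")


def route(message):
    """Return the stream a message is sorted into, or None."""
    text = message.get("message", "")
    # Step 0: drop messages we never want to be alerted about.
    if any(pattern in text for pattern in IGNORED):
        return None
    # Step 1: extract the QA box number from the source host name.
    match = QA_HOST.match(message.get("source", ""))
    if not match:
        return None
    message["qa_box"] = int(match.group(1))
    # Step 2: sort the message into the matching named stream.
    return f"QA{message['qa_box']}"


def alerts(messages):
    """One alert per stream that received at least one level 3 (error) message."""
    streams = set()
    for msg in messages:
        stream = route(msg)
        if stream and msg.get("level") == 3:
            streams.add(stream)
    return sorted(streams)


msgs = [
    {"source": "qa5.example.com", "level": 3, "message": "disk error"},
    {"source": "qa6.example.com", "level": 3, "message": "disk error"},
    {"source": "qa7.example.com", "level": 3, "message": "known harmless message here"},
]
print(alerts(msgs))  # QA5 and QA6 alert separately; QA7 is filtered by Step 0
```

Because each box has its own stream, simultaneous errors on several boxes produce one alert per box instead of a single combined alert.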
@Grakkal, that’s a good idea indeed, but I’ve started working on @derPhlipsi’s solution.
Thanks to the API, I’m now able to create everything (its own stream/alert/notification) for each host I need to check (each stream matching only 1 IP address).
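A sketch of how the per-host payloads might be generated, reusing the settings from the first post (grace period 0, backlog 1, repeat notifications on). The request shapes are assumptions about the Graylog 2.x REST API: the `index_set_id` key, rule `type` code 1 for an exact match, the `gl2_remote_ip` field and the `field_content_value` condition type should all be checked against your API browser. The helper names are hypothetical:

```python
def stream_payload(host_ip, index_set_id):
    """ASSUMED shape of a 'create stream' request matching a single IP."""
    return {
        "title": f"ssh-{host_ip}",
        "description": f"SSH authentications from {host_ip}",
        "index_set_id": index_set_id,
        "matching_type": "AND",
        "remove_matches_from_default_stream": False,
        "rules": [
            # Assumption: type 1 = exact match; gl2_remote_ip holds the sender IP.
            {"field": "gl2_remote_ip", "type": 1, "value": host_ip, "inverted": False},
        ],
    }


def condition_payload(value):
    """ASSUMED shape of a field content condition on the message field."""
    return {
        "type": "field_content_value",
        "title": f"message: {value}",
        "parameters": {
            "field": "message",
            "value": value,
            "grace": 0,
            "backlog": 1,
            "repeat_notifications": True,
        },
    }


def payloads_for(hosts, index_set_id):
    """Map each host IP to (stream payload, [condition payloads])."""
    return {
        ip: (stream_payload(ip, index_set_id),
             [condition_payload("Accepted"), condition_payload("failure")])
        for ip in hosts
    }


for ip, (stream, conditions) in payloads_for(["10.0.0.1", "10.0.0.2"], "my-index-set").items():
    print(ip, stream["title"], [c["title"] for c in conditions])
```

Each stream payload would then be POSTed to the streams endpoint and each condition to that stream’s alert conditions endpoint; the exact paths are likewise assumptions to verify.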
The thing is… the alert still does not work for some streams.
I’ve tested on 28 hosts (ESXi) so far; it does not work for 4 of them.
As I use the same API calls to create all my streams, it cannot be a coding error.
As I see matching logs for these 4 hosts in their streams, I guess it’s not related to a timezone misconfiguration.
Is there any way to get a higher debug level in order to understand why this happens?