Understanding alert system

Krash · September 13, 2018, 7:51am

Hey there!

I’m having trouble understanding how the alert system works with its resolved / unresolved states.
I’m running Graylog 2.4.6 configured to receive syslog alerts from different servers.

I’ve created 2 alerts to match SSH authentications.
The field is “message” and the value is either “Accepted” or “failure” (an easy match to trigger).
Grace period 0, backlog 1 line, and repeat notifications.

For some hosts I receive a mail; for the others, nothing.
A quick search confirms that Graylog received the log containing my SSH login, so this part is working.

I’ve seen on this topic that you need repeat notifications AND a non-matching alert to make it work. What is a non-alert?

Feedback appreciated!

Thanks,

jan · September 13, 2018, 9:07am

Hey @Krash

Graylog runs the search periodically (that is basically what alerting is). A non-alert refers to a search that does not return a result.

If a run of the alert search does not return a match, the alert will be resolved.

Krash · September 13, 2018, 9:18am

Hey Jan,

So I have to create a new alert condition with nothing to match, in order to resolve my incidents?

jan · September 13, 2018, 9:22am

No.

Say you search for the value 1 in the field number. The alert will be active as long as 1 is found in that field within the time range you specified for the alert. If 1 is not found, the alert will be resolved.
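
As a rough illustration of that rule (a simplified model, not Graylog code), the state after each periodic check depends only on whether that check's search found anything. The function name `alert_states` and the per-check match counts are made up for the example.

```python
from typing import Iterable, List


def alert_states(match_counts: Iterable[int]) -> List[str]:
    """Return the alert state after each periodic check.

    Each entry of match_counts is the number of messages the condition's
    search found in one check. Any match keeps the alert unresolved;
    a check with zero matches resolves it.
    """
    return ["unresolved" if count > 0 else "resolved" for count in match_counts]


# Three consecutive checks: one match, then three matches, then none.
print(alert_states([1, 3, 0]))
```

No separate "resolving" condition is needed in this model: the first check that finds nothing is what resolves the alert.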

Does this make sense to you now?

Krash · September 13, 2018, 9:26am

Oh, yeah, that’s what I understood. But since I received no mails for some hosts, I thought the problem came from there.

So yeah, I’ve configured the alert to match the text “Accepted” in the field message. When it matches, the alert goes to Unresolved; then, when the next syslog line is received (not matching, of course), the alert goes to Resolved.

Sooooo, how can I debug alerts I never received?

derPhlipsi · September 13, 2018, 9:34am

Heyo

I think you’re missing one little detail that you need to know to understand Graylog alerts.
An alert triggers when at least one match is found. It does not trigger for each match individually.

This means: the alert conditions are checked every 60 seconds. If you have a search interval of 1 minute and there have been 4 hosts that match your search criteria in this timespan, Graylog will generate ONLY ONE alert for all 4 matches.

If you want an alert for each host, you’ll either have to define a stream, alert and notification for each host, or you’ll have to increase the backlog size to accommodate all hosts and parse the hosts from that list.
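
A small sketch of both points, as a simplified model rather than Graylog's implementation: one check yields at most one alert, and the hosts can be recovered afterwards from the backlog attached to it. The message dictionaries, host names and function names below are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def run_check(matches: List[dict]) -> int:
    """Number of alerts a single check produces: at most one, however many matches."""
    return 1 if matches else 0


def group_backlog_by_host(backlog: List[dict]) -> Dict[str, List[str]]:
    """Split the backlog attached to one alert into per-host message lists."""
    by_host: Dict[str, List[str]] = defaultdict(list)
    for msg in backlog:
        by_host[msg["source"]].append(msg["message"])
    return dict(by_host)


# Hypothetical messages that matched within one 60-second check.
matches = [
    {"source": "host-a", "message": "Accepted password for alice"},
    {"source": "host-b", "message": "Accepted password for bob"},
    {"source": "host-c", "message": "Accepted publickey for carol"},
    {"source": "host-d", "message": "Accepted password for dave"},
]

print(run_check(matches))                      # a single alert for all four hosts
print(sorted(group_backlog_by_host(matches)))  # hosts recovered from the backlog
```

With a backlog of 1 line, only one of these four messages would be attached to the notification, which is why the other hosts appear to be missing.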

I think this is the little detail that is missing here.

Greetings,
Philipp

Krash · September 13, 2018, 9:41am

Oh yeah indeed, I was missing this information!
Thank you Jan and Philipp for your (very quick) support.
I don’t think that increasing the backlog size is a good idea for my situation.
As creating as many streams as I have servers looks like a long job, any other suggestion?
Won’t it make the solution slower?
I’ve browsed the marketplace for plugins about this and found nothing interesting (or I missed it).

Again, thanks. I’ve spent the last 2 days digging in docs and forums, but had really missed that “at least one match” thing.

derPhlipsi · September 13, 2018, 9:43am

Well, your best bet would be to go to the Github Issues and submit a feature request asking for a new alert condition that triggers for every message.

Greetings,
Phil

Krash · September 13, 2018, 9:58am

Could it be a solution to change the search interval? From 60s to (for example) 1s?
Not sure the virtual machine has enough resources, though.

derPhlipsi · September 13, 2018, 10:21am

Well, you could do that, but that’ll probably tax the machine so hard that it crashes.

Every alert condition would be checked every second, while each check will likely take more than a second to finish.
Doing that is definitely not recommended.

jan · September 13, 2018, 10:50am

@Krash in theory that would work, but only if the search returns before the next search runs …

Currently the alerting is very limited; future versions will improve that part of Graylog (but not 2.5 or 3.0). For now, my advice would be to use your monitoring tool to run a check on a stream and alert with that tool on every message.

@derPhlipsi not only might this crash Elasticsearch; in the end, the result could be notifications that are never triggered.

Grakkal · September 18, 2018, 7:10pm

@Krash What I’ve started doing is using some Pipelines to figure out what box a log came from, and then sort the messages into named Streams, which then scan for the error and send the alert.
So, for instance, if a message comes in from qa5.example.com, I have a pipeline which finds the number of the QA box and saves that into a field; the next step then looks at that field and routes the message to the QA5 Stream. In the QA5 Stream I have an alert which just looks for Level 3 errors and alerts on them.
This means if QA5, QA6, and QA7 all send errors at the same time, they each get alerted on individually.
I also have a Step 0 in this pipeline which is just a big list of messages I don’t want it to alert me about.
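
The routing logic described above can be sketched in plain Python. This is not Graylog pipeline rule syntax, only the decision flow; the ignore list entry, the host name pattern and the function names are assumptions made for illustration.

```python
import re
from typing import Optional

# Hypothetical "Step 0" list: message fragments that should never alert.
IGNORED_FRAGMENTS = ["connection reset by peer"]

# Assumed host naming scheme: qa<number>.example.com
QA_HOST = re.compile(r"^qa(\d+)\.example\.com$")


def stream_for(source: str) -> Optional[str]:
    """Extract the QA box number from the host name and pick its stream."""
    found = QA_HOST.match(source)
    return f"QA{found.group(1)}" if found else None


def should_alert(message: str, level: int) -> bool:
    """Alert only on level 3 messages that are not on the ignore list."""
    if any(fragment in message for fragment in IGNORED_FRAGMENTS):
        return False
    return level == 3


print(stream_for("qa5.example.com"))
print(should_alert("disk failure on /dev/sda", 3))
```

Because every box gets its own stream and its own alert condition, simultaneous errors on several boxes are evaluated by separate conditions instead of collapsing into one alert.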

Krash · September 19, 2018, 11:28am

@Grakkal, that’s a good idea indeed, but I’ve started to work on @derPhlipsi’s solution.

Thanks to the API, I’m now able to create everything (its own stream/alert/notification) for each host I need to check (each stream matching only 1 IP address).
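
A per-host stream definition of this kind could be built roughly as follows. This is a sketch and not the script used here: the payload field names and the numeric rule type for an exact match are assumptions about the Graylog 2.x streams API and should be checked in the API browser of your own instance, and the title scheme, the `gl2_remote_ip` match field and the index set placeholder are illustrative.

```python
import json
from typing import Any, Dict

EXACT_MATCH = 1  # assumed numeric code for a "match exactly" stream rule


def build_stream_payload(host_ip: str, index_set_id: str,
                         field: str = "gl2_remote_ip") -> Dict[str, Any]:
    """Build the request body for one stream that matches a single IP address."""
    return {
        "title": f"ssh-{host_ip}",
        "description": f"SSH authentication messages from {host_ip}",
        "index_set_id": index_set_id,
        "matching_type": "AND",
        "remove_matches_from_default_stream": False,
        "rules": [
            {"field": field, "type": EXACT_MATCH,
             "value": host_ip, "inverted": False},
        ],
    }


# Placeholder values; nothing is sent to a server in this sketch.
payload = build_stream_payload("192.0.2.10", "<index-set-id>")
print(json.dumps(payload, indent=2))
```

Printing the generated payload for a working host next to one for a failing host is a cheap first check that the two streams really differ only in the IP address.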

The thing is… alerts still do not work for some streams.
I’ve tested on 28 hosts for now (ESXi); it is not working for 4 of them.

As I use the same API calls to create all my streams, it cannot be a coding error.
As I see matching logs for these 4 hosts in their streams, I guess it’s not related to a timezone misconfiguration.

Is there any way to get a higher debug level in order to understand how this happens?

derPhlipsi · September 19, 2018, 11:31am

Have a look at the System menu in your web-UI. You’ll find the submenu Logging, where you can set individual logging levels for each Graylog-subsystem

system · October 3, 2018, 11:31am

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.
