Understanding alert system


(Override) #1

Hey there!

I’m having trouble understanding how the alert system works with its resolved / unresolved states.
I’m running Graylog 2.4.6, configured to receive syslog messages from different servers.

I’ve created 2 alerts to match SSH authentications.
The field is “message” and the value is either “Accepted” or “failure” (an easy match to trigger).
Grace period 0, backlog 1 line, and repeat notifications enabled.

For some hosts I receive a mail; for the others, nothing.
A quick search confirms that Graylog received the log containing my SSH login, so this part is working.

I’ve read in this topic that you need repeat notifications AND a non-matching alert to make it work. What is a non-alert?

Feedback appreciated!

Thanks,


(Jan Doberstein) #2

Hey @Krash

Graylog runs the search periodically (that is basically the alerting). A non-alert refers to a search run that does not return a result.

If an alert’s search runs and does not return a match, the alert will be resolved.


(Override) #3

Hey Jan,

So I have to create a new alert condition with nothing to match in order to resolve my incidents?


(Jan Doberstein) #4

No.

Say you search for the value 1 in the field number: the alert will be active as long as 1 is found in that field within the time range you specified for the alert. If 1 is not found, the alert will be resolved.
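To make that concrete, here is a toy Python model of the behaviour described above. It is not Graylog code: the function name and the "resolved"/"unresolved" strings are made up for illustration, and it assumes each periodic check simply counts matches in its time range.

```python
def alert_states(match_counts):
    """Return the alert state after each periodic check.

    match_counts[i] is the number of messages that matched the condition
    in the time range examined by check number i (illustrative input).
    """
    # Any match keeps the alert active; a check without matches resolves it.
    return ["unresolved" if count > 0 else "resolved" for count in match_counts]


if __name__ == "__main__":
    # Four consecutive checks: no match, one match, three matches, no match.
    print(alert_states([0, 1, 3, 0]))
```

Note that in this model the state only depends on whether a check found anything, not on how many messages matched.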

Does this make sense to you now?


(Override) #5

Oh, yeah, that’s what I understood. But since I received no mails for some hosts, I thought the problem came from there.

So yeah, I’ve configured the alert to match the text “Accepted” in the field message. When it matches, the alert goes to Unresolved; then, when the next syslog line is received (not matching, of course), the alert goes to Resolved.

Sooooo, how can I debug alerts that were not received?


(Philipp Ruland) #6

Heyo :slight_smile:

I think you’re missing one little detail that you need to know to understand Graylog alerts.
An alert triggers when at least one match is found. It does not trigger for each match individually.

This means: the alert conditions are checked every 60 seconds. If you have a search interval of 1 minute and there have been 4 hosts that match your search criteria in this timespan, Graylog will generate ONLY ONE alert for all 4 matches.

If you want an alert for each host, you’ll either have to define a stream, alert and notification for each host, or you’ll have to increase the backlog size to accommodate all hosts and parse the hosts from that list.

I think this is the little detail that is missing here :slight_smile:
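As a rough illustration of both points, here is a small Python sketch. It is a toy model, not Graylog’s implementation: `check_condition` and `group_backlog_by_host` are hypothetical names, the messages are plain dicts, and the backlog is assumed to be large enough to hold every match.

```python
from collections import defaultdict


def check_condition(window_messages, needle="Accepted"):
    """One periodic check: return a single alert for the whole window, or None."""
    matches = [m for m in window_messages if needle in m["message"]]
    if not matches:
        return None
    # One alert object, no matter how many messages or hosts matched.
    return {"match_count": len(matches), "backlog": matches}


def group_backlog_by_host(alert):
    """Split the alert backlog per source host (the 'parse the hosts' option)."""
    per_host = defaultdict(list)
    for msg in alert["backlog"]:
        per_host[msg["source"]].append(msg["message"])
    return dict(per_host)


if __name__ == "__main__":
    window = [
        {"source": f"host{i}", "message": "Accepted password for root"}
        for i in range(1, 5)
    ]
    window.append({"source": "host9", "message": "session closed"})
    alert = check_condition(window)
    print(alert["match_count"])
    print(sorted(group_backlog_by_host(alert)))
```

With a backlog of 1 line, as in the original configuration, only one of those four hosts would show up in the notification.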

Greetings,
Philipp


(Override) #7

Oh yeah indeed, I was missing this information!
Thank you Jan and Philipp for your (very quick) support.
I don’t think that increasing the backlog size is a good idea for my situation.
Creating as many streams as I have servers looks like a long job; any other suggestion? :smiley:
Won’t it make the solution slower?
I’ve browsed the marketplace for plugins about this and found nothing interesting (or I missed it).

Again, thanks. I’ve spent the last 2 days digging in docs and forums, but really missed that “at least one match” thing.


(Philipp Ruland) #8

Well, your best bet would be to go to the GitHub issues and submit a feature request asking for a new alert condition that triggers for every message.

Greetings,
Phil


(Override) #9

Could it be a solution to change the search interval, from 60s to (for example) 1s?
Not sure the virtual machine has enough resources, though :angel:


(Philipp Ruland) #10

Well, you could do that, but it would probably tax the machine so hard that it crashes.

Every alert condition would be checked every second, while the checks will likely take more than a second to finish.
Doing that is definitely not recommended.


(Jan Doberstein) #11

@Krash in theory that would work, but only if the search returns before the next search runs …
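A back-of-the-envelope way to express that constraint in Python. This is only a sketch: `search_keeps_up` is a hypothetical helper, and the durations are made-up numbers, not measurements from a real Graylog setup.

```python
def search_keeps_up(interval_s, search_durations_s):
    """True if every search finishes before the next one is due to start."""
    return all(duration < interval_s for duration in search_durations_s)


if __name__ == "__main__":
    durations = [0.8, 2.5, 1.2]  # hypothetical search run times in seconds
    print(search_keeps_up(60, durations))  # True: all searches finish within 60s
    print(search_keeps_up(1, durations))   # False: 2.5s and 1.2s exceed 1s
```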

Currently the alerting is very limited; upcoming versions will improve that part of Graylog (but not 2.5 or 3.0). For now, my advice would be to use your monitoring tool to run a check on a stream and have that tool alert on every message.

@derPhlipsi not only might this crash Elasticsearch; in the end the result could be notifications that are never triggered.


(Roger Mier) #12

@Krash What I’ve started doing is using some Pipelines to figure out what box a log came from, and then sort them into named Streams which then scan for the error and send the alert.
So, for instance, if a message comes in from qa5.example.com, I have a pipeline which finds the number of the QA box and saves that into a field; then the next step looks at that field and sorts the message into the QA5 Stream. In the QA5 Stream I have an alert which just looks for Level 3 errors and alerts on them.
This means if QA5, QA6, and QA7 all send errors at the same time, they all get individually alerted on.
I also have a Step 0 in this pipeline which is just a big list of messages I don’t want it to alert me about.
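The routing logic can be sketched in Python to show the idea. This is not Graylog pipeline rule syntax, just an illustration: the function names, the `qa_number` field, and the ignore list are hypothetical, and messages are modelled as plain dicts with `source`, `message` and `level` keys.

```python
import re

# Hypothetical "Step 0" list: messages that should never reach an alerting stream.
IGNORED_SUBSTRINGS = ["healthcheck"]

# Matches hostnames such as qa5.example.com and captures the box number.
QA_HOST = re.compile(r"^qa(\d+)\.")


def route(message):
    """Return the name of the stream a message belongs to, or None."""
    # Step 0: drop anything on the ignore list.
    if any(s in message["message"] for s in IGNORED_SUBSTRINGS):
        return None
    # Step 1: extract the QA box number from the source and store it in a field.
    match = QA_HOST.match(message["source"])
    if match is None:
        return None
    message["qa_number"] = int(match.group(1))
    # Step 2: sort into the stream named after that box.
    return f"QA{message['qa_number']}"


def should_alert(message):
    """Per-stream alert condition: only Level 3 errors."""
    return message.get("level") == 3


if __name__ == "__main__":
    msg = {"source": "qa5.example.com", "message": "disk failure", "level": 3}
    print(route(msg), should_alert(msg))
```

Because each box has its own stream and its own alert condition, simultaneous errors on several boxes each produce their own alert.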


(Override) #13

@Grakkal, that’s a good idea indeed, but I’ve started to work on @derPhlipsi’s solution.

Thanks to the API, I’m now able to create everything (its own stream/alert/notification) for each host I need to check (each stream matching only 1 IP address).

The thing is… alerts still do not work for some streams.
I’ve tested on 28 hosts so far (ESXi); it is not working for 4 of them.

As I use the same API calls to create all my streams, it cannot be a coding error.
As I see matching logs for these 4 hosts in their streams, I guess it’s not related to a timezone misconfiguration.

Is there any way to get a higher debug level in order to understand how this happens?


(Philipp Ruland) #14

Have a look at the System menu in your web UI. You’ll find the submenu Logging, where you can set individual logging levels for each Graylog subsystem :slight_smile:

