Graylog suddenly stopped triggering alerts

hezor · April 5, 2017, 9:40am

Hello,

I’m not quite sure if I should post this here or create a GitHub issue, but let’s start here.
We have the following Graylog production setup running on CentOS 7.3 (all components are installed from the yum repositories with Ansible):

3x Graylog 2.1.3 running on Java 1.8.0_92 with 8GB heap (12vCPU, 12GB RAM per virtual machine)
3x MongoDB 3.2.11 Replica Set
8x Elasticsearch 2.4.4

On April 3rd we faced an issue where Graylog suddenly stopped triggering all the alerts defined in different streams. We have been running this setup for nearly a year (started with version 2.0.2) and nothing similar has ever happened before; we have been running version 2.1.3 since it was released. We ingest logs at an average rate of 800 messages per second, and our current setup handles the load without any visible problems.

At the moment the alerts stopped triggering, the following lines were written to server.log on the Graylog master node: https://pastebin.com/NMxQM3vZ

So it seems to be some kind of connection error with MongoDB. Although the log files show that the connection to MongoDB was successfully re-established a moment later, the alerts did not start triggering again until I manually restarted the Graylog master node.

So, do you have any insight into this? Could this be a bug in Graylog, or just a “glitch in the Matrix”? If there were some problems within the network (which can and will happen occasionally), I’m just wondering why Graylog did not resume triggering alerts once the problems were resolved and the connections to the MongoDB replica set were re-established.

Thanks!

Br,
Henri

jan · April 5, 2017, 11:24am

Hej Henri,

do you have any plugins installed that provide additional notifications?

If possible, you might want to force this error by creating a network glitch in your setup and watching whether it happens again. After that, you can send us some information on how to reproduce it.
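When trying to reproduce this, it may help to record exactly when each MongoDB replica set member becomes unreachable and reachable again, so the timestamps can be compared with the Graylog server.log. The following is only a minimal sketch using the Python standard library. The host names are hypothetical placeholders, and 27017 is assumed because it is MongoDB's default port. It checks plain TCP reachability only and does not talk to Graylog or MongoDB itself.

```python
import socket
import time
from typing import Dict, Iterable, List, Tuple


def probe(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def transitions(previous: Dict[str, bool],
                current: Dict[str, bool]) -> List[Tuple[str, bool, bool]]:
    """List (member, old_state, new_state) for members whose state changed."""
    return [
        (member, previous[member], up)
        for member, up in current.items()
        if member in previous and previous[member] != up
    ]


def watch(members: Iterable[Tuple[str, int]],
          rounds: int, interval: float = 5.0) -> None:
    """Probe every member `rounds` times and print each state change."""
    members = list(members)
    previous: Dict[str, bool] = {}
    for _ in range(rounds):
        current = {f"{h}:{p}": probe(h, p) for h, p in members}
        for member, old, new in transitions(previous, current):
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            state = "reachable" if new else "UNREACHABLE"
            print(f"{stamp} {member} is now {state}")
        previous = current
        time.sleep(interval)


# Example call (hypothetical host names, assumed default MongoDB port):
# watch([("mongo1.example.org", 27017),
#        ("mongo2.example.org", 27017),
#        ("mongo3.example.org", 27017)], rounds=720, interval=5.0)
```

Running this on the Graylog master node while the network glitch is induced would give a timeline of connectivity changes to set against the time the alerts stopped.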

regards
Jan

hezor · April 5, 2017, 11:42am

Hey,

Do you mean alarm callback plugins? We use graylog-plugin-slack-2.4.0.jar and graylog-plugin-hipchat-1.3.0.jar to send alerts. And in addition, we use the default email callback.

Br,
Henri

jan · April 5, 2017, 1:48pm

Hej Henri,

did you check whether they are compatible with the Graylog version you are using?

Do you have the same issue if you remove those plugins?

regards
Jan

hezor · April 6, 2017, 10:57am

Hey,

At least they have been working fine ever since we upgraded to 2.1.3 when it was released. For now, I cannot reproduce this, as it was a one-time issue (and hopefully will not recur). I was just wondering whether the errors in the provided server.log from the Graylog master node would give you some insight into this issue. But I think I’ll get back to you if this happens again. Thanks anyway!

Br,
Henri
