1. Describe your incident:
Summary: Critical alert notification delay of 40–90 minutes.
Details: I am running a Graylog cluster with 6,000 alert definitions. My Graylog input throughput is 40k messages per minute. Each alert is configured to “Search within the last 1 minute” and “Execute search every 1 minute.”
Symptoms:
* Notifications are delayed significantly (up to 90 minutes behind real-time).
When you look under System / Nodes, are the process/output buffers operating at close to 100%?
If your hosts have 16 cores available, the processbuffer_processors and outputbuffer_processors settings should probably be balanced against that number. For example:
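A rough split in graylog-server's server.conf, assuming a 16-core host with a couple of cores left over for input buffers and the OS (the exact numbers are illustrative, not a recommendation):

```
# Hypothetical split for a 16-core host -- tune to your workload
processbuffer_processors = 8
outputbuffer_processors = 6
inputbuffer_processors = 2
```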
Thank you! The datanode I/O utilization is very low, and both graylog-server's process and output buffers stay normal, well below 10%. At first I tried
outputbuffer_processors = 8
processbuffer_processors = 8
but the notifications were still delayed, so I changed both to 16.
By the way, I’m confident the system time is correct. Although I came across articles blaming time sync for similar problems, I’ve verified that our clocks are synchronized.
System clock differences across Graylog nodes could certainly cause this issue, and they would be the most likely cause.
Are there any indicators in the Graylog logs, perhaps something about losing the Graylog leader node or communication timeouts to MongoDB?
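One quick way to scan for those messages, assuming a package install with the default log location:

```
# Look for leader-election changes or MongoDB communication problems
grep -iE "leader|mongo|timeout" /var/log/graylog-server/server.log | tail -n 50
```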
Within the mongosh context on your MongoDB primary, when viewing the graylog DB, if you run the query below, what do the post_processing times for each node say?
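A plausible check, assuming recent Graylog versions keep per-node processing watermarks in the processing_status collection (collection and field names may differ by version):

```
// Hypothetical mongosh query -- adjust names to your Graylog version
use graylog
db.processing_status.find(
  {},
  { node_id: 1, "receive_times.post_processing": 1, _id: 0 }
).pretty()
```

If one node's post_processing timestamp lags far behind the others, the event processor may hold back alert searches until that node catches up, which could match the delay you are seeing.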