Have a question about notification delay, what should I do?


1. Describe your incident:

Summary: Critical alert notification delay of 40–90 minutes.

Details: I am running a Graylog cluster with 6,000 alert definitions. My Graylog input throughput is 40k messages per minute. Each alert is configured to “Search within the last 1 minute” and “Execute search every 1 minute.”

Symptoms: Notifications are delayed significantly (up to 90 minutes behind real-time).

2. Describe your environment:

  • OS Information: CentOS 7.9

  • Package Version: graylog-7.0.0-10, graylog-datanode-7.0.0-10, mongodb-7.0.25-1


  • Hardware:

    • 3x Graylog Server + MongoDB (16c/32G)

    • 1x Graylog Leader Server (16c/32G)

    • 3x Graylog DataNodes (16c/32G + SSD)

3. What steps have you already taken to try and solve the problem?

I tried adding the following to server.conf:

async_eventbus_processors = 16
outputbuffer_processors = 16
processbuffer_processors = 16
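For reference, a minimal sketch of where these settings live and how to apply them, assuming the default paths of the CentOS package install:

# Edit the server configuration (default location for RPM installs)
sudo vi /etc/graylog/server/server.conf

# Processor settings only take effect after a restart
sudo systemctl restart graylog-server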

4. How can the community help?


Hey @gnarly9203,

When looking under System / Nodes, are the process/output buffers operating at close to 100%?
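You can also check this against each node's REST API from the command line (a quick sketch, assuming the API listens on the default port 9000 and admin credentials; verify the endpoint path in your version's API browser):

# Current utilization of this node's process/output buffers
curl -s -u admin:yourpassword http://<node>:9000/api/system/buffers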

If your hosts have 16 cores available, then the processbuffer_processors and outputbuffer_processors settings should probably be balanced against that number. For example:

outputbuffer_processors = 9
processbuffer_processors = 7

Thank you! The DataNode I/O utilization is very low, and both graylog-server's process and output buffers remain normal, well below 10%. At first, I tried

outputbuffer_processors = 8
processbuffer_processors = 8

but the notifications were still delayed, so I changed both to 16.

By the way, I’m confident the system time is correct. Although I came across articles blaming time sync for similar problems, I’ve verified that our clocks are synchronized.
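For reference, the sync can be double-checked with the commands below (a sketch, assuming chronyd, the default time service on CentOS 7):

# Offset and stratum of the local clock against its time source
chronyc tracking

# Service-agnostic summary, including the "NTP synchronized" flag
timedatectl status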

System clock differences across the Graylog nodes could certainly cause this issue and would be the most likely cause, so it is good that you have already ruled that out.

Are there any indicators in the Graylog logs, perhaps nodes losing contact with the Graylog leader or communication timeouts to MongoDB?

Within mongosh on your MongoDB primary, while using the graylog database, what do the post_processing times for each node show if you run the query below?

db.processing_status.find().toArray()
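To make those timestamps easier to scan, you can project just the node IDs and receive times (a sketch; the field names come from the processing_status collection and are worth verifying against your 7.0 schema):

// Each node's ID plus its ingest/post_processing/post_indexing receive times
db.processing_status.find({}, { node_id: 1, receive_times: 1 }).toArray()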
