Wrong calculation of "Next execution" in Graylog Alerts

quocbao · December 9, 2022, 11:21am

1. My problem

Hi everyone, I am having trouble with the “Next execution” of Graylog Alerts: the “Next execution” is not calculated as configured.

2. My system

OS Information: CentOS 7
Package Version: 4.3.5+32fa802
MongoDB v4.2.18
OpenSearch v1.3.5

3. What steps have you already taken to try and solve the problem?

Created new alerts (did not work)
Disabled and re-enabled alerts (did not work)
Updated an existing alert (did not work)

Suspected error log

4. How can the community help?

Show me how to debug or fix the issue.

Thank you,

joe.gross · December 9, 2022, 7:09pm

Do you have any processing bottlenecks? Check System/Nodes > Details and see if any of the buffers are full. If processing is backed up, it can cause alerts to back up as well. That might explain the odd Next timerange result.

quocbao · December 10, 2022, 4:19am

All buffers are almost empty.

By the way, for some alerts there are only a few seconds between Last execution and Next execution.
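If the scheduler were healthy, the gap between “Last execution” and “Next execution” should be close to the event definition's “Execute search every” interval, so a gap of a few seconds points at the scheduler rather than at the alert configuration. A minimal sketch of that sanity check, assuming ISO 8601 timestamps copied from the UI and an interval in minutes; `checkSchedule` and its tolerance parameter are hypothetical, not part of Graylog.

```javascript
// Hypothetical helper: compare the observed Last -> Next gap with the configured interval.
// Assumes ISO 8601 timestamps and an "Execute search every" value in minutes.
function checkSchedule(lastExecution, nextExecution, intervalMinutes, toleranceSeconds = 60) {
  const observedMs = Date.parse(nextExecution) - Date.parse(lastExecution);
  const expectedMs = intervalMinutes * 60 * 1000;
  const driftSeconds = Math.abs(observedMs - expectedMs) / 1000;
  return {
    observedSeconds: observedMs / 1000,
    expectedSeconds: expectedMs / 1000,
    ok: driftSeconds <= toleranceSeconds,
  };
}

// Example: a 5-minute alert whose next run is scheduled only 7 seconds after the last one.
console.log(checkSchedule("2022-12-10T04:00:00Z", "2022-12-10T04:00:07Z", 5));
// { observedSeconds: 7, expectedSeconds: 300, ok: false }
```

The tolerance is there because the scheduler does not fire at the exact millisecond; anything far outside it, in either direction, is worth a closer look.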

gsmith · December 10, 2022, 4:33am

@quocbao
Hey, I was looking over this, and what I noticed is that the Next timerange is a day behind. By chance, did you check the timezone on this server? And do you have NTP installed on this server?
In other words, do these line up?

System/Overview --> Time configuration

EDIT: Did this issue just start? If so, what was done prior to it (updates, upgrades, etc.)?

quocbao · December 11, 2022, 3:10am

Hi @gsmith,

Thanks for your reply. Here is my time configuration

Time drift on the MongoDB server (single node)

Time drift on Graylog servers.

The Graylog cluster had been running well for months. The issue seems to have started after an incident with our MongoDB server several days ago, when I had to use “kill -9”. There is nothing in the MongoDB error logs related to the memory issue or any other errors.

Thanks for your attention.

gsmith · December 13, 2022, 12:07am

Oh, I see. So whatever happened with MongoDB is now causing your issues.

Have you tried dumping the Graylog database and rebuilding it?
To make sure it's clean: run mongodump, reinstall MongoDB, then restore the Graylog database.

quocbao · December 13, 2022, 7:23am

Hi @gsmith,

I dumped and restored all MongoDB collections to a new MongoDB 4.4 instance but the error kept happening.

At the same time, I found this

I guess db.getCollection('scheduler_triggers').find({"status": "runnable"}).count() cannot be bigger than db.getCollection('event_definitions').find({}).count().

Is there any mapping between event_definitions and scheduler_triggers so I can clean up this mess?

quocbao · December 13, 2022, 7:50am

I used this query to find suspicious documents:

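// Count scheduler triggers per job definition, largest groups first
db.getCollection("scheduler_triggers").aggregate(
    [
        // One bucket per job_definition_id, adding 1 for each trigger document
        {
            "$group" : {
                "_id" : {
                    "job_definition_id" : "$job_definition_id"
                },
                "count" : {
                    "$sum" : NumberInt(1)
                }
            }
        }, 
        // Flatten each bucket to { job_definition_id, count }
        {
            "$project" : {
                "job_definition_id" : "$_id.job_definition_id",
                "count" : "$count",
                "_id" : NumberInt(0)
            }
        }, 
        // Largest counts first
        {
            "$sort" : {
                "count" : NumberInt(-1)
            }
        }
    ], 
    {
        // Let the grouping spill to disk on large collections
        "allowDiskUse" : true
    }
);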
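The pipeline above groups trigger documents by `job_definition_id`, counts each group and sorts the counts in descending order, so a job definition with an abnormally large number of triggers ends up at the top of the output. The same logic, applied to a small in-memory array in plain JavaScript, makes it easier to see what the result means. The sample documents are made up and keep only the field the pipeline groups on; `countTriggersPerJob` is a hypothetical name.

```javascript
// Plain-JS mirror of the $group / $project / $sort stages, on an in-memory array.
function countTriggersPerJob(triggers) {
  const counts = new Map();
  for (const t of triggers) {
    // $group + $sum: 1 -> one counter per job_definition_id
    counts.set(t.job_definition_id, (counts.get(t.job_definition_id) || 0) + 1);
  }
  // $project: flatten to { job_definition_id, count }; $sort: count descending
  return [...counts]
    .map(([job_definition_id, count]) => ({ job_definition_id, count }))
    .sort((a, b) => b.count - a.count);
}

// Made-up sample: job "a" has three triggers, job "b" has one.
const sample = [
  { job_definition_id: "b" },
  { job_definition_id: "a" },
  { job_definition_id: "a" },
  { job_definition_id: "a" },
];

const result = countTriggersPerJob(sample);
console.log(result); // "a" (count 3) is listed first, then "b" (count 1)
```

A count above one is not necessarily wrong by itself; what stood out in this thread was a single job definition with hundreds of thousands of never-triggered entries.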

Then I ran this delete:

> use graylog
switched to db graylog
> db.getCollection("scheduler_triggers").deleteMany({"$and": [{"job_definition_id":"6205e13fd0632503c3f052cc"}, {"triggered_at": null}]})
{ "acknowledged" : true, "deletedCount" : 488872 }
>
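One detail worth knowing about that delete: in a MongoDB query, `{"triggered_at": null}` matches documents where the field is explicitly null and also documents where the field is missing entirely. The sketch below reproduces the filter as a plain JavaScript predicate on made-up documents to show which ones would be removed. In the mongo shell, running `countDocuments()` with the same filter before `deleteMany()` is a cheap way to check the number first. `matchesDeleteFilter` is a hypothetical name.

```javascript
// Plain-JS mirror of the deleteMany filter:
// { $and: [ { job_definition_id: JOB_ID }, { triggered_at: null } ] }
// In MongoDB, { field: null } matches both null values and missing fields.
const JOB_ID = "6205e13fd0632503c3f052cc";

function matchesDeleteFilter(doc) {
  const sameJob = doc.job_definition_id === JOB_ID;
  const neverTriggered = doc.triggered_at === null || doc.triggered_at === undefined;
  return sameJob && neverTriggered;
}

// Made-up trigger documents, reduced to the fields the filter looks at.
const triggers = [
  { _id: 1, job_definition_id: JOB_ID, triggered_at: null },                   // matched
  { _id: 2, job_definition_id: JOB_ID },                                       // matched (field missing)
  { _id: 3, job_definition_id: JOB_ID, triggered_at: "2022-12-13T07:00:00Z" }, // kept
  { _id: 4, job_definition_id: "another-job", triggered_at: null },            // kept (other job)
];

const toDelete = triggers.filter(matchesDeleteFilter).map((t) => t._id);
console.log(toDelete); // [ 1, 2 ]
```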

Now the Next execution of new events seems to be correct.

I still had problems with the Next timerange of old events.

I tried disabling the event and then re-enabling it, and … it works!

gsmith · December 13, 2022, 10:11pm

Hey,
Oh wow,
This is kind of strange. I wonder what made Mongo do this.

gsmith · February 15, 2023, 10:07pm

@quocbao

You should be able to repost here if you want.
