Do you have any processing bottlenecks? Check System/Nodes > Details and see if any of the buffers are full. If processing is backed up, it can cause alerts to back up as well. Might explain the odd Next Timerange result.
@quocbao
Hey, I was looking over this. What I noticed is that the Next timerange: is a day behind. By chance, did you check the timezone on this server? And do you have NTP installed on this server?
Meaning do these line up?
System/Overview -->Time configuration
EDIT: Did this issue just start? If so, what was done prior to this issue (updates/upgrades, etc.)?
The Graylog cluster has run well for months. This issue seems to have started after an incident with our MongoDB several days ago. I had to use “kill -9”. There is nothing in the MongoDB error logs related to the memory issue or any other errors.
Oh, I see. So whatever happened with MongoDB, now you're having issues.
Have you tried dumping the Graylog database and rebuilding?
To make sure it's clean: execute mongodump, then reinstall MongoDB, then load the Graylog database back in.
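A sketch of that dump/reinstall/restore sequence, assuming a single-node mongod and a Graylog database named `graylog` (check `mongodb_uri` in your Graylog `server.conf` for the actual database name, and use your distro's package commands; apt is shown here as an example):

```shell
# Dump only the Graylog database to a local directory
mongodump --db graylog --out /tmp/graylog-backup

# Stop Graylog first so nothing writes to MongoDB during the rebuild
sudo systemctl stop graylog-server
sudo systemctl stop mongod

# Reinstall MongoDB (example for Debian/Ubuntu with the mongodb-org packages)
sudo apt-get purge --auto-remove mongodb-org
sudo apt-get install mongodb-org
sudo systemctl start mongod

# Restore the Graylog database, dropping any existing collections first
mongorestore --db graylog --drop /tmp/graylog-backup/graylog

sudo systemctl start graylog-server
```

Keep the dump directory around until the cluster has been healthy for a while.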
I guess db.getCollection('scheduler_triggers').find({"status": "runnable"}).count() cannot be bigger than db.getCollection('event_definitions').find({}).count().
Is there any mapping between event_definitions and scheduler_triggers so I can clean this mess?
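There is a mapping, but it goes through an intermediate collection. As far as I can tell (worth verifying on your version), each trigger's `job_definition_id` points at a document in `scheduler_job_definitions`, and that job definition carries the event definition's ID inside its `config`. A mongo shell sketch for listing runnable triggers that no longer map to any existing event definition, assuming those field names:

```javascript
// Find runnable triggers whose job definition no longer points at an
// existing event definition. The field names (job_definition_id,
// config.event_definition_id) are assumptions; run findOne() on a few
// documents first to confirm them on your Graylog version.

var eventIds = db.getCollection('event_definitions')
    .find({}, { _id: 1 })
    .map(function (d) { return d._id.valueOf(); });

db.getCollection('scheduler_triggers')
    .find({ "status": "runnable" })
    .forEach(function (trigger) {
        // If job_definition_id is stored as a string while _id is an
        // ObjectId, wrap it: ObjectId(trigger.job_definition_id)
        var jobDef = db.getCollection('scheduler_job_definitions')
            .findOne({ _id: trigger.job_definition_id });
        var evId = jobDef && jobDef.config && jobDef.config.event_definition_id;
        if (!evId || eventIds.indexOf(evId) === -1) {
            print("orphaned trigger: " + trigger._id);
            // After reviewing the output, delete with:
            // db.getCollection('scheduler_triggers').deleteOne({ _id: trigger._id });
        }
    });
```

Take a mongodump before deleting anything, since this only prints candidates and the field names above are unverified.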