Post update event scheduler is not working


1. Describe your incident:
The event scheduler stopped working after a Graylog update.

Events now show:
Status: runnable
Next execution: 2022-12-12 15:38:11.064 (a few minutes in the past)

I'm guessing that with a next-execution date in the past, it's never going to trigger.
Notifications work, and everything else that I can see works. I switched logging to debug and disabled and re-enabled the events, but to no avail.

ubuntu@ip-10-60-40-12:~$ cat /etc/graylog/server/server.conf | egrep -v '^\s*(#|$)'
is_master = true
node_id_file = /etc/graylog/server/node-id
password_secret =

root_password_sha2 =

root_timezone = Etc/GMT-1
bin_dir = /usr/share/graylog-server/bin
data_dir = /var/lib/graylog-server
plugin_dir = /usr/share/graylog-server/plugin
http_bind_address = 0.0.0.0:9000
http_external_uri = https://xxxxxx.com/
rotation_strategy = count
elasticsearch_max_docs_per_index = 20000000
elasticsearch_max_number_of_indices = 20
retention_strategy = delete
elasticsearch_shards = 4
elasticsearch_replicas = 0
elasticsearch_index_prefix = graylog
allow_leading_wildcard_searches = false
allow_highlighting = false
elasticsearch_analyzer = standard
output_batch_size = 500
output_flush_interval = 1
output_fault_count_threshold = 5
output_fault_penalty_seconds = 30
processbuffer_processors = 5
outputbuffer_processors = 3
processor_wait_strategy = blocking
ring_size = 65536
inputbuffer_ring_size = 65536
inputbuffer_processors = 2
inputbuffer_wait_strategy = blocking
message_journal_enabled = true
message_journal_dir = /var/lib/graylog-server/journal
lb_recognition_period_seconds = 3
mongodb_uri = mongodb://grayuser:i-xxxx@127.0.0.1:27017/graylog
mongodb_uri = mongodb://grayuser:i-xxxx@127.0.0.1:27017/graylog
mongodb_uri = mongodb://grayuser:xxxx@127.0.0.1:27017/graylog
mongodb_max_connections = 1000
mongodb_threads_allowed_to_block_multiplier = 5
proxied_requests_thread_pool_size = 32
prometheus_exporter_enabled = true
prometheus_exporter_bind_address = xxxx:9833

2. Describe your environment:

  • OS Information:

  • Package Version:

  • Service logs, configurations, and environment variables:
    Standalone Ubuntu server running Graylog
    Version: 4.2.13+9c90b93, codename Noir
    JVM: PID 1101, Ubuntu 11.0.17 on Linux 5.4.0-1092-aws
    Time: 2022-12-13 08:18:56 +00:00

3. What steps have you already taken to try and solve the problem?
We did take a snapshot, but because the problem wasn't noticed at the time we don't want to revert; we want to fix forward.

We may update again to the next version if this turns out to be a bug.

4. How can the community help?

I have seen something similar here, but with no indication of a fix: Alert/Event not firing



The filter seems to work, but there are still no alerts.

We thought it might be a time issue, but it doesn't seem to be, as we fixed the NTP issue on the server.

image
I get a notification that I've updated the event, so I guess this has been written to the DB.

When you say Event Scheduler, you mean Alerts…

I don’t think you want the quotes in the search query - I believe that changes it from checking a field to looking for that string…

Hello,

By chance does this setting work? 3 Mongo nodes with the same IP?

If it is a replica set, maybe something like this:

mongodb_uri = mongodb://grayloguser:secret@mongo_node01:27017,mongo_node02:27018,mongo_node03:27019/graylog?replicaSet=rs01

Hi
What would that do, set up the same user on different ports?

I don't think this is an issue, as there's only one DB.

Yes, this does work in quotes, as we are looking for that specific string to alert on.

So the way I think it works:
When a new Event is created or modified, the details are written to MongoDB and a schedule is automatically created to trigger the check against the DB. When a match happens, this creates the alert. Our alerts seem to be written to the DB, but the internal schedule is not triggered.
Hence the last execution message and no next execution message.
image
Another older, unmodified alert:
image

The string in quotes is working, as we see a result returned on screen in the filter preview. I think if that wasn't working, it would be a case of no matches and no alerts.

Let me be clear: no alerts are working, neither older events nor ones newly created since the update.

Also, I just thought I'd mention that we get a lot of old notifications on reboot.

I found a very similar issue here - Alerting not working if cluster contains nodes with no active inputs · Issue #6415 · Graylog2/graylog2-server · GitHub


I'd like to see the output from here:

but I get this output:
image

I think I may have gotten something working:

2022-12-15T09:57:27.037Z INFO [DiagnosticEventLogger] Current thread pool executor state: ExecutorStateEvent(executorName=SchedulerThreadPoolExecutor, currentQueueSize=0, activeThreads=0, coreThreads=0, leasesOwned=1, largestPoolSize=2, maximumPoolSize=2147483647)
2022-12-15T09:57:33.050Z INFO [Scheduler] Current stream shard assignments: shardId-000000000000
2022-12-15T09:57:33.050Z INFO [Scheduler] Sleeping …
2022-12-15T09:57:40.422Z INFO [DiagnosticEventLogger] Current thread pool executor state: ExecutorStateEvent(executorName=SchedulerThreadPoolExecutor, currentQueueSize=0, activeThreads=0, coreThreads=0, leasesOwned=1, largestPoolSize=2, maximumPoolSize=2147483647)
2022-12-15T09:57:42.042Z INFO [Scheduler] Current stream shard assignments: shardId-000000000000
2022-12-15T09:57:42.042Z INFO [Scheduler] Sleeping …
2022-12-15T09:57:44.049Z INFO [Scheduler] Current stream shard assignments: shardId-000000000000
2022-12-15T09:57:44.049Z INFO [Scheduler] Sleeping …

If no alerts are working, are you sure the Notification that is attached to the Alert Event is working? What kind of Notification are you using?

On @gsmith’s point: with three definitions of mongodb_uri, Graylog would likely only take the last one defined, since the previous value is usually overwritten when you define something more than once.
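
To make the duplicate-key point easier to check, here is a minimal Python sketch that lists any key defined more than once in a server.conf-style file, along with the value that would win under a "last definition wins" rule. The function name `find_duplicate_keys` and the inline sample are made up for illustration, and the "last wins" behaviour is the assumption stated above, not something confirmed against Graylog's config loader.

```python
from collections import defaultdict


def find_duplicate_keys(conf_text):
    """Return {key: [values in file order]} for keys defined more than once."""
    seen = defaultdict(list)
    for raw in conf_text.splitlines():
        line = raw.strip()
        # Skip blanks, comments, and anything that is not a key = value line.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        seen[key.strip()].append(value.strip())
    return {key: values for key, values in seen.items() if len(values) > 1}


# Sample mirrors the redacted config posted earlier in the thread.
sample = """\
is_master = true
mongodb_uri = mongodb://grayuser:i-xxxx@127.0.0.1:27017/graylog
mongodb_uri = mongodb://grayuser:i-xxxx@127.0.0.1:27017/graylog
mongodb_uri = mongodb://grayuser:xxxx@127.0.0.1:27017/graylog
"""

dups = find_duplicate_keys(sample)
for key, values in dups.items():
    # Assumption: the last definition is the one that takes effect.
    print(f"{key}: defined {len(values)} times, last value is {values[-1]}")
```

To run it against the real file, read `/etc/graylog/server/server.conf` into a string and pass it in place of `sample`.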

For the quoted search, where you are looking for a snippet in the full message: yes, that works, it’s just not efficient. In the example you have given, you are asking Graylog to search through all full messages that have come in over the past 28 hours for “Response Code: 96”. Depending on the number of messages over that time, this could be a very expensive search. Graylog is designed so that when a message comes in, you can use extractors and/or the pipeline to break the full message into its constituent parts, which allows for a far more efficient search: <find all response_code fields that have a value of 96 in the past 28 hours>. My initial thought was that the search failed or took too long, since you were searching through so much every minute.
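
As a rough illustration of the "break the message into its parts" idea, the Python sketch below pulls the number after "Response Code:" into its own field at ingest time, so that later lookups compare a field value instead of scanning full message text. This is not Graylog's extractor or pipeline API; the regex, the function name `extract_response_code`, and the sample messages are all assumptions made up for the example.

```python
import re

# Assumed message shape: the literal text "Response Code: <digits>".
RESPONSE_CODE = re.compile(r"Response Code:\s*(\d+)")


def extract_response_code(full_message):
    """Return the response code as an int, or None if the message has none."""
    match = RESPONSE_CODE.search(full_message)
    return int(match.group(1)) if match else None


# Made-up sample messages, only to show the shape of the result.
messages = [
    "payment failed, Response Code: 96, retrying",
    "payment ok, Response Code: 00",
    "heartbeat",
]

# Done once per message on the way in, this turns free text into a field.
fields = [
    {"message": m, "response_code": extract_response_code(m)} for m in messages
]

# The alert condition then becomes a cheap comparison on the field.
matches = [f for f in fields if f["response_code"] == 96]
print(len(matches))
```

With a field like that in place, the event filter could be a field query such as `response_code:96` rather than a quoted full-text search.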

Hi, thanks for that.
We continued to troubleshoot and restored a snapshot to another instance. There must have been a crash before the update, as the pre-update snapshot was also broken. We did the same with a snapshot from 7 days earlier and everything works: notifications, events, the lot.

The 28-hour time frame was purely to trigger the event, as that's when it had last occurred in the logs.

I think the OOM killer killed the Graylog process and something was damaged. We will try restoring the older snapshot to a bigger instance with more memory.

That sucks to have to go back far! Grrr… Good luck !


OK, so we figured out the issue to some extent, after the Graylog instance restored from our original snapshot ran for about 12 hours and hit the same issue.

We had an ongoing issue that generated 20k log messages, and an alert that triggered on them and also tried to send us 20k notifications. The event scheduler broke well before that.

We have disabled the events that match the issue until our devs can address it. After disabling them and rebooting the server, events began to work again.


yow! Glad you found it!

