Alerts/Events do not trigger

meerkampdvv · September 2, 2019, 10:40am

Hello,

We just updated from 3.0.0-12 to 3.1.0-6, and now alerts do not create any events, even though the “Filter Preview” shows hits.

I removed all legacy alerts and recreated them, but still no event is created.
Are there any pitfalls I could check?

The worst part is that alerts still work on our staging system, which runs the same version as the production system. The only difference is that staging is a single node, while production has three.

Edit: The only configuration difference between staging and prod is that, because of the multi-node setup, http_publish_uri is set to the specific host and http_external_uri is set to the VIP.

Regards
Jan

saturn86 · September 6, 2019, 5:00pm

Hi,

We have the same problem. Have you found a solution?

meerkampdvv · September 9, 2019, 8:40am

Hey,

I still do not have any alerting. Do you also have a multi-node setup?

saturn86 · September 9, 2019, 8:57am

I think that it’s this bug: https://github.com/Graylog2/graylog2-server/issues/6415

mpfz0r · September 11, 2019, 7:26am

Are any of you running a cluster node that takes no part in input processing?
If so, this workaround should help:

ec10 · September 11, 2019, 7:21pm

We have updated our Graylog server to 3.1.1-1 and no longer receive alerts.
The logs show nothing that points to a cause.
How can we find out why alerts are not working?

graylog-server v3.1.1-1
MongoDB v3.6.14
elasticsearch v5.6.16-1
OS: Red Hat Enterprise Linux Server 7.7

jan · September 11, 2019, 7:38pm

Did you read the comment above by @mpfz0r?

ec10 · September 12, 2019, 4:20am

Yes, we saw his comment.
All nodes of our cluster take part in input processing.
We enabled logging and see a lot of errors:
ERROR [JobExecutionEngine] couldn’t handle trigger due to a permanent error 5d77797a359a8b6ba591fd7f - trigger won’t be retried
java.lang.IllegalStateException: couldn’t find job definition 5d7778001be7e958c829c1c9
at org.graylog.scheduler.JobExecutionEngine.lambda$handleTrigger$1(JobExecutionEngine.java:137) ~[graylog.jar:?]
at java.util.Optional.orElseThrow(Optional.java:290) ~[?:1.8.0_222]
at org.graylog.scheduler.JobExecutionEngine.handleTrigger(JobExecutionEngine.java:137) ~[graylog.jar:?]
at org.graylog.scheduler.JobExecutionEngine.lambda$execute$0(JobExecutionEngine.java:119) ~[graylog.jar:?]
at org.graylog.scheduler.worker.JobWorkerPool.lambda$execute$0(JobWorkerPool.java:110) ~[graylog.jar:?]
at com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:181) [graylog.jar:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_222]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_222]
at com.codahale.metrics.InstrumentedThreadFactory$InstrumentedRunnable.run(InstrumentedThreadFactory.java:66) [graylog.jar:?]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_222]
But we don’t know how to fix them.

meerkampdvv · September 12, 2019, 5:18am

@jan happy cake day

I disabled journaling on all idle/standby nodes, but there is still no alerting. A failover to a different node did not help either.

Regards
Jan

mpfz0r · September 12, 2019, 11:12am

This seems to be a different problem, then.
Do you mind sharing the following entries from your mongodb?

echo 'db.scheduler_job_definitions.find()' | mongo graylog
echo 'db.scheduler_triggers.find()' | mongo graylog
echo 'db.event_notifications.find()' | mongo graylog
echo 'db.event_definitions.find()' | mongo graylog
echo 'db.event_processor_state.find()' | mongo graylog
echo 'db.processing_status.find()' | mongo graylog

Thanks

mpfz0r · September 12, 2019, 11:14am

Just to make sure: you did restart the nodes after the config change?

If this doesn’t fix it, could you send me the same info I asked @ec10 for?

Thanks

ec10 · September 12, 2019, 11:21am

Things are working for us now.
We don’t know what changed.
Thank you!

meerkampdvv · September 12, 2019, 12:43pm

Hey mpfz0r,

All idle nodes were restarted after disabling the journal.
There was a bit more output, so I uploaded it to a GitHub Gist.

Regards
Jan

hashed · September 13, 2019, 10:22am

Hey,

I just installed Graylog v3.1.1+b39ee32, and no events are created.

Using

MongoDB v4.0.12
elasticsearch v6.8.3
OS: debian92

I disabled journaling by setting message_journal_enabled = false and restarted the server, but that did not help.

I have one node and am sending the following GELF message:

curl -XPOST http://myip:12201/gelf -p0 -d '{"short_message":"Hello there", "_Response_code":400, "host":"example.org", "facility":"test", "_foo":"bar"}'
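For comparison, here is a minimal Python sketch that builds the same test message as a GELF 1.1 payload. It adds the version field, which the GELF 1.1 specification lists as required. It also sends facility as the additional field _facility, because the spec deprecates facility as a standard field. build_gelf_message and send_gelf are hypothetical helpers, and the URL in the usage note is a placeholder, like myip above.

```python
import json
import re
import urllib.request

# GELF 1.1: additional field names start with "_" and may only contain
# word characters, dots and dashes; "_id" is reserved.
_FIELD_RE = re.compile(r"^_[\w.\-]+$")


def build_gelf_message(host: str, short_message: str, **extra) -> bytes:
    """Return a GELF 1.1 JSON payload; extra kwargs become additional fields."""
    msg = {"version": "1.1", "host": host, "short_message": short_message}
    for key, value in extra.items():
        name = key if key.startswith("_") else "_" + key
        if name == "_id" or not _FIELD_RE.match(name):
            raise ValueError(f"invalid GELF additional field name: {name!r}")
        msg[name] = value
    return json.dumps(msg).encode("utf-8")


def send_gelf(url: str, payload: bytes) -> int:
    """POST the payload to a GELF HTTP input and return the HTTP status code."""
    req = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}, method="POST"
    )
    with urllib.request.urlopen(req, timeout=5) as resp:
        return resp.status
```

Usage: send_gelf("http://myip:12201/gelf", build_gelf_message("example.org", "Hello there", Response_code=400, facility="test", foo="bar")).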

saturn86 · September 13, 2019, 10:37am

Hi,

I have just tried version 3.1.2 and have the same problem: no event is created.

Graylog v3.1.2+9e96b08
Elasticsearch 6.8.2
MongoDB 4.0.12

Debian 10

One node.

Mastertrap21 · September 13, 2019, 10:37am

I am using Graylog 3.1.2+9e96b08 on a single node that is not very busy, but it receives a few messages every minute. I do not get alerts at all; occasionally, after restarting graylog-server, I might get a single alert. On the node’s metrics page (/system/metrics/node/{node}) I found that all my AggregationEventProcessors run into exceptions:

[org.graylog.events.processor.aggregation.AggregationEventProcessor.5d5a43d2eb7809e1f9a969e3.execution_count]
Counter
Value: 521

[org.graylog.events.processor.aggregation.AggregationEventProcessor.5d5a43d2eb7809e1f9a969e3.execution_exception]
Counter
Value: 521

The code (the EventProcessorEngine class) suggests that these exceptions should be written to server.log when the DEBUG log level is enabled. I have enabled DEBUG, but nothing at all is logged by any of the event processor classes.
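In the counters above, execution_exception equals execution_count, which means every execution of that processor failed. The small Python sketch below spots such processors in a set of counters. It assumes you have copied the metric names and values (for example from the metrics page) into a dict. failing_processors is a hypothetical helper; only the metric naming pattern is taken from the output above.

```python
from typing import Dict, List, Tuple

PREFIX = "org.graylog.events.processor.aggregation.AggregationEventProcessor."


def failing_processors(counters: Dict[str, int]) -> List[Tuple[str, int, int]]:
    """Return (processor_id, executions, exceptions) where every run failed."""
    executions: Dict[str, int] = {}
    exceptions: Dict[str, int] = {}
    for raw_name, value in counters.items():
        name = raw_name.strip("[]")  # the metrics page shows names in brackets
        if not name.startswith(PREFIX):
            continue
        # "<processor id>.<metric>" -> split at the first dot
        proc_id, _, metric = name[len(PREFIX):].partition(".")
        if metric == "execution_count":
            executions[proc_id] = value
        elif metric == "execution_exception":
            exceptions[proc_id] = value
    result = []
    for proc_id, count in sorted(executions.items()):
        failed = exceptions.get(proc_id, 0)
        if count > 0 and failed == count:
            result.append((proc_id, count, failed))
    return result
```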

Any advice?

mpfz0r · September 16, 2019, 9:26am

Thank you.
So the events seem to have been stuck since 2019-09-06T21:52:47, but I fail to see why.

Could you get me a 5-minute debug log, as I asked over here?

Thanks

mpfz0r · September 16, 2019, 9:32am

Did you use the UI to turn on debug logging?
This doesn’t cover the event system (a bug; see pull request #6423 in Graylog2/graylog2-server on GitHub, “Set the loglevel for org.graylog2 AND org.graylog”, by mpfz0r).

You can, however, use the API (see issue #6415 in Graylog2/graylog2-server on GitHub, “Alerting not working if cluster contains nodes with no active inputs”).
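Below is a minimal Python sketch of such an API call. It assumes the Graylog 3.x REST endpoint PUT /api/system/loggers/{loggerName}/level/{level} with basic authentication, and to our understanding Graylog expects an X-Requested-By header on modifying requests. The base URL, credentials, and the choice of the org.graylog logger are placeholders to adapt. build_set_log_level_request is a hypothetical helper; please check the path against your node’s API browser.

```python
import base64
import urllib.parse
import urllib.request


def build_set_log_level_request(
    base_url: str, logger: str, level: str, user: str, password: str
) -> urllib.request.Request:
    """Build (but do not send) a PUT request that changes one logger's level."""
    url = "{}/system/loggers/{}/level/{}".format(
        base_url.rstrip("/"),
        urllib.parse.quote(logger, safe=""),
        urllib.parse.quote(level, safe=""),
    )
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        method="PUT",
        headers={
            "Authorization": f"Basic {token}",
            # Assumed CSRF guard header for non-GET requests.
            "X-Requested-By": "cli",
        },
    )
```

Usage: urllib.request.urlopen(build_set_log_level_request("http://127.0.0.1:9000/api", "org.graylog", "debug", "admin", "password")).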

Mastertrap21 · September 16, 2019, 9:48am

It worked changing the logging level using the API. Now I can see the following in the log:

2019-09-16T11:38:58.979+02:00 DEBUG [EventProcessorExecutionJob] Event processor <Errors found/5d5a43d2eb7809e1f9a969e3> couldn’t be executed because of a failed precondition (retry in 5000 ms)

Looking at the code, I can see that this can only come from line 100 of the AggregationEventProcessor class:

if (!dependencyCheck.hasMessagesIndexedUpTo(parameters.timerange().getTo()))
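The Python model below illustrates what that precondition means. It is deliberately simplified and is not Graylog’s actual implementation. It assumes each node reports a post-indexing watermark, the latest receive time up to which its messages are indexed, and that the cluster counts as indexed only up to the earliest of these. An event processor for a time range then runs only if that earliest watermark has reached the end of the range. Under this assumption, a node whose watermark stops advancing (such as one with no active inputs, as in issue #6415) holds back every event processor, and the job keeps retrying as in the DEBUG line above. All names here are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Dict, Optional


def earliest_watermark(post_indexing: Dict[str, datetime]) -> Optional[datetime]:
    """Cluster-wide watermark in this model: the oldest post-indexing time of any node."""
    return min(post_indexing.values()) if post_indexing else None


def has_messages_indexed_up_to(post_indexing: Dict[str, datetime], to: datetime) -> bool:
    """Simplified precondition: run only once every node has indexed up to `to`."""
    watermark = earliest_watermark(post_indexing)
    return watermark is not None and watermark >= to


if __name__ == "__main__":
    end = datetime(2019, 9, 16, 11, 38, 55)
    nodes = {"node-a": end + timedelta(seconds=2), "node-idle": end - timedelta(hours=1)}
    print(has_messages_indexed_up_to(nodes, end))  # False: node-idle lags behind
    del nodes["node-idle"]
    print(has_messages_indexed_up_to(nodes, end))  # True
```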

mpfz0r · September 16, 2019, 9:54am

That’s what I’m trying to track down. Do you mind sharing these parts of your configuration?
