Alerts/Events do not trigger

Hello,

We just updated from 3.0.0-12 to 3.1.0-6 and now alarms do not create any events, even though the “Filter Preview” shows hits.

I removed all legacy alarms and recreated them, but still no event is created.
Are there any pitfalls I could check?

The worst part is that the alarms still work in our staging system, which runs the same version as the prod system. The only difference is that staging is a single node and prod has three.

Edit: the only config difference between staging and prod is that, because of the multi-node setup, http_publish_uri is set to the specific host and http_external_uri is set to the VIP.

Regards
Jan

Hi,

we have the same problem as you have. Have you found any solution?

Hey,

I still do not have any alerting. Do you also have a multi-node setup?

I think that it’s this bug: https://github.com/Graylog2/graylog2-server/issues/6415

Are all of you running a cluster node that takes no part in input processing?
Then this workaround should help:

We have updated our Graylog server to 3.1.1-1 and no longer receive alerts.
The logs show nothing that points to a cause.
How can we find out why alerts are not working?

graylog-server v3.1.1-1
MongoDB v3.6.14
elasticsearch v5.6.16-1
OS: Red Hat Enterprise Linux Server 7.7

Did you read the comment above by @mpfz0r?

Yes, we saw his comment.
We use all nodes of our cluster.
We enabled logging and see a lot of errors:
ERROR [JobExecutionEngine] couldn't handle trigger due to a permanent error 5d77797a359a8b6ba591fd7f - trigger won't be retried
java.lang.IllegalStateException: couldn't find job definition 5d7778001be7e958c829c1c9
    at org.graylog.scheduler.JobExecutionEngine.lambda$handleTrigger$1(JobExecutionEngine.java:137) ~[graylog.jar:?]
    at java.util.Optional.orElseThrow(Optional.java:290) ~[?:1.8.0_222]
    at org.graylog.scheduler.JobExecutionEngine.handleTrigger(JobExecutionEngine.java:137) ~[graylog.jar:?]
    at org.graylog.scheduler.JobExecutionEngine.lambda$execute$0(JobExecutionEngine.java:119) ~[graylog.jar:?]
    at org.graylog.scheduler.worker.JobWorkerPool.lambda$execute$0(JobWorkerPool.java:110) ~[graylog.jar:?]
    at com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:181) [graylog.jar:?]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_222]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_222]
    at com.codahale.metrics.InstrumentedThreadFactory$InstrumentedRunnable.run(InstrumentedThreadFactory.java:66) [graylog.jar:?]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_222]
but we don't know how to fix them.
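
The exception above says that a trigger refers to a job definition that no longer exists. A minimal sketch of how such orphaned triggers could be spotted, assuming the documents from scheduler_triggers and scheduler_job_definitions have been exported into Python lists. The field name job_definition_id is an assumption about the trigger documents, and the second trigger and the surviving job definition are made-up sample data:

```python
def find_orphaned_triggers(triggers, job_definitions):
    """Return the triggers whose referenced job definition is missing."""
    known_ids = {str(d["_id"]) for d in job_definitions}
    # "job_definition_id" is an assumed field name, check it against your data.
    return [t for t in triggers if str(t.get("job_definition_id")) not in known_ids]


# Hypothetical export; only the first trigger uses IDs from the log above.
job_definitions = [{"_id": "5d7778001be7e958c829c1c8"}]
triggers = [
    {"_id": "5d77797a359a8b6ba591fd7f", "job_definition_id": "5d7778001be7e958c829c1c9"},
    {"_id": "5d77797a359a8b6ba591fd80", "job_definition_id": "5d7778001be7e958c829c1c8"},
]

orphans = find_orphaned_triggers(triggers, job_definitions)
for t in orphans:
    print(t["_id"], "-> missing job definition", t["job_definition_id"])
```

This only reports the mismatch; whether such triggers can safely be removed is a separate question.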

@jan happy cake day :smiley:

I disabled journaling on all idle/standby nodes, but there is still no alerting. A failover to a different node did not help either.

Regards
Jan

this seems to be a different problem then.
Do you mind sharing the following entries from your mongodb?

echo 'db.scheduler_job_definitions.find()' | mongo graylog
echo 'db.scheduler_triggers.find()' | mongo graylog
echo 'db.event_notifications.find()' | mongo graylog
echo 'db.event_definitions.find()' | mongo graylog
echo 'db.event_processor_state.find()' | mongo graylog
echo 'db.processing_status.find()' | mongo graylog

Thanks :slight_smile:

Just to make sure: you did restart the nodes after the config change?

If this doesn’t fix it, could you send me the same info I asked from @ec10?

Thanks

things are good with us right now.
we don’t know how it happened
thank you

Hey mpfz0r,

all idle nodes were restarted after disabling the journal.
There was a bit more output, so I uploaded it to GitHub Gist.

Regards
Jan

Hey,

I just installed Graylog v3.1.1+b39ee32 and no events are created.

Using

MongoDB v4.0.12
elasticsearch v6.8.3
OS: debian92

I disabled journaling by setting message_journal_enabled = false and restarted the server, but it did not help.

I have one node and am sending the following GELF message:

curl -XPOST http://myip:12201/gelf -p0 -d '{"short_message":"Hello there", "_Response_code":400, "host":"example.org", "facility":"test", "_foo":"bar"}'

Hi,

I have just tried version 3.1.2 and see the same problem. No events are created.

Graylog v3.1.2+9e96b08
Elasticsearch 6.8.2
MongoDB 4.0.12

Debian 10

One node.

I am using Graylog 3.1.2+9e96b08 with a single node. It is not very busy, but it consistently receives a few messages per minute. I do not get alerts at all; only after restarting graylog-server do I sometimes get a single alert. On the metrics page for the node (/system/metrics/node/{node}) I found that every execution of my AggregationEventProcessors ends in an exception:

[org.graylog.events.processor.aggregation.AggregationEventProcessor.5d5a43d2eb7809e1f9a969e3.execution_count]
Counter
Value: 521

[org.graylog.events.processor.aggregation.AggregationEventProcessor.5d5a43d2eb7809e1f9a969e3.execution_exception]
Counter
Value: 521
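
A minimal sketch of how this comparison could be automated, assuming the counters have been collected into a plain Python dict of metric name to value. How the values are fetched is left out, and the second processor ID is made-up sample data:

```python
from collections import defaultdict

PREFIX = "org.graylog.events.processor.aggregation.AggregationEventProcessor."


def always_failing(metrics):
    """Return processor IDs where every execution ended in an exception."""
    per_processor = defaultdict(dict)
    for name, value in metrics.items():
        if not name.startswith(PREFIX):
            continue
        # The remainder looks like "<processor id>.<counter name>".
        processor_id, _, counter = name[len(PREFIX):].partition(".")
        per_processor[processor_id][counter] = value
    return sorted(
        pid
        for pid, counters in per_processor.items()
        if counters.get("execution_count", 0) > 0
        and counters.get("execution_exception", 0) >= counters["execution_count"]
    )


metrics = {
    PREFIX + "5d5a43d2eb7809e1f9a969e3.execution_count": 521,
    PREFIX + "5d5a43d2eb7809e1f9a969e3.execution_exception": 521,
    # Hypothetical healthy processor for contrast.
    PREFIX + "000000000000000000000001.execution_count": 40,
    PREFIX + "000000000000000000000001.execution_exception": 0,
}
print(always_failing(metrics))
```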

I can see in the code (EventProcessorEngine class) that the exceptions are supposed to be written to the server.log file when the DEBUG log level is active. I have enabled DEBUG, but no log output comes from any of the event processor classes.

Any advice?

Thank you.

So the events seem to be stuck since 2019-09-06T21:52:47
But I fail to see why :confused:

Could you get me a 5 minute debug log like I asked over here?

Thanks

Did you use the UI to turn on the debugging?
This doesn’t cover the event system (bug https://github.com/Graylog2/graylog2-server/pull/6423)

You can however use the API (https://github.com/Graylog2/graylog2-server/issues/6415#issuecomment-529845619)
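
For readers who cannot open the linked comment, here is a rough sketch of what such an API call could look like from Python. It assumes the REST endpoint PUT /system/loggers/{loggerName}/level/{level} and the X-Requested-By header that Graylog expects on modifying requests. The base URL, the credentials, and the logger name org.graylog.events are placeholders or assumptions, not values taken from the linked comment. The request is only built here, not sent:

```python
import base64
import urllib.request


def build_log_level_request(api_base, logger_name, level, user, password):
    """Build (but do not send) a PUT request that changes a logger's level."""
    url = f"{api_base.rstrip('/')}/system/loggers/{logger_name}/level/{level}"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    headers = {
        "Authorization": f"Basic {token}",
        # Graylog rejects modifying requests without this header.
        "X-Requested-By": "cli",
    }
    return urllib.request.Request(url, method="PUT", headers=headers)


# Placeholder host and credentials; the logger name is an assumption.
req = build_log_level_request(
    "http://graylog.example.org:9000/api", "org.graylog.events", "debug", "admin", "secret"
)
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req)
```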

Changing the logging level through the API worked. Now I can see the following in the log:

2019-09-16T11:38:58.979+02:00 DEBUG [EventProcessorExecutionJob] Event processor <Errors found/5d5a43d2eb7809e1f9a969e3> couldn’t be executed because of a failed precondition (retry in 5000 ms)

Looking at the code, I can see that it can only come from line 100 of the AggregationEventProcessor class:

if (!dependencyCheck.hasMessagesIndexedUpTo(parameters.timerange().getTo()))
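
To illustrate why such a precondition can fail indefinitely, here is a simplified model in Python. It is not Graylog's implementation. It assumes that each node reports a timestamp up to which it has indexed messages and that the check uses the oldest of these, which would match the earlier suspicion about nodes that take no part in processing. Node names and timestamps are made up:

```python
from datetime import datetime, timezone


def has_messages_indexed_up_to(node_indexed_up_to, timerange_to):
    """Simplified model: the slowest node decides whether the window is ready."""
    if not node_indexed_up_to:
        return False
    return min(node_indexed_up_to.values()) >= timerange_to


def utc(*args):
    return datetime(*args, tzinfo=timezone.utc)


# Hypothetical cluster state: one busy node, one idle node with a stale timestamp.
nodes = {
    "node-busy": utc(2019, 9, 16, 9, 39, 0),
    "node-idle": utc(2019, 9, 6, 21, 52, 47),
}
window_end = utc(2019, 9, 16, 9, 38, 58)

print(has_messages_indexed_up_to(nodes, window_end))  # blocked by the idle node
busy_only = {"node-busy": nodes["node-busy"]}
print(has_messages_indexed_up_to(busy_only, window_end))
```

In this model a single stale timestamp keeps every event processor waiting, no matter how current the other nodes are.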

That’s what I’m trying to track down. Do you mind sharing these parts of your configuration?