We just updated from 3.0.0-12 to 3.1.0-6, and now our alarms do not create any events, even though the “Filter Preview” shows hits.
I removed all legacy alarms and recreated them, but still no event is created.
Are there any pitfalls I could check?
The worst part is that the alarms still work on our staging system, which runs the same version as production. The only difference is that staging is a single node, while production has three.
Edit: The only configuration difference between staging and production is that, because of the multi-node setup, http_publish_uri is set to each node's own host and http_external_uri is set to the VIP.
We updated our Graylog server to 3.1.1-1 and no longer receive alerts.
The logs show nothing that points to a cause.
How can we find out why alerts are not working?
graylog-server v3.1.1-1
MongoDB v3.6.14
elasticsearch v5.6.16-1
OS: Red Hat Enterprise Linux Server 7.7
Yes, we saw his comment.
We use all nodes of our cluster.
After enabling more logging, we see a lot of errors:
ERROR [JobExecutionEngine] couldn’t handle trigger due to a permanent error 5d77797a359a8b6ba591fd7f - trigger won’t be retried
java.lang.IllegalStateException: couldn’t find job definition 5d7778001be7e958c829c1c9
at org.graylog.scheduler.JobExecutionEngine.lambda$handleTrigger$1(JobExecutionEngine.java:137) ~[graylog.jar:?]
at java.util.Optional.orElseThrow(Optional.java:290) ~[?:1.8.0_222]
at org.graylog.scheduler.JobExecutionEngine.handleTrigger(JobExecutionEngine.java:137) ~[graylog.jar:?]
at org.graylog.scheduler.JobExecutionEngine.lambda$execute$0(JobExecutionEngine.java:119) ~[graylog.jar:?]
at org.graylog.scheduler.worker.JobWorkerPool.lambda$execute$0(JobWorkerPool.java:110) ~[graylog.jar:?]
at com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:181) [graylog.jar:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_222]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_222]
at com.codahale.metrics.InstrumentedThreadFactory$InstrumentedRunnable.run(InstrumentedThreadFactory.java:66) [graylog.jar:?]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_222]
But we don’t know how to fix them.
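The stack trace suggests the scheduler picked up a trigger whose job definition ID no longer resolves (the lookup ends in Optional.orElseThrow). Below is a minimal Java model of that lookup, plus a helper that lists triggers pointing at missing definitions, which may help when comparing exported trigger and job definition IDs. The class and method names are hypothetical, not Graylog's API; only the two IDs come from the log above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

public class OrphanedTriggers {

    /** Hypothetical trigger: its own ID plus the ID of the job definition it refers to. */
    static final class Trigger {
        final String id;
        final String jobDefinitionId;

        Trigger(String id, String jobDefinitionId) {
            this.id = id;
            this.jobDefinitionId = jobDefinitionId;
        }
    }

    /** Models the failing lookup: a missing definition becomes an IllegalStateException. */
    static String resolveDefinition(Map<String, String> definitionsById, Trigger trigger) {
        return Optional.ofNullable(definitionsById.get(trigger.jobDefinitionId))
                .orElseThrow(() -> new IllegalStateException(
                        "couldn't find job definition " + trigger.jobDefinitionId));
    }

    /** Returns the IDs of triggers whose job definition no longer exists. */
    static List<String> findOrphans(Map<String, String> definitionsById, List<Trigger> triggers) {
        List<String> orphans = new ArrayList<>();
        for (Trigger trigger : triggers) {
            if (!definitionsById.containsKey(trigger.jobDefinitionId)) {
                orphans.add(trigger.id);
            }
        }
        return orphans;
    }

    public static void main(String[] args) {
        // Job definition ID -> title; the ID missing in the log is deliberately absent.
        Map<String, String> definitions = new HashMap<>();
        definitions.put("def-1", "Example definition");

        List<Trigger> triggers = Arrays.asList(
                new Trigger("5d77797a359a8b6ba591fd7f", "5d7778001be7e958c829c1c9"),
                new Trigger("trigger-2", "def-1"));

        System.out.println(findOrphans(definitions, triggers)); // [5d77797a359a8b6ba591fd7f]
    }
}
```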
I am using Graylog 3.1.2+9e96b08 on a single node. It is not very busy, but it receives a few messages every minute. I do not get alerts at all; occasionally, right after restarting graylog-server, I might get one. On the node's metrics page (/system/metrics/node/{node}) I found that all my AggregationEventProcessors are reporting exceptions.
I can see in the code (EventProcessorEngine class) that these exceptions are supposed to be written to server.log when the DEBUG log level is enabled. I enabled DEBUG, but nothing is logged by any of the event processor classes.
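Log levels can also be changed at runtime through the REST API. The Java sketch below sends such a request. It assumes the endpoint PUT {api-base}/system/loggers/{loggerName}/level/{level} (verify the exact path in your node's API browser), HTTP Basic authentication, and an X-Requested-By header, which Graylog 3.x is assumed to require on state-changing requests. The logger name org.graylog.events is also an assumption: it is taken to be the package holding the event processor classes.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class SetLoggerLevel {

    /** Builds the assumed endpoint: {apiBase}/system/loggers/{loggerName}/level/{level}. */
    static String loggerLevelUrl(String apiBase, String loggerName, String level) {
        String base = apiBase.endsWith("/") ? apiBase.substring(0, apiBase.length() - 1) : apiBase;
        return base + "/system/loggers/" + loggerName + "/level/" + level;
    }

    /** HTTP Basic auth header value for the given credentials. */
    static String basicAuth(String user, String password) {
        String token = user + ":" + password;
        return "Basic " + Base64.getEncoder().encodeToString(token.getBytes(StandardCharsets.UTF_8));
    }

    /** Sends the PUT request (no body) and returns the HTTP status code. */
    static int setLevel(String apiBase, String user, String password,
                        String loggerName, String level) throws IOException {
        HttpURLConnection conn = (HttpURLConnection)
                URI.create(loggerLevelUrl(apiBase, loggerName, level)).toURL().openConnection();
        try {
            conn.setRequestMethod("PUT");
            conn.setRequestProperty("Authorization", basicAuth(user, password));
            // Assumed CSRF guard: state-changing requests carry X-Requested-By.
            conn.setRequestProperty("X-Requested-By", "cli");
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 3) {
            // Example (hypothetical host): java SetLoggerLevel http://graylog.example.org:9000/api admin secret
            System.out.println("usage: SetLoggerLevel <api-base-url> <user> <password>");
            return;
        }
        int status = setLevel(args[0], args[1], args[2], "org.graylog.events", "debug");
        System.out.println("HTTP " + status);
    }
}
```

The change applies only to the node that receives the request and is not persisted, so it is presumably lost on restart.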
Changing the log level through the API worked. Now I can see the following in the log:
2019-09-16T11:38:58.979+02:00 DEBUG [EventProcessorExecutionJob] Event processor <Errors found/5d5a43d2eb7809e1f9a969e3> couldn’t be executed because of a failed precondition (retry in 5000 ms)
Looking at the code, I can see that it can only come from the AggregationEventProcessor class, line 100:
if (!dependencyCheck.hasMessagesIndexedUpTo(parameters.timerange().getTo()))
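The DEBUG line means the processor's precondition failed: messages had not yet been indexed up to the end of the search time range, so execution was postponed and retried. Below is a simplified Java model of that check, not Graylog's implementation. It assumes, without confirmation from the code quoted here, that the cluster-wide "indexed up to" value is the earliest timestamp reported by any node. Under that assumption, a single node whose reported timestamp stops advancing would hold back every aggregation event processor. All names apart from hasMessagesIndexedUpTo are hypothetical.

```java
import java.time.Instant;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.Optional;

public class IndexingPrecondition {

    /** Earliest "indexed up to" timestamp across nodes; empty if no node reported. */
    static Optional<Instant> earliestIndexedUpTo(Collection<Instant> perNodeIndexedUpTo) {
        return perNodeIndexedUpTo.stream().min(Instant::compareTo);
    }

    /** True only if every node has indexed messages at least up to timerangeTo. */
    static boolean hasMessagesIndexedUpTo(Collection<Instant> perNodeIndexedUpTo, Instant timerangeTo) {
        return earliestIndexedUpTo(perNodeIndexedUpTo)
                .map(earliest -> !earliest.isBefore(timerangeTo))
                .orElse(false); // no data at all: treat the precondition as failed
    }

    public static void main(String[] args) {
        // End of the search window, matching the log timestamp 11:38:58+02:00.
        Instant to = Instant.parse("2019-09-16T09:38:58Z");

        // Two nodes are current; the third stopped advancing an hour earlier.
        List<Instant> nodes = Arrays.asList(
                Instant.parse("2019-09-16T09:39:05Z"),
                Instant.parse("2019-09-16T09:39:02Z"),
                Instant.parse("2019-09-16T08:38:58Z"));

        if (hasMessagesIndexedUpTo(nodes, to)) {
            System.out.println("precondition met: run the aggregation");
        } else {
            System.out.println("failed precondition (retry in 5000 ms)");
        }
    }
}
```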