Error: "Event processor failed to execute" after upgrading Graylog to 4.2.1

Hello. I hope you can help me =) Thank you in advance.
Please don’t judge too harshly: this is my first time working with Graylog, and the whole setup was done by the previous admin.
I have a Graylog cluster:
nginx proxy;
3 Graylog nodes (version: 4.2.1-1);
1 Elasticsearch coordinator node (version: 6.8.12-1);
3 Elasticsearch master nodes (version: 6.8.7-1);
4 Elasticsearch worker nodes (version: 6.8.8-1);
3 MongoDB nodes (version: 3.2.22);
After upgrading Graylog to 4.2.1, everything seems to work fine except for the problem below: I have noticed errors in the Graylog logs.

Graylog logs:

2021-11-30T18:25:15.835+03:00 ERROR [EventProcessorExecutionJob] Event processor <aggregation-v1/5f621978416c95371910eb4f> failed to execute: Couldn't create events for: EventDefinitionDto{id=5f621978416c95371910eb4f, title=(600) Private Cloud: unable to execute cron trigger - user not found, description=The Mistral may have orphaned cron triggers to execute due to the employee's dismissal., priority=2, alert=true, config=AggregationEventProcessorConfig{type=aggregation-v1, query=source:mistral-public*cloud.nexign.com AND message:"ERROR mistral.services.periodic" AND message:"Could not find user", queryParameters=[], streams=[], groupBy=[], series=[], conditions=Optional[AggregationConditions{expression=Optional.empty}], searchWithinMs=3600000, executeEveryMs=3600000}, fieldSpec={}, keySpec=[], notificationSettings=EventNotificationSettings{gracePeriodMs=86400000, backlogSize=1}, notifications=[Config{notificationId=5e67678ae33c1b4032e70764, notificationParameters=Optional.empty}], storage=[Config{type=persist-to-streams-v1, streams=[000000000000000000000002]}]} (retry in 5000 ms)
org.graylog.events.processor.EventProcessorException: Couldn't create events for: EventDefinitionDto{id=5f621978416c95371910eb4f, title=(600) Private Cloud: unable to execute cron trigger - user not found, description=The Mistral may have orphaned cron triggers to execute due to the employee's dismissal., priority=2, alert=true, config=AggregationEventProcessorConfig{type=aggregation-v1, query=source:mistral-public*cloud.nexign.com AND message:"ERROR mistral.services.periodic" AND message:"Could not find user", queryParameters=[], streams=[], groupBy=[], series=[], conditions=Optional[AggregationConditions{expression=Optional.empty}], searchWithinMs=3600000, executeEveryMs=3600000}, fieldSpec={}, keySpec=[], notificationSettings=EventNotificationSettings{gracePeriodMs=86400000, backlogSize=1}, notifications=[Config{notificationId=5e67678ae33c1b4032e70764, notificationParameters=Optional.empty}], storage=[Config{type=persist-to-streams-v1, streams=[000000000000000000000002]}]}
        at org.graylog.events.processor.EventProcessorEngine.execute(EventProcessorEngine.java:106) ~[graylog.jar:?]
        at org.graylog.events.processor.EventProcessorExecutionJob.execute(EventProcessorExecutionJob.java:115) ~[graylog.jar:?]
        at org.graylog.scheduler.JobExecutionEngine.executeJob(JobExecutionEngine.java:166) ~[graylog.jar:?]
        at org.graylog.scheduler.JobExecutionEngine.lambda$handleTrigger$2(JobExecutionEngine.java:144) ~[graylog.jar:?]
        at com.codahale.metrics.Timer.time(Timer.java:151) ~[graylog.jar:?]
        at org.graylog.scheduler.JobExecutionEngine.handleTrigger(JobExecutionEngine.java:144) ~[graylog.jar:?]
        at org.graylog.scheduler.JobExecutionEngine.lambda$execute$0(JobExecutionEngine.java:119) ~[graylog.jar:?]
        at org.graylog.scheduler.worker.JobWorkerPool.lambda$execute$0(JobWorkerPool.java:110) ~[graylog.jar:?]
        at com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:180) [graylog.jar:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_272]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_272]
        at com.codahale.metrics.InstrumentedThreadFactory$InstrumentedRunnable.run(InstrumentedThreadFactory.java:66) [graylog.jar:?]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_272]
Caused by: org.graylog2.indexer.IndexNotFoundException: Unable to scroll indices.

Index not found for query: critical_213. Try recalculating your index ranges.
        at org.graylog2.indexer.IndexNotFoundException.create(IndexNotFoundException.java:32) ~[graylog.jar:?]
        at org.graylog.storage.elasticsearch6.jest.JestUtils.specificException(JestUtils.java:114) ~[?:?]
        at org.graylog.storage.elasticsearch6.jest.JestUtils.execute(JestUtils.java:72) ~[?:?]
        at org.graylog.storage.elasticsearch6.jest.JestUtils.execute(JestUtils.java:77) ~[?:?]
        at org.graylog.storage.elasticsearch6.Scroll.scroll(Scroll.java:45) ~[?:?]
        at org.graylog.storage.elasticsearch6.Scroll.scroll(Scroll.java:41) ~[?:?]
        at org.graylog.storage.elasticsearch6.MoreSearchAdapterES6.scrollEvents(MoreSearchAdapterES6.java:163) ~[?:?]
        at org.graylog.events.search.MoreSearch.scrollQuery(MoreSearch.java:147) ~[graylog.jar:?]
        at org.graylog.events.processor.aggregation.AggregationEventProcessor.filterSearch(AggregationEventProcessor.java:230) ~[graylog.jar:?]
        at org.graylog.events.processor.aggregation.AggregationEventProcessor.createEvents(AggregationEventProcessor.java:125) ~[graylog.jar:?]
        at org.graylog.events.processor.EventProcessorEngine.execute(EventProcessorEngine.java:92) ~[graylog.jar:?]
        ... 12 more

I tried re-indexing the indices, but it didn’t help.
I also couldn’t find this problem anywhere on Google.

Index not found for query: critical_213. Try recalculating your index ranges

Interestingly, this index does not exist at all.

MongoDB logs:

[root@srv-log-mnode01 ~]# tail  /var/log/mongodb/mongod.log
2021-11-30T18:26:42.859+0300 I ACCESS   [conn482218] Successfully authenticated as principal __system on local
2021-11-30T18:28:05.929+0300 I NETWORK  [conn482213] end connection 172.30.5.217:45248 (30 connections now open)
2021-11-30T18:28:23.268+0300 I NETWORK  [initandlisten] connection accepted from 172.30.5.217:45284 #482219 (31 connections now open)
2021-11-30T18:28:23.269+0300 I ACCESS   [conn482219] Successfully authenticated as principal __system on local
2021-11-30T18:29:23.269+0300 I NETWORK  [conn482219] end connection 172.30.5.217:45284 (30 connections now open)
2021-11-30T18:31:30.670+0300 I NETWORK  [initandlisten] connection accepted from 172.30.5.217:45292 #482220 (31 connections now open)
2021-11-30T18:31:30.670+0300 I ACCESS   [conn482220] Successfully authenticated as principal __system on local
2021-11-30T18:33:25.281+0300 I NETWORK  [conn482218] end connection 172.30.5.217:45278 (30 connections now open)
2021-11-30T18:34:38.051+0300 I NETWORK  [initandlisten] connection accepted from 172.30.5.217:45300 #482221 (31 connections now open)
2021-11-30T18:34:38.052+0300 I ACCESS   [conn482221] Successfully authenticated as principal __system on local

Hello && Welcome

I might be able to help. I broke down the logs you posted above to get a clearer picture of what may be happening.

Event processor <aggregation-v1/5f621978416c95371910eb4f> failed to execute:
Private Cloud: unable to execute cron trigger - user not found
description=The Mistral may have orphaned cron triggers to execute due to the employee's dismissal., priority=2, alert=true,
query=source:mistral-public*cloud.nexign.com AND message:"ERROR mistral.services.periodic"
The Mistral may have orphaned cron triggers to execute due to the employee's dismissal.

By chance, is there an owner/user set on the event definition EventDefinitionDto{id=5f621978416c95371910eb4f}, or on the stream?

You can find the affected EventDefinition by the ID in the URL.


Caused by: org.graylog2.indexer.IndexNotFoundException: Unable to scroll indices.
Index not found for query: critical_213

Perhaps run a couple of tests to find out what’s going on with Elasticsearch. Since you have several ES nodes, you might want to check all of them, replacing “localhost” with each node’s IP address.

Elasticsearch health.

curl -XGET 'http://localhost:9200/_cluster/health?pretty=true'

Elasticsearch’s cat shards API will tell you which shards are unassigned, and why:

curl -XGET 'localhost:9200/_cat/shards?h=index,shard,prirep,state,unassigned.reason' | grep UNASSIGNED

This lists all the nodes in your cluster.

curl -XGET 'http://localhost:9200/_nodes?pretty'

Did you check the status of all Elasticsearch nodes?

systemctl status elasticsearch
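
To run the health check against every node rather than only localhost, a small loop can help. This is a minimal sketch, not a definitive script: the host names in the usage line are hypothetical placeholders, and it assumes the nodes answer on port 9200 over plain HTTP without authentication. It calls the _cat/health API with h=status, which prints green, yellow, or red.

```shell
#!/bin/sh
# Print "<host> <status>" for every Elasticsearch host passed as an argument.
# A host that does not answer within 5 seconds is reported as "unreachable".
check_es_nodes() {
    for host in "$@"; do
        # _cat/health?h=status returns only the cluster status column.
        status=$(curl -s --max-time 5 "http://${host}:9200/_cat/health?h=status" | tr -d '[:space:]')
        printf '%s %s\n' "$host" "${status:-unreachable}"
    done
}

# Usage (hypothetical host names, replace with your own nodes):
#   check_es_nodes es-coord01 es-master01 es-worker01
```

The status is cluster-wide, so every reachable node should report the same value. A node that prints "unreachable" or disagrees with the others is the one to look at first.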

Do you see anything that may pertain to this issue in the Elasticsearch logs? I believe they’re located in /var/log/elasticsearch.

Since you’re using a proxy, do you see anything in the nginx log files that may also pertain to this issue?

If you navigate to System/Overview and scroll down to “Show Errors” button. Click on it, what do you see?

Have you tried to recalculate your index ranges and tail your Graylog log files?

critical_213 looks like an index from an index set, so I assume your index prefix would be critical? So you don’t see that prefix in any of your indices?

Here is an example of where to look. I did a mockup for you.

Hope that helps

Highsmith, thank you for the detailed response.
I’ve tried what you suggested. I checked the status of Elasticsearch as you said (everything was fine).

If you navigate to System/Overview and scroll down to “Show Errors” button. Click on it, what do you see?

Yes. There was an error similar to the one from the logs:

"The Graylog server encountered an error while trying to send an email. This is the detailed error message: IndexNotFoundException{message=Unable to scroll indices. Index not found for query: critical_213. Try recalculating your index ranges., errorDetails=[Index not found for query: critical_213. Try recalculating your index ranges.]}"

critical_213 looks like an index from an index set, so I assume your index prefix would be critical? So you don’t see that prefix in any of your indices?

Yes, I do not have this index. I now think the administrator before me deleted this index manually, while Graylog still believed it existed. Everything had worked fine until we restarted the server (for the upgrade to 4.2.1).
I found this thread (Could not execute search (Index not found for query)) and it helped me. I removed the unneeded (critical_*) entries from MongoDB, and after that Graylog works fine.
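
For anyone who lands here with the same error, the stale entries can be found by comparing what Graylog has recorded against what Elasticsearch actually holds. The following is only a sketch under assumptions, since the post above does not say which MongoDB collection was cleaned: it assumes the database is named graylog and that the entries live in the index_ranges collection under the index_name field. The file names are hypothetical, and a MongoDB with authentication enabled will also need credentials on the mongo command.

```shell
#!/bin/sh
# Print index names that appear in the Graylog list but not in the
# Elasticsearch list.
#   $1: file with one index name per line, as recorded by Graylog
#   $2: file with one index name per line, as reported by Elasticsearch
stale_index_ranges() {
    ranges_sorted=$(mktemp) && indices_sorted=$(mktemp) || return 1
    sort -u "$1" > "$ranges_sorted"
    sort -u "$2" > "$indices_sorted"
    # comm -23 keeps only the lines unique to the first (sorted) file.
    comm -23 "$ranges_sorted" "$indices_sorted"
    rm -f "$ranges_sorted" "$indices_sorted"
}

# One possible way to produce the two input files (collection and field
# names are assumptions, see above):
#   mongo graylog --quiet --eval \
#     'db.index_ranges.find().forEach(function (d) { print(d.index_name); })' \
#     > graylog_ranges.txt
#   curl -s 'http://localhost:9200/_cat/indices?h=index' > es_indices.txt
#   stale_index_ranges graylog_ranges.txt es_indices.txt
```

Anything this prints, such as critical_213 in this thread, is a candidate for removal from MongoDB. Taking a mongodump backup first and reviewing the list by hand before deleting anything seems prudent.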

Thanks again. =)


Oh yeah, that’s good now.

Sorry I could not offer a more precise solution, but I hope I gave you a couple of ideas.
If you could mark this as solved, that would help future searches.

Glad it’s resolved :smiley: and thank you for posting your solution.
