We are running:
Graylog 3.3.5+6436f1b in a single-node environment.
The problem we are facing is that sometimes messages in our stream do not generate an alert, and therefore no notification, when they should.
Sometimes this happens after a few days of everything working correctly.
Our setup:
We have an alert:
Type: Aggregation
Search Query: *
Streams: [Errors](/streams/5fd39b11cc99b90cb001d9b9/search)
Search within: 1 minutes
Execute search every: 1 minutes
Enable scheduling: yes
Group by Field(s): No Group by configured
Create Events if: count() **>** 0
Now I am running code which successfully generates new messages in our "ERROR" stream, but no corresponding Event is generated for those messages. The worst part is that sometimes this works and sometimes it does not. If I leave a job writing a message to this stream every 30 minutes through the night, then in the morning I have around 5 new mails in my inbox instead of at least 10 (2 messages per hour).
Output of: echo "db.processing_status.find()" | mongo graylog:
{ "_id" : ObjectId("603cebdb19c921775b573519"), "node_id" : "ec3cbb8d-0b71-4b8e-bc7d-53057bfa7406", "node_lifecycle_status" : "RUNNING", "updated_at" : ISODate("2021-03-01T15:18:39.988Z"), "receive_times" : { "ingest" : ISODate("2021-03-01T15:18:38.879Z"), "post_processing" : ISODate("2021-03-01T15:18:38.879Z"), "post_indexing" : ISODate("2021-03-01T15:18:38.879Z") }, "input_journal" : { "uncommitted_entries" : NumberLong(0), "read_messages_1m_rate" : 6.951131632689634, "written_messages_1m_rate" : 6.951131632689634, "journal_enabled" : true } }
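For anyone reading this output, one quick check is how far each `receive_times` entry lags behind `updated_at`. The snippet below is only a sketch: it assumes the timestamps were copied by hand out of the document above into ISO strings, and the helper names `parse_ts` and `receive_time_lags` are made up for illustration. Whether the event processor precondition actually depends on these values is an assumption on my part, not something this output confirms.

```python
from datetime import datetime, timezone


def parse_ts(value: str) -> datetime:
    """Parse a timestamp such as 2021-03-01T15:18:39.988Z as UTC."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ")
    return parsed.replace(tzinfo=timezone.utc)


def receive_time_lags(status: dict) -> dict:
    """Return, per stage, how many seconds receive_times lags behind updated_at."""
    updated = parse_ts(status["updated_at"])
    return {
        stage: (updated - parse_ts(ts)).total_seconds()
        for stage, ts in status["receive_times"].items()
    }


# Values copied from the processing_status document above.
status = {
    "updated_at": "2021-03-01T15:18:39.988Z",
    "receive_times": {
        "ingest": "2021-03-01T15:18:38.879Z",
        "post_processing": "2021-03-01T15:18:38.879Z",
        "post_indexing": "2021-03-01T15:18:38.879Z",
    },
}

lags = receive_time_lags(status)
for stage, lag in lags.items():
    print(f"{stage}: {lag:.3f} s behind updated_at")
```

In this sample all three stages are about one second behind `updated_at`, so at the moment of this snapshot nothing looks stalled.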
After enabling the "DEBUG" log level via: PUT "http://127.0.0.1:9000/api/system/loggers/org.graylog.events.processor/level/debug"
I can see messages like these popping up all the time:
2021-03-01 15:20:34,825 DEBUG: org.graylog.events.processor.EventProcessorEngine - Executing event processor <At least one ERROR message/5fd39b11cc99b90cb001d9c3/aggregation-v1>
2021-03-01 15:20:34,832 DEBUG: org.graylog.events.processor.EventProcessorExecutionJob - Event processor <At least one ERROR message/5fd39b11cc99b90cb001d9c3> couldn't be executed because of a failed precondition (retry in 5000 ms)
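To see how often the precondition failure occurs compared with normal executions, the DEBUG lines can be counted per event processor. This is a rough sketch that assumes the log format is exactly as in the two lines above; the regular expression and the function name `count_precondition_failures` are my own and not part of Graylog.

```python
import re
from collections import Counter

# Matches the "failed precondition" DEBUG line and captures the processor
# title, its 24-character hex ID and the retry delay in milliseconds.
PRECONDITION_RE = re.compile(
    r"Event processor <(?P<title>.+?)/(?P<id>[0-9a-f]{24})> "
    r".*failed precondition \(retry in (?P<retry>\d+) ms\)"
)


def count_precondition_failures(lines) -> Counter:
    """Count failed-precondition messages per event processor ID."""
    failures = Counter()
    for line in lines:
        match = PRECONDITION_RE.search(line)
        if match:
            failures[match.group("id")] += 1
    return failures


# The two sample lines from the log output above.
sample = [
    "2021-03-01 15:20:34,825 DEBUG: org.graylog.events.processor.EventProcessorEngine"
    " - Executing event processor <At least one ERROR message/"
    "5fd39b11cc99b90cb001d9c3/aggregation-v1>",
    "2021-03-01 15:20:34,832 DEBUG: org.graylog.events.processor.EventProcessorExecutionJob"
    " - Event processor <At least one ERROR message/5fd39b11cc99b90cb001d9c3>"
    " couldn't be executed because of a failed precondition (retry in 5000 ms)",
]

failures = count_precondition_failures(sample)
for processor_id, count in failures.items():
    print(f"{processor_id}: {count} failed precondition(s)")
```

Running the same function over the lines of the server log file (its location depends on the installation) would show whether the failures cluster around the times when mails went missing.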
Additionally in metrics for my node I can see such entries:
Is there anyone who could help me understand what is going on?