Event/Alert Aggregation count() not correct anymore

1. Describe your incident:
Since the last update of Graylog Server and OpenSearch (done on 08.08.24), my Event Definitions based on the count() aggregation no longer work correctly. Example Event Definition:
  • Type: Aggregation
  • Search Query: some search query
  • Search Filters: No filters configured
  • Streams: Stream XYZ
  • Search within: 5 minutes
  • Execute search every: 5 minutes
  • Enable scheduling: yes
  • Group by Field(s): No Group by configured
  • Create Events if: count() >= 15

Since the update, the event fires on every scheduled run (in this case every 5 minutes) with incorrect count() values. For example, an event is created because of “count()=44.0”, but when I click “Replay search” it shows exactly 1 message, so no event should have been created. The wrong count value changes every time the event fires.
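For reference, here is a minimal Python sketch of what the condition above is expected to do. It assumes a simple half-open 5-minute window; the function names are hypothetical and the real evaluation happens inside Graylog and OpenSearch, not in code like this.

```python
from datetime import datetime, timedelta

THRESHOLD = 15                 # from "count() >= 15" in the event definition
WINDOW = timedelta(minutes=5)  # from "Search within 5 minutes"


def count_in_window(timestamps, window_end, window=WINDOW):
    """Count messages with window_end - window < ts <= window_end.

    Assumption: a half-open window; Graylog's exact boundary handling may differ.
    """
    window_start = window_end - window
    return sum(1 for ts in timestamps if window_start < ts <= window_end)


def evaluate_condition(timestamps, window_end, threshold=THRESHOLD):
    """Return (count, should_fire) for the condition count() >= threshold."""
    count = count_in_window(timestamps, window_end)
    return count, count >= threshold


if __name__ == "__main__":
    end = datetime(2024, 1, 1, 12, 0)  # illustrative timestamp, not from the thread
    # One message in the window, as "Replay search" showed in this thread.
    replayed = [end - timedelta(minutes=2)]
    print(evaluate_condition(replayed, end))  # (1, False)
    # 44 messages in the window would legitimately fire the event.
    busy = [end - timedelta(seconds=5 * i) for i in range(44)]
    print(evaluate_condition(busy, end))  # (44, True)
```

A single message can never satisfy count() >= 15, so an event reporting count()=44.0 next to a one-message replay points to a wrong count coming back from the aggregation query rather than a problem with the condition itself.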

Because both Graylog and OpenSearch were updated, I don't know which component is causing this issue.

Does anyone have an idea how to solve or further debug this?

2. Describe your environment:

  • OS Information: AlmaLinux 9.4

  • Package Version:
    graylog-server-6.0.5-1
    opensearch-2.16.0-1
    mongodb-org-server-7.0.12-1

3. What steps have you already taken to try and solve the problem?
Restarted the services and rebooted the server.

4. How can the community help?
Hopefully someone has had the same problem and was able to solve it.

Hello @bjwe,

Any chance you could share a screenshot of what you see upon hitting replay?

If you were to search yourself for the field that the count is based on, within the same timeframe, is the count incorrect?

Hello @Wine_Merchant,
thank you for your reply!
Here are the screenshots. The first one is from the Event overview and shows a count of 24:


The second screenshot, from hitting Replay, shows that only 1 message appeared during the searched time range. It will follow in the next post (for some reason I can only upload one image per post).

To your second question: if I search the stream myself within the same timeframe, the count matches the second screenshot (1), so the count of 24 in the first screenshot is indeed incorrect.

@Wine_Merchant
Screenshot 2:

This is an odd one. Which versions of Graylog/Mongo/OS were you on, and which did you move to?

Assuming this event was carried over from before the update: if you create a new Event Definition with the same configuration, does it yield the same results?

This sounds like an issue we recently discovered with OpenSearch 2.16 (see "Opensearch 2.16.0 breaks alerts", Issue #20119 in Graylog2/graylog2-server on GitHub). We currently do not recommend using this version because of this issue, but OpenSearch is already addressing it and we expect a fix soon. Sorry that you are experiencing this!

@Wine_Merchant
Mongo and OS were not updated. The problem started after:
graylog-server-6.0.4-1.x86_64 → graylog-server-6.0.5-1.x86_64
opensearch-2.15.0-1.x86_64 → opensearch-2.16.0-1.x86_64

Unfortunately, yes: I created a new Event Definition with the same configuration, and the problem still exists.

@mako42
Thank you very much for mentioning that issue; it sounds exactly like the problem I am facing at the moment. I will keep an eye on it and hopefully it will be fixed in the next update.

Thanks to you both for your help!

For those encountering this, add the following option to your opensearch.yml (OpenSearch reads this file at startup, so restart the service afterwards) and track the issue here.

search.max_aggregation_rewrite_filters: 0
