Yes - I read the docs thoroughly before posting :). I have simplified the initial scenario to highlight the problem, which is described below using Alert Definitions 1 and 2:
Streams
Prod Env Stream
- namespace_name must match exactly production
- Remove matches from ‘All messages’ stream
Test Env Stream
- namespace_name must match exactly test
- Remove matches from ‘All messages’ stream
Dev Env Stream
- namespace_name must match exactly development
- Remove matches from ‘All messages’ stream
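To make the routing concrete, here is a minimal Python sketch of how I understand these three stream rules to behave. It is only an illustration of the setup above, not Graylog's API: `STREAM_RULES`, `DEFAULT_STREAM` and `route` are hypothetical names, and the sketch assumes that "must match exactly" is a case-sensitive string comparison.

```python
# Hypothetical model of the three stream rules above; not Graylog's API.
# Each stream has one rule: namespace_name must match the value exactly.
STREAM_RULES = {
    "Prod Env Stream": "production",
    "Test Env Stream": "test",
    "Dev Env Stream": "development",
}
DEFAULT_STREAM = "All messages"


def route(message: dict) -> list[str]:
    """Return the streams a message ends up in.

    Assumption: a matching message is removed from the default stream,
    mirroring the "Remove matches from 'All messages' stream" option.
    """
    namespace = message.get("namespace_name")
    matched = [
        stream for stream, value in STREAM_RULES.items() if namespace == value
    ]
    # Only messages that match no stream rule stay in the default stream.
    return matched or [DEFAULT_STREAM]
```

With this model, a message whose namespace_name is production lands only in Prod Env Stream, so an alert definition filtered on that stream never sees test or development messages.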
Alert Definition 1
Filter
- Search Query: _l:“Errror”
- Streams: Prod Env Stream
Aggregation: container_name
Definition: if count(_l) > 0
Using the vanilla email notification, the emails are generated with the correct grouping (i.e. one email per container name), and the message backlog for each email contains messages for that container_name only.
This is expected behavior, as per the Aggregation example given on the Filter and Aggregation UI page:
Select Fields that Graylog should use to group Filter results when they have identical values. Example:
Assuming you created a Filter with all failed log-in attempts in your network, Graylog could alert you when there are more than 5 failed log-in attempts overall. Now, add username as Group by Field and Graylog will alert you for each username with more than 5 failed log-in attempts.
Alert Definition 2 (identical to the above, except that it groups by 2 fields):
Aggregation: namespace_name and container_name
Note - I realize there is no point in grouping by namespace_name when the filter is for Prod Env Stream only; it just makes the problem really obvious.
The emails are generated with the correct grouping. In particular, the following event fields are correct and can be verified by executing a search:
Message: ${event.message}
Source: ${event.source}
Key: ${event.key}
However, the message backlog is no longer grouped according to the aggregation fields, and instead returns all messages. In other words, an email generated for Namespace 1 + Container A has event fields relevant to Namespace 1 + Container A only, but the message backlog includes messages for Namespace 1 + Container A, Namespace 1 + Container B, … Namespace 1 + Container n.
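The difference between what I expected and what I am seeing can be sketched in a few lines of Python. This is only a model of the behavior described above, not Graylog code: the sample messages, `GROUP_BY`, `group_key`, `expected_backlogs` and `observed_backlogs` are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical aggregation fields, as in Alert Definition 2.
GROUP_BY = ("namespace_name", "container_name")

# Made-up sample messages that all matched the filter.
messages = [
    {"namespace_name": "production", "container_name": "A", "id": 1},
    {"namespace_name": "production", "container_name": "A", "id": 2},
    {"namespace_name": "production", "container_name": "B", "id": 3},
]


def group_key(message, fields=GROUP_BY):
    """Build the group key for one message from the aggregation fields."""
    return tuple(message[field] for field in fields)


def expected_backlogs(msgs, fields=GROUP_BY):
    """Expected: each event's backlog holds only messages from its own group."""
    backlogs = defaultdict(list)
    for message in msgs:
        backlogs[group_key(message, fields)].append(message)
    return dict(backlogs)


def observed_backlogs(msgs, fields=GROUP_BY):
    """Observed with 2 group-by fields: every event gets all filtered messages."""
    keys = {group_key(message, fields) for message in msgs}
    return {key: list(msgs) for key in keys}
```

In this model, the expected backlog for ("production", "A") contains only messages 1 and 2, while the observed backlog for the same key contains all three messages, which is the mismatch between the event fields and the backlog that I am describing.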
The documentation says the Backlog is "The list of messages or events which lead to this alert being generated." So I’m wondering why the ${event. } properties do not match the message backlog?