Events triggering count() == 0 even though "replay search" shows logs

1. Describe your incident:

I have Event Definitions set up with an aggregation condition of count() == 0, and these events are being triggered even though there are log entries that match.

The filter preview on the right shows logs that match the filter, which means that count() == 0 should NOT match.

Even when I do a replay search on the alert, it shows entries.

Event definition:

Alerts being triggered:

Replay search shows this:

2. Describe your environment:

  • OS Information: Arch Linux, Podman containers
  • Graylog 6.1.4+7528370 on graylog (Eclipse Adoptium 17.0.13 on Linux 6.6.69-1-lts)

Containers:

3. What steps have you already taken to try and solve the problem?

  1. I’ve tried adding search.max_aggregation_rewrite_filters: "0" to my opensearch.yml configuration, in case this was related to this issue, even though the Data Node is using OpenSearch v2.15.
    I did this by adding this environment variable to my Podman container configuration: opensearch.search.max_aggregation_rewrite_filters=0
    Unfortunately this didn’t help.

  2. I tried adding a “group by field” of “source” to the event definition.
    Unfortunately this didn’t help: the events still triggered with or without it.

4. How can the community help?

Is anyone else experiencing this problem?

Anything wrong with my event definition that is causing this?

I think I may have found the issue here - which I suspect is a bug.

I created a new event definition from scratch, with the same configuration… and it did NOT trigger. Curious…

I used the Graylog API to get the event definitions and noticed a difference in the field “config → series → 0 → field”: the new (working) definition shows null, whereas the one that is triggering shows “”.

The issue appears to be that if you set a value in the “Select Field (Optional)” text box and then clear it again, the field is saved as “”, whereas if you never set a value at all it is saved as null.

Event that triggers incorrectly:

New event that doesn’t trigger (correct):

Field that clears to “” instead of null:
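To check whether an existing definition is in this state without clicking through the UI, a small helper can inspect the JSON returned by a GET on /api/events/definitions/<id>. This is a minimal sketch assuming the response has already been parsed with json.loads; the function name find_empty_series_fields is hypothetical. It flags only series whose “field” is exactly “” (a JSON null arrives as None and is treated as fine).

```python
from typing import Any


def find_empty_series_fields(definition: dict[str, Any]) -> list[str]:
    """Return the ids of aggregation series whose "field" is an empty string.

    A series that never had a field stores null (None after json.loads);
    one that was set and then cleared stores "". Only the latter is flagged.
    """
    series = definition.get("config", {}).get("series", [])
    return [s.get("id", "") for s in series if s.get("field") == ""]
```

For example, passing in the parsed definition that triggers incorrectly would return ["count-"], while the freshly created one would return an empty list.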

The reason the second worked and not the first is that in the second you did not have a group by.

You cannot do count == 0 or count < 1 when you have a group by. It makes sense: if there is nothing there, there is nothing to group by, so there is no result for the condition to be checked against.

You can do it, though, if you use a specific enough search and then just aggregate over the whole set of returned results.

Also, the preview on the right has nothing to do with the aggregations; it only reflects the query itself.
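The group-by point can be illustrated with a simplified model. This is not Graylog's actual code, only a sketch of the behaviour described above: without a group by, the aggregation always yields one row (with a count of 0 when nothing matched), while with a group by, rows exist only for values that appear in the matched messages. The names aggregate_counts and rows_matching_zero are hypothetical.

```python
from collections import Counter
from typing import Optional


def aggregate_counts(messages: list[dict], group_by: Optional[str]) -> dict:
    """Simplified model of a count() aggregation over one search window.

    Without group_by there is always exactly one result row, even when no
    messages matched. With group_by, a row exists only for each value seen
    in the matched messages, so an empty window yields no rows at all.
    """
    if group_by is None:
        return {(): len(messages)}
    return dict(Counter((m.get(group_by),) for m in messages))


def rows_matching_zero(messages: list[dict], group_by: Optional[str]) -> list:
    """Return the result rows for which the condition count() == 0 holds."""
    return [key for key, count in aggregate_counts(messages, group_by).items() if count == 0]
```

With an empty window, rows_matching_zero([], None) yields one row, but rows_matching_zero([], "source") yields none, so there is nothing for count() == 0 to match against.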

Hi @Joel_Duffield,

Thanks for the reply and info.

I’ve done some testing and found the following.

Starting with this event definition which works correctly:

{
  "_scope": "DEFAULT",
  "id": "6779f186970c084e8e0a8f38",
  "title": "No logs from homeassistant.woods.am",
  "description": "",
  "updated_at": "2025-01-05T13:38:27.783Z",
  "matched_at": "2025-01-05T09:51:26.801Z",
  "priority": 2,
  "alert": true,
  "config": {
    "type": "aggregation-v1",
    "query": "source:\"homeassistant.woods.am\"",
    "query_parameters": [],
    "filters": [],
    "streams": [
      "000000000000000000000001"
    ],
    "stream_categories": [],
    "group_by": [],
    "series": [
      {
        "type": "count",
        "id": "count-",
        "field": null
      }
    ],
    "conditions": {
      "expression": {
        "expr": "==",
        "left": {
          "expr": "number-ref",
          "ref": "count-"
        },
        "right": {
          "expr": "number",
          "value": 0
        }
      }
    },
    "search_within_ms": 300000,
    "execute_every_ms": 300000,
    "use_cron_scheduling": false,
    "cron_expression": null,
    "cron_timezone": null,
    "event_limit": 100
  },
  "field_spec": {
    "source": {
      "data_type": "string",
      "providers": [
        {
          "type": "template-v1",
          "template": "homeassistant.woods.am",
          "require_values": false
        }
      ]
    },
    "message": {
      "data_type": "string",
      "providers": [
        {
          "type": "template-v1",
          "template": "No logs received for >5 mins",
          "require_values": false
        }
      ]
    }
  },
  "key_spec": [],
  "notification_settings": {
    "grace_period_ms": 3600000,
    "backlog_size": 0
  },
  "notifications": [
    {
      "notification_id": "676f7274eccf2a0bc86dcd88",
      "notification_parameters": null
    }
  ],
  "storage": [
    {
      "type": "persist-to-streams-v1",
      "streams": [
        "000000000000000000000002"
      ]
    }
  ],
  "scheduler": null,
  "state": "ENABLED"
}

Adding Group by Field = source: this DIDN’T cause the event to trigger spuriously, but DID stop it working when it was supposed to (when logs had stopped being received for more than 5 mins):

{
  "_scope": "DEFAULT",
  "id": "6779f186970c084e8e0a8f38",
  "title": "No logs from homeassistant.woods.am",
  "description": "",
  "updated_at": "2025-01-06T01:47:39.522Z",
  "matched_at": "2025-01-05T09:51:26.801Z",
  "priority": 2,
  "alert": true,
  "config": {
    "type": "aggregation-v1",
    "query": "source:\"homeassistant.woods.am\"",
    "query_parameters": [],
    "filters": [],
    "streams": [
      "000000000000000000000001"
    ],
    "stream_categories": [],
    "group_by": [
      "source"
    ],
    "series": [
      {
        "type": "count",
        "id": "count-",
        "field": null
      }
    ],
    "conditions": {
      "expression": {
        "expr": "==",
        "left": {
          "expr": "number-ref",
          "ref": "count-"
        },
        "right": {
          "expr": "number",
          "value": 0
        }
      }
    },
    "search_within_ms": 300000,
    "execute_every_ms": 300000,
    "use_cron_scheduling": false,
    "cron_expression": null,
    "cron_timezone": null,
    "event_limit": 100
  },
  "field_spec": {
    "source": {
      "data_type": "string",
      "providers": [
        {
          "type": "template-v1",
          "template": "homeassistant.woods.am",
          "require_values": false
        }
      ]
    },
    "message": {
      "data_type": "string",
      "providers": [
        {
          "type": "template-v1",
          "template": "No logs received for >5 mins",
          "require_values": false
        }
      ]
    }
  },
  "key_spec": [],
  "notification_settings": {
    "grace_period_ms": 3600000,
    "backlog_size": 0
  },
  "notifications": [
    {
      "notification_id": "676f7274eccf2a0bc86dcd88",
      "notification_parameters": null
    }
  ],
  "storage": [
    {
      "type": "persist-to-streams-v1",
      "streams": [
        "000000000000000000000002"
      ]
    }
  ],
  "scheduler": null,
  "state": "ENABLED"
}

Then I tried removing the group by field and instead selecting “source” in the count() select field, then clearing it with the “X” icon, and saving the event definition. This changed the series “field” value from null to “”. This DID cause the event to start triggering spuriously, even though logs were being received, as could be seen in the replay search:

{
  "_scope": "DEFAULT",
  "id": "6779f186970c084e8e0a8f38",
  "title": "No logs from homeassistant.woods.am",
  "description": "",
  "updated_at": "2025-01-06T02:00:24.800Z",
  "matched_at": "2025-01-05T09:51:26.801Z",
  "priority": 2,
  "alert": true,
  "config": {
    "type": "aggregation-v1",
    "query": "source:\"homeassistant.woods.am\"",
    "query_parameters": [],
    "filters": [],
    "streams": [
      "000000000000000000000001"
    ],
    "stream_categories": [],
    "group_by": [],
    "series": [
      {
        "type": "count",
        "id": "count-",
        "field": ""
      }
    ],
    "conditions": {
      "expression": {
        "expr": "==",
        "left": {
          "expr": "number-ref",
          "ref": "count-"
        },
        "right": {
          "expr": "number",
          "value": 0
        }
      }
    },
    "search_within_ms": 300000,
    "execute_every_ms": 300000,
    "use_cron_scheduling": false,
    "cron_expression": null,
    "cron_timezone": null,
    "event_limit": 100
  },
  "field_spec": {
    "source": {
      "data_type": "string",
      "providers": [
        {
          "type": "template-v1",
          "template": "homeassistant.woods.am",
          "require_values": false
        }
      ]
    },
    "message": {
      "data_type": "string",
      "providers": [
        {
          "type": "template-v1",
          "template": "No logs received for >5 mins",
          "require_values": false
        }
      ]
    }
  },
  "key_spec": [],
  "notification_settings": {
    "grace_period_ms": 3600000,
    "backlog_size": 0
  },
  "notifications": [
    {
      "notification_id": "676f7274eccf2a0bc86dcd88",
      "notification_parameters": null
    }
  ],
  "storage": [
    {
      "type": "persist-to-streams-v1",
      "streams": [
        "000000000000000000000002"
      ]
    }
  ],
  "scheduler": null,
  "state": "ENABLED"
}
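Why would “” behave differently from null? One plausible explanation, not confirmed in this thread: when a count() series has a field, it counts only messages that contain that field, so a field name of “” matches no messages, the count is always 0, and count() == 0 holds on every run. The sketch below models that hypothesis; count_series is a hypothetical name and this is not Graylog's implementation.

```python
from typing import Optional


def count_series(messages: list[dict], field: Optional[str]) -> int:
    """Hypothetical model of how a count() series might be evaluated.

    field is None -> count every matched message.
    field is a string -> count only messages that have a value for that field,
    so an empty-string field name never matches anything.
    """
    if field is None:
        return len(messages)
    return sum(1 for m in messages if m.get(field) is not None)
```

Under this model, three matching messages give count_series(msgs, None) == 3 but count_series(msgs, "") == 0, which would explain the alert firing while the replay search still shows logs.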

I could fix this (stop the events from spuriously being triggered) by using curl to do a GET of the event definition, and then a PUT of the event definition with only the series field value changed from “” to null.

$ curl -u USERNAME:"PASSWORD" -H "Accept: application/json" -X GET "https://graylog.woods.am/api/events/definitions/6779f186970c084e8e0a8f38"

{"_scope":"DEFAULT","id":"6779f186970c084e8e0a8f38","title":"No logs from homeassistant.woods.am","description":"","updated_at":"2025-01-06T02:00:24.800Z","matched_at":"2025-01-05T09:51:26.801Z","priority":2,"alert":true,"config":{"type":"aggregation-v1","query":"source:\"homeassistant.woods.am\"","query_parameters":[],"filters":[],"streams":["000000000000000000000001"],"stream_categories":[],"group_by":[],"series":[{"type":"count","id":"count-","field":""}],"conditions":{"expression":{"expr":"==","left":{"expr":"number-ref","ref":"count-"},"right":{"expr":"number","value":0.0}}},"search_within_ms":300000,"execute_every_ms":300000,"use_cron_scheduling":false,"cron_expression":null,"cron_timezone":null,"event_limit":100},"field_spec":{"source":{"data_type":"string","providers":[{"type":"template-v1","template":"homeassistant.woods.am","require_values":false}]},"message":{"data_type":"string","providers":[{"type":"template-v1","template":"No logs received for >5 mins","require_values":false}]}},"key_spec":[],"notification_settings":{"grace_period_ms":3600000,"backlog_size":0},"notifications":[{"notification_id":"676f7274eccf2a0bc86dcd88","notification_parameters":null}],"storage":[{"type":"persist-to-streams-v1","streams":["000000000000000000000002"]}],"scheduler":null,"state":"ENABLED"}

$ curl -u USERNAME:"PASSWORD" -H "Content-Type: application/json" -H "X-Requested-By: cli" -X PUT -d '{"_scope":"DEFAULT","id":"6779f186970c084e8e0a8f38","title":"No logs from homeassistant.woods.am","description":"","updated_at":"2025-01-06T02:00:24.800Z","matched_at":"2025-01-06T02:02:34.993Z","priority":2,"alert":true,"config":{"type":"aggregation-v1","query":"source:\"homeassistant.woods.am\"","query_parameters":[],"filters":[],"streams":["000000000000000000000001"],"stream_categories":[],"group_by":[],"series":[{"type":"count","id":"count-","field":null}],"conditions":{"expression":{"expr":"==","left":{"expr":"number-ref","ref":"count-"},"right":{"expr":"number","value":0.0}}},"search_within_ms":300000,"execute_every_ms":300000,"use_cron_scheduling":false,"cron_expression":null,"cron_timezone":null,"event_limit":100},"field_spec":{"source":{"data_type":"string","providers":[{"type":"template-v1","template":"homeassistant.woods.am","require_values":false}]},"message":{"data_type":"string","providers":[{"type":"template-v1","template":"No logs received for >5 mins","require_values":false}]}},"key_spec":[],"notification_settings":{"grace_period_ms":3600000,"backlog_size":0},"notifications":[{"notification_id":"676f7274eccf2a0bc86dcd88","notification_parameters":null}],"storage":[{"type":"persist-to-streams-v1","streams":["000000000000000000000002"]}],"scheduler":null,"state":"ENABLED"}'
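The same GET, edit, PUT fix can be scripted with the Python standard library. This is a minimal sketch assuming HTTP basic auth against the /api/events/definitions/<id> endpoint used in the curl commands above; the helper names (null_empty_series_fields, fix_event_definition) are hypothetical. Like the manual PUT, it sends back the full definition as returned, changing only series fields that are “”.

```python
import base64
import json
import urllib.request
from typing import Any


def null_empty_series_fields(definition: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of the definition with every series "field" of "" set to None (null)."""
    fixed = json.loads(json.dumps(definition))  # deep copy of plain JSON data
    for series in fixed.get("config", {}).get("series", []):
        if series.get("field") == "":
            series["field"] = None
    return fixed


def _request(method: str, url: str, user: str, password: str, body: Any = None) -> Any:
    """Send one JSON request with basic auth and return the parsed response (or None)."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    headers = {
        "Authorization": f"Basic {token}",
        "Accept": "application/json",
        "X-Requested-By": "cli",  # same header the curl PUT above sends
    }
    data = None
    if body is not None:
        data = json.dumps(body).encode()
        headers["Content-Type"] = "application/json"
    req = urllib.request.Request(url, data=data, headers=headers, method=method)
    with urllib.request.urlopen(req) as resp:
        raw = resp.read()
    return json.loads(raw) if raw else None


def fix_event_definition(base_url: str, definition_id: str, user: str, password: str) -> bool:
    """GET a definition, null out empty series fields, and PUT it back if anything changed."""
    url = f"{base_url.rstrip('/')}/api/events/definitions/{definition_id}"
    current = _request("GET", url, user, password)
    fixed = null_empty_series_fields(current)
    if fixed == current:
        return False
    _request("PUT", url, user, password, fixed)
    return True
```

Usage would look like fix_event_definition("https://graylog.woods.am", "6779f186970c084e8e0a8f38", "USERNAME", "PASSWORD"), which returns True only when it rewrote the definition. Selecting and clearing the field in the UI and saving again would reintroduce “”, so this only repairs the stored definition.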

In summary: I think your point about not being able to use count() == 0 or < 1 with a group by field is right, and it does PREVENT the event from triggering correctly. But there is also a separate problem: with the count() select field set to “” instead of null, the event is spuriously triggered on every run. This is caused by selecting a value in that field and then clearing it again. Note that you DON’T have to save the event definition with the select field populated first; it is enough to populate it, immediately clear it, and then save the event definition.

I’ve reported this as a GitHub Issue here: Clearing Event Definition Aggregation "Select Field" causes events to trigger spuriously · Issue #21278 · Graylog2/graylog2-server · GitHub


Thanks for opening the github!