Graylog keeps sending same message in backlog

1. Describe your incident:

We have set up a stream and an alert that is triggered whenever a message is added to this stream.

The event definition is set to search within 5 minutes and run the search every 5 minutes (essentially looking back 5 minutes every 5 minutes).

The grace period is set to 5 minutes and the notifications are set to include 5 messages.

The problem is that every single alert notification that we receive for the past few days includes the same message from a couple of days ago. Not 5, but only a single message and always the same despite there having been hundreds of other messages that matched the stream and/or event definition.

|Timestamp|2024-06-22T23:14:32.459+02:00|
|Timestamp Processing|2024-06-22T23:14:32.459+02:00|
{index=graylog_4, message={"name":"redacted","hostname":"yavin","pid":2271968,"level":50,"msg":"undefined","time":"2024-06-19T10:38:06.329Z", …}

2. Describe your environment:

  • OS Information: Debian 11.9
  • Package Version: 5.2.8-1

3. What steps have you already taken to try and solve the problem?

Reviewed Filter & Aggregation settings on the Event Definition.

4. How can the community help?

What could I have configured wrong? Why do I keep getting the same message from the backlog?
Could it be because my Event Definition Filter has an empty Search Query (Using a pre-filtered Stream) ? But even then, why keep including the same old message in the backlog section when there are literally hundreds of more recent ones?

The message doesn’t look unusual. Can you share the event definition and stream rules?

So you are saying, the alert fires correctly, just the message backlog is wrong?

I’m trying to investigate further, and it looks like it might be related to filebeat somehow. I have 10 occurrences of the pattern I’m looking for in the logs on the remote host, all around 10:50-ish on June 24th.

And today, on June 25th at 13:27, filebeat seems to be shipping the same log lines to be ingested again.

This would mean that the processing happens correctly and so does the alerting. They both react to newly ingested log lines, immediately.

And following this up, I can indeed see that filebeat is shipping the same lines over and over again. Sometimes at 1h30 intervals, sometimes 3 hours…

Filebeat is also rotating its logs very frequently… so often that this looks off:

root@yavin:~# ls -l /var/log/filebeat/
total 1608
-rw------- 1 root root 372999 Jun 25 07:22 filebeat-20240625-1.ndjson
-rw------- 1 root root   6456 Jun 25 07:26 filebeat-20240625-2.ndjson
-rw------- 1 root root 378442 Jun 25 09:22 filebeat-20240625-3.ndjson
-rw------- 1 root root   6456 Jun 25 09:26 filebeat-20240625-4.ndjson
-rw------- 1 root root 737211 Jun 25 13:22 filebeat-20240625-5.ndjson
-rw------- 1 root root   6456 Jun 25 13:27 filebeat-20240625-6.ndjson
-rw------- 1 root root  98332 Jun 25 13:55 filebeat-20240625-7.ndjson
-rw------- 1 root root   6456 Jun 25 05:26 filebeat-20240625.ndjson

Looking at the sidecar logs on this host, we can see that it does keep stopping and starting the filebeat agent:

root@yavin:~# tail -f /var/log/graylog-sidecar/sidecar.log
time="2024-06-25T09:22:03+02:00" level=info msg="[filebeat-standalone] Stopping"
time="2024-06-25T09:26:38+02:00" level=info msg="Adding process runner for: filebeat-standalone"
time="2024-06-25T09:26:38+02:00" level=info msg="[filebeat-standalone] Configuration change detected, rewriting configuration file."
time="2024-06-25T09:26:38+02:00" level=info msg="[filebeat-standalone] Starting (exec driver)"
time="2024-06-25T13:22:11+02:00" level=info msg="Removing process runner: filebeat-standalone"
time="2024-06-25T13:22:11+02:00" level=info msg="No configurations assigned to this instance. Skipping configuration request."
time="2024-06-25T13:22:11+02:00" level=info msg="[filebeat-standalone] Stopping"
time="2024-06-25T13:27:05+02:00" level=info msg="Adding process runner for: filebeat-standalone"
time="2024-06-25T13:27:05+02:00" level=info msg="[filebeat-standalone] Configuration change detected, rewriting configuration file."
time="2024-06-25T13:27:06+02:00" level=info msg="[filebeat-standalone] Starting (exec driver)"

And in between these restarts, filebeat does not seem to remember where it left off in reading the logs.

Good to know the core functionality is behaving as expected.
Now you just need to find out what is causing sidecar to detect config changes so frequently :slight_smile:

Yeah, especially since, despite what the logs claim,

  1. a configuration is assigned and
  2. it hasn’t changed in weeks.

We’ve also seen the logs being sent again without the sidecar detecting any configuration changes or restarting filebeat.
Having a hard time to detect the reason these logs are re-sent, assuming the tagging (e.g. filebeat timestamp) is set at the source.

Hi,
which version of filebeat are you running?
We recently updated from 8.13.4 to 8.14.0 and just noticed the same behavior. Every message is again being sent by filebeat every hour and indexed as a new message in graylog.

Unfortunately, I cannot tell what solved the issue for us exactly, but I can highlight some potential problem sources.

We had filebeat installed separately and a sidecar configuration defined in graylog that was supposed to be using the standalone version from /usr/bin/ rather than the packaged version from /usr/lib/graylog-sidecar. (Which is 8.9.0 in sidecar 1.5.0-1 btw).

The standalone versions we were running before were:

Start-Date: 2024-06-17  18:19:30
Commandline: apt upgrade
Upgrade: filebeat:amd64 (8.13.4, 8.14.1)
End-Date: 2024-06-17  18:19:33

Start-Date: 2024-06-25  18:35:22
Commandline: apt remove filebeat
Remove: filebeat:amd64 (8.14.1)
End-Date: 2024-06-25  18:35:22

I don’t know for certain, but it’s quite possible that the issues were were seing started only from June 17th on…

I was considering opening a thread with the ELK people, but finally didn’t when reverting to the pre-packaged sidecar/filebeat combo seemed to have solved the issue.

Note: What prompted me to create the standalone config and install filebeat separately was probably that the location where the filebeat binary lives in the Debian package is not whitelisted in the collector_binaries_accesslist section of the sidecar.yml config file distributed with the same .deb. Not even suggested here.

Thank you very much, so also on our side a downgrade of the filebeat package was the way to go to fix this issue.
We will also switch to the bundled filebeat and auditbeat from the graylog-sidecar package in the future, this will make things a lot easier.

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.