Graylog 3.3.15 lags behind 4 hours consistently

Hi all,

We have an old Graylog 3.3.15 setup running on AWS ECS/EC2 with an AWS OpenSearch (ES 5.6) cluster behind it (yes, I know, very old versions; we plan to upgrade this year, but I'd like to fix the current issue first).

About 2 weeks ago the ES cluster went into read-only mode because it ran out of storage; before we noticed and fixed it, the cluster status was green but it was not accepting any new messages. After space was freed it started working again, but it only caught up to about current time minus 4 hours and has stayed there for more than a week.
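One thing worth double-checking: on ES 5.6 the flood-stage disk watermark sets `index.blocks.read_only_allow_delete` on indices, and on these older versions freeing disk space does not automatically clear that block. A minimal sketch of resetting it, assuming the endpoint below is a placeholder for your cluster and that it is reachable without request signing:

```python
# Minimal sketch: clear the read-only-allow-delete block on all indices.
# ES_URL is a hypothetical endpoint; adjust auth/transport for your setup.
import requests

ES_URL = "https://your-opensearch-endpoint:443"

# Setting the value to null removes the block that the flood-stage
# watermark applied; older ES versions do not remove it on their own.
resp = requests.put(
    f"{ES_URL}/_all/_settings",
    json={"index.blocks.read_only_allow_delete": None},
    timeout=30,
)
resp.raise_for_status()
print(resp.json())
```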
Notes:

  • There was no change in how messages are sent, and the timestamps look correct
  • I've recalculated index ranges - it changed nothing
  • CPU/memory on the containers and on the ES cluster seem quite low, so it should be able to catch up to current time
  • Graylog is constantly showing 2 notifications: "Uncommitted messages deleted from journal" and "Journal utilization is too high" - I suspect it is working through the journal at the same rate that new messages arrive (see the sketch below). I'm OK with emptying the journal, even if it means losing those messages.
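To confirm that suspicion, the journal throughput can be read from the Graylog REST API. A minimal sketch, assuming a node at GRAYLOG_URL with placeholder credentials; the field names are what 3.x returns from GET /api/system/journal, but verify them against your node:

```python
# Minimal sketch: compare journal append rate vs. read rate on one node.
# GRAYLOG_URL and AUTH are placeholders for your environment.
import requests

GRAYLOG_URL = "http://graylog-node:9000"
AUTH = ("admin", "password")

resp = requests.get(
    f"{GRAYLOG_URL}/api/system/journal",
    auth=AUTH,
    headers={"Accept": "application/json"},
    timeout=30,
)
resp.raise_for_status()
journal = resp.json()

# If the read rate roughly equals the append rate, the node is only keeping
# pace with new messages and will never drain the 4-hour backlog.
print("uncommitted entries:", journal.get("uncommitted_journal_entries"))
print("append/sec:", journal.get("append_events_per_second"))
print("read/sec:", journal.get("read_events_per_second"))
```

If you do decide to discard the backlog, the usual approach is to stop the Graylog node and delete the contents of the directory configured as `message_journal_dir` in server.conf, accepting the loss of the messages still sitting in the journal.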

Any pointers/help would be appreciated.

Are there messages in your buffers? Click on the node to see those buffers.

A constant gap of -4 h sounds a bit like a timestamp issue to me.
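A quick way to tell the two cases apart is to compare the newest indexed timestamp with the current UTC time. A sketch, assuming direct access to the OpenSearch endpoint and a `graylog_*` index prefix (both assumptions):

```python
# Minimal sketch: fetch the newest message timestamp and compare to UTC now.
# ES_URL is a hypothetical endpoint; adjust auth/index pattern as needed.
from datetime import datetime, timezone
import requests

ES_URL = "https://your-opensearch-endpoint:443"

resp = requests.post(
    f"{ES_URL}/graylog_*/_search",
    json={"size": 1, "sort": [{"timestamp": {"order": "desc"}}]},
    timeout=30,
)
resp.raise_for_status()
hit = resp.json()["hits"]["hits"][0]["_source"]

# A fixed ~4 h offset that never changes points at a timezone mismatch;
# a gap that slowly shrinks or grows points at an ingest backlog instead.
print("newest indexed timestamp:", hit["timestamp"])
print("current UTC time:        ", datetime.now(timezone.utc).isoformat())
```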

I run Elastic/OpenSearch in AWS on self-provisioned EC2 ARM nodes; it's a bit cheaper than the "original" AWS-managed instances. So far it works like a charm.


Look at System / Nodes > Details and check each of the buffers. Are any of them full? Paste a screenshot if you can.
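If a screenshot is awkward, the same information can be pulled from the REST API. A sketch, assuming GET /api/system/buffers is available on 3.x and returns per-buffer utilization; verify the response shape on your node:

```python
# Minimal sketch: dump buffer utilization for one Graylog node.
# GRAYLOG_URL and AUTH are placeholders for your environment.
import requests

GRAYLOG_URL = "http://graylog-node:9000"
AUTH = ("admin", "password")

resp = requests.get(
    f"{GRAYLOG_URL}/api/system/buffers",
    auth=AUTH,
    headers={"Accept": "application/json"},
    timeout=30,
)
resp.raise_for_status()

# A consistently full output buffer usually means the OpenSearch side is the
# bottleneck; a full process buffer points at extractors/pipelines in Graylog.
print(resp.json())
```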
