Journal filling with buffers remaining empty

Description of your problem

TL;DR
The journal is filling, but the process and output buffers are empty, and the output message count has not risen above 0 since I noticed the issue.

A little background context…

I was troubleshooting an issue with our logging setup earlier today. I ended up finding that the issue was with the source, and the fix caused a large number of messages to flood our one (standalone) Graylog instance. It overfilled the journal while the 6 process buffers and 1 output buffer were maxed out. After addressing that, the buffers remained full, but the journal usage started going down, which I took as a good sign. I watched that for about 10 minutes; it got through about 1 million messages and just over 50% of the maxed-out journal. Then, at a seemingly random point, the process and output buffers emptied, and the journal usage and message counts started slowly going up as new messages came in.

Description of steps you’ve taken to attempt to solve the issue

I’ve tried restarting all three services on the host (mongo, elasticsearch, graylog), restarting individual services, changing the process buffers to 4 and the output buffers to 3, and restarting the whole machine, but I get the same result every time. Nothing helpful seems to be in the log file (I’ve only looked back as far as the last service restart, though).

Environmental information

Operating system information

Rocky Linux 8.4 (was CentOS 8 when originally built)

Package versions

  • Graylog: 4.1.5
  • MongoDB: 4.2.17
  • Elasticsearch: 7.10.2

I haven’t included logs or configs because I have no idea what would be applicable/useful and I thought including everything would be excessive; I didn’t change anything in Graylog, ES, or MongoDB before the problem came up; and the only logs in the Graylog file seem to be related to starting up. If there’s something specific that would be helpful, please feel free to ask.

On a coworker’s recommendation, and after searching the forum for the idea, I stopped the three services, deleted all journal files (rm -rf /var/lib/graylog-server/journal/*), and then started the services again. It seems to be working now. Kind of a bummer to lose those ~4 million messages that were in the journal, but I’m assuming there wasn’t much I could do with a presumably corrupt journal. If anyone has ideas on why there were no errors about the journal, that might be helpful for the future.
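For anyone who lands here later, the steps above can be sketched as a small script. This is only a rough sketch, not an official procedure: the systemd unit names (graylog-server, elasticsearch, mongod) are assumptions based on the stock packages, and the helper names clear_journal and reset_journal are made up for illustration. The guard is there so an empty or mistyped path can never turn into rm -rf /*.

```shell
# Sketch only: clearing the journal discards every unprocessed message in it.

# Delete the contents of a journal directory, keeping the directory itself.
# Mirrors "rm -rf /var/lib/graylog-server/journal/*" from the post, but
# refuses to run if the path is empty or is not an existing directory.
clear_journal() {
    local journal_dir="${1:-}"
    if [ -z "$journal_dir" ] || [ ! -d "$journal_dir" ]; then
        echo "refusing to clear: '$journal_dir' is not a directory" >&2
        return 1
    fi
    rm -rf -- "${journal_dir:?}"/*
}

# Full procedure, to be run as root.
# ASSUMPTION: the unit names below match your install; check with
# "systemctl list-units" before relying on them.
reset_journal() {
    systemctl stop graylog-server elasticsearch mongod &&
        clear_journal /var/lib/graylog-server/journal &&
        systemctl start mongod elasticsearch graylog-server
}
```

Nothing runs until reset_journal is called. Copying the journal directory somewhere else before clearing it would at least leave something to inspect afterwards.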

Unfortunately, I’ve had that happen before. The only thing I did differently afterwards was to increase the journal size and the underlying volume, to give me at least a 24-hour head start on fixing problems before it happens again.
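To put a rough number on that 24-hour head start, here is a back-of-the-envelope sketch. The message rate and average message size are hypothetical figures, not measurements from either setup in this thread, and journal_bytes_needed is just an illustrative helper name. The result is what you would compare against message_journal_max_size in server.conf; as far as I know message_journal_max_age also has to cover the same window, since the journal expires messages by age as well as by size.

```shell
# Rough journal sizing: bytes = msgs/sec * avg bytes per msg * seconds of headroom.
# All inputs are whole numbers; plug in your own measured values.
journal_bytes_needed() {
    local msgs_per_sec="$1" avg_msg_bytes="$2" hours="$3"
    echo $(( msgs_per_sec * avg_msg_bytes * hours * 3600 ))
}

# Hypothetical example: 500 msgs/sec at about 1000 bytes each, for 24 hours:
#   journal_bytes_needed 500 1000 24    # prints 43200000000 (about 43 GB)
```

The volume holding the journal needs at least that much free space on top of whatever else lives there.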
