Graylog Instance Broken By Hard Shutdown

I have a Graylog 3.0.2 installation that came from the Graylog yum repository. It is backed by Elasticsearch 6.8.0 installed from the elasticsearch-6.x repository and MongoDB 3.2.22-1.el7 from the mongodb-org yum repository. It runs in an up-to-date CentOS 7 Hyper-V VM that has been assigned 4 cores and 16GB of RAM. The host has 8 cores and 32GB of RAM and is otherwise effectively unused (1 core and 2GB go to a small LibreNMS installation).

Everything was working three days ago. It was ingesting about 1.2M log lines per day, which comes to about 1.5GB of syslog files. All syslog clients forward their logs to an rsyslog instance on the Graylog server, and rsyslog forwards them on to Graylog over a TCP connection.
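For reference, a minimal sketch of how that forwarding path can be exercised end to end. The host, the port (1514), and the assumption that the input is a plain newline-delimited TCP syslog input are mine, not details from my actual setup:

```python
import socket
import time

# Hypothetical values -- adjust to match the actual Graylog syslog TCP input.
GRAYLOG_HOST = "127.0.0.1"
GRAYLOG_PORT = 1514

# RFC 3164-style test message; <14> = facility "user", severity "info".
timestamp = time.strftime("%b %d %H:%M:%S")
message = f"<14>{timestamp} testhost graylog-check: test message\n"

with socket.create_connection((GRAYLOG_HOST, GRAYLOG_PORT), timeout=5) as sock:
    sock.sendall(message.encode("utf-8"))

print("test message sent; it should show up in a 'last five minutes' search")
```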

Due to some idiocy on my part, this VM got taken down hard, and ever since then it has not been working.

The immediate symptom is that no log records show up in the “last five minutes” search, even when the search criteria are left blank.
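The same check can be run against the REST API instead of the web UI; here is a sketch, assuming the API is on the usual port 9000 under /api and using hypothetical credentials:

```python
import requests

# Hypothetical endpoint/credentials -- Graylog's REST API normally listens on
# port 9000 under /api and accepts the web-interface login (or an access token).
GRAYLOG_API = "http://127.0.0.1:9000/api"
AUTH = ("admin", "admin-password")

# Same question the UI asks: is there anything at all in the last five minutes?
resp = requests.get(
    f"{GRAYLOG_API}/search/universal/relative",
    params={"query": "*", "range": 300, "limit": 1},
    auth=AUTH,
    headers={"Accept": "application/json"},
    timeout=10,
)
resp.raise_for_status()
print("messages in the last 5 minutes:", resp.json().get("total_results"))
```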

Closer examination shows that the Graylog java process is pegging the CPUs (between 385% and 400%). I have also discovered a large number of messages sitting in the Graylog disk journal.
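The journal backlog can be watched over the API as well; a sketch along these lines (the endpoint path and field names are my assumptions about a stock Graylog 3.x install):

```python
import requests

# Hypothetical endpoint/credentials; the journal resource should live under
# /api/system/journal on a default Graylog 3.x install.
GRAYLOG_API = "http://127.0.0.1:9000/api"
AUTH = ("admin", "admin-password")

resp = requests.get(f"{GRAYLOG_API}/system/journal", auth=AUTH,
                    headers={"Accept": "application/json"}, timeout=10)
resp.raise_for_status()
journal = resp.json()

# If uncommitted entries keep growing while read events stay near zero,
# messages are arriving but never making it out to Elasticsearch.
print("uncommitted journal entries:", journal.get("uncommitted_journal_entries"))
print("append events/s:", journal.get("append_events_per_second"))
print("read events/s:  ", journal.get("read_events_per_second"))
```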

The current state of the system: it is running, with the Graylog java process consuming over 385% of CPU. It shows between 5 and 25 msg/s coming in, but 0 going out, and it does not appear to be inserting any records into Elasticsearch in a way that lets a search return them.
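One way to confirm the “0 going out” side is to watch the Elasticsearch document counts directly; a sketch against the default port, assuming the stock graylog_* index prefix:

```python
import requests

# Elasticsearch listens on 9200 by default; graylog_* is the stock Graylog
# index prefix -- adjust if the install uses a different index set.
ES_URL = "http://127.0.0.1:9200"

# Per-index document counts: if these stay flat while Graylog reports
# messages coming in, nothing is actually being written to Elasticsearch.
indices = requests.get(
    f"{ES_URL}/_cat/indices/graylog_*",
    params={"format": "json", "h": "index,docs.count,store.size"},
    timeout=10,
).json()
for idx in indices:
    print(idx["index"], idx["docs.count"], idx["store.size"])
```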

I have tried:

  • confirming that I did not have a disk-full incident (I did not)
  • rebooting
  • confirming my elasticsearch cluster status is green (it is, and graylog agrees it is; see the health-check sketch after this list)
  • recalculating the index ranges for the default (only) index
  • ensuring the OS is up to date (this actually performed an upgrade from elasticsearch 6.6.x to 6.8.0)
  • deleting all indices in the elasticsearch (with graylog stopped)
  • cranking the logging up to “debug” - but nothing new seems to appear in the server.log file
  • turning off rsyslog to give graylog the chance to catch up
  • restarting mongod

My google-fu is weak and I cannot find any posts that appear to describe my issue. I’m sure it is something straightforward that I’m just missing.

Can anyone share ideas as to what I should look at next?

Thank you.

Okay, for what it’s worth: I ended up nuking the MongoDB instance and all databases, reinstalling all the Mongo pieces, and then rebooting. Graylog spent an hour chewing through a ton of backlogged messages from somewhere (probably rsyslog) and is now behaving somewhat predictably.

You need at least MongoDB 3.6 for the current Graylog … to make everything work nicely. As we point out in our documentation …
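A quick way to confirm what the server is actually running, as a sketch using pymongo and a default local connection (adjust the URI if authentication is enabled):

```python
from pymongo import MongoClient

# Default local MongoDB connection; URI and timeout are assumptions.
client = MongoClient("mongodb://127.0.0.1:27017", serverSelectionTimeoutMS=5000)
version = client.server_info()["version"]
print("MongoDB server version:", version)

# Graylog 3.x documentation calls for MongoDB 3.6 or newer.
major, minor = (int(x) for x in version.split(".")[:2])
if (major, minor) < (3, 6):
    print("MongoDB is older than 3.6 -- upgrade before expecting Graylog to behave")
```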
