Graylog Instance Broken By Hard Shutdown

xdroop · May 29, 2019, 7:03pm

I have a Graylog 3.0.2 installation that came from the Graylog Yum repository. It is currently backed by an Elasticsearch 6.8.0 that was installed from the elasticsearch-6.x repositories and mongodb 3.2.22-1.el7 that came from the mongodb-org yum repository. It runs in a CentOS7-current Hyper-V VM that has been assigned 4 cores and 16GB of RAM. The host server has 8 cores and 32GB RAM and is effectively otherwise unused (1 core and 2GB are used by a small librenms installation).

Everything was working three days ago. It was ingesting about 1.2M log lines per day which represents about 1.5GB in the syslog files. All syslog clients forward their logs to an rsyslog server on the graylog server, and the rsyslog forwards the logs to graylog through a TCP connection.

Due to some idiocy on my part, this VM got taken down hard, and ever since then it has not been working.

The immediate symptom is that no log records are showing up in the “last five minutes” search even if the criteria are left blank.

Closer examination shows that the graylog java process is railing the CPUs (between 385% and 400%). I have also discovered a large number of messages listed as present in the graylog disk journal.

The current state of the system is that it is running, consuming > 385% of cpu through the graylog java process. It is showing between 5 and 25 msg/s coming in, but 0 going out. It does not appear to be inserting any of the records into elasticsearch in such a way that they get returned by a search.
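The arithmetic behind these symptoms can be sketched in a few lines of Python. This is only an illustration, not Graylog code: the function names are made up, the 1.2M lines/day figure comes from the post above, and the backlog size and outgoing rates in the examples are hypothetical numbers chosen for the demonstration.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_rate(messages_per_day):
    """Average messages per second for a given daily volume."""
    return messages_per_day / SECONDS_PER_DAY


def drain_seconds(backlog, in_rate, out_rate):
    """Seconds until a journal backlog is empty.

    Returns None when the backlog can never drain, i.e. when messages
    leave the journal no faster than they arrive.
    """
    if backlog <= 0:
        return 0.0
    net_rate = out_rate - in_rate
    if net_rate <= 0:
        return None
    return backlog / net_rate


# Daily volume reported in the post: about 1.2M lines per day.
print(f"average ingest: {average_rate(1_200_000):.1f} msg/s")

# Hypothetical backlog of 100,000 messages with 15 msg/s in and 0 msg/s out:
# the journal only grows, so there is no drain time.
print(drain_seconds(100_000, in_rate=15, out_rate=0))

# Same hypothetical backlog once processing resumes at 50 msg/s.
print(drain_seconds(100_000, in_rate=15, out_rate=50))
```

An average of roughly 14 msg/s is consistent with the 5 to 25 msg/s shown as incoming, so input looks normal here. The problem is on the output side: with 0 msg/s going out, the journal can only grow.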

I have tried:

confirming that I did not have a disk-full incident (I have not)
rebooting
confirming my elasticsearch cluster status is green (it is, and graylog agrees it is)
recalculating the index ranges for the default (only) index
ensuring the OS is up to date (this actually performed an upgrade from elasticsearch 6.6.x to 6.8.0)
deleting all indices in the elasticsearch (with graylog stopped)
cranking the logging up to “debug” - but nothing new seems to appear in the server.log file
turning off rsyslog to give graylog the chance to catch up
restarting mongod
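For anyone repeating the first few checks, the disk-space and cluster-status steps could be scripted roughly as below. This is a minimal sketch under stated assumptions: Elasticsearch is assumed to listen on its default `http://localhost:9200` without authentication, and the helper names (`disk_used_percent`, `fetch_cluster_health`, `summarize_health`, `main`) are made up for this example. `_cluster/health` is the standard Elasticsearch endpoint, and `status` and `unassigned_shards` are fields of its response.

```python
import json
import shutil
import urllib.request

# Assumption: default local Elasticsearch address, no authentication.
ES_URL = "http://localhost:9200"


def disk_used_percent(path="/"):
    """Percentage of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def fetch_cluster_health(base_url=ES_URL, timeout=5):
    """Fetch and decode the _cluster/health document from Elasticsearch."""
    url = f"{base_url}/_cluster/health"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def summarize_health(health):
    """Turn a _cluster/health response into a one-line summary."""
    status = health.get("status", "unknown")
    unassigned = health.get("unassigned_shards", 0)
    return f"status={status} unassigned_shards={unassigned}"


def main():
    """Run both checks against the local machine (needs a running Elasticsearch)."""
    print(f"disk used on /: {disk_used_percent('/'):.1f}%")
    print(summarize_health(fetch_cluster_health()))
```

Calling `main()` on the Graylog host prints both results. As this thread shows, a green status only says that the cluster's shards are allocated; it does not prove that Graylog is actually writing messages into the indices.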

My google-fu is weak and I cannot find any posts which appear to describe my issue. I’m sure it is something straightforward that I’m just missing.

Can anyone share ideas as to what I should look at next?

Thank you.

xdroop · May 30, 2019, 4:44pm

Okay, for what it’s worth – I ended up nuking the mongodb instance and all databases, reinstalling all the mongo pieces, and then rebooting. Graylog spent an hour chewing through a ton of backlogged messages from somewhere (probably rsyslog) and is now behaving somewhat predictably.

jan · June 3, 2019, 2:14pm

You need at least MongoDB 3.6 for the current Graylog … to make everything work nicely, as we point out in our documentation…

system · June 17, 2019, 2:14pm

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.
