shines2
(Shaun Hines)
October 23, 2017, 7:23pm
1
I had an Elasticsearch (ES) failure over the weekend (it ran out of disk space), which I resolved today.
Once I fixed ES, Graylog started ingesting the unprocessed logs from my other 30 Graylog servers.
However, /var filled up and Graylog stopped processing logs.
I then stopped Graylog, moved the journal to a dedicated partition, and rsynced all the files from /var/lib/graylog/journal to the new location.
I then started both nodes in the Graylog cluster, but now I have 5 million unprocessed messages.
I found the following post: "After disk space issue - no out messages - help"
I really don’t want to delete the journal, because I’m assuming I will lose all the messages currently stored in it. Is that assumption false?
I also noticed that the journal on node2 is not being populated like it is on node1.
jochen
(Jochen)
October 23, 2017, 7:30pm
2
The assumption is correct. The journal contains all messages that have been received by Graylog but haven’t yet been processed and written to the outputs (i.e., Elasticsearch).
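To make that distinction concrete, here is a toy model of such a journal. It is a hypothetical sketch, not Graylog’s implementation: messages are appended as soon as they are received, and a committed offset only advances after the output has accepted a batch. Everything past that offset is the "unprocessed" backlog, which is exactly what deleting the journal throws away.

```python
from dataclasses import dataclass, field


@dataclass
class ToyJournal:
    """Hypothetical in-memory model of a message journal (not Graylog's code)."""
    entries: list = field(default_factory=list)
    committed: int = 0  # offset of the first message not yet confirmed by the outputs

    def append(self, message: str) -> int:
        # An input received a message: it is persisted before any processing.
        self.entries.append(message)
        return len(self.entries) - 1

    def read_batch(self, max_count: int) -> list:
        # Hand the next uncommitted messages to processing and the outputs.
        return self.entries[self.committed:self.committed + max_count]

    def commit(self, count: int) -> None:
        # Only called after the outputs (e.g. Elasticsearch) accepted the batch.
        self.committed = min(self.committed + count, len(self.entries))

    def uncommitted(self) -> int:
        # The "unprocessed messages": deleting the journal discards exactly these.
        return len(self.entries) - self.committed


j = ToyJournal()
for i in range(5):
    j.append(f"msg {i}")
batch = j.read_batch(2)   # the output is back up and accepts two messages
j.commit(len(batch))
print(j.uncommitted())    # 3
```

In this model, an output outage simply means `commit()` stops being called while `append()` keeps going, so the backlog grows until the disk holding the journal is full.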
Unfortunately, I’m afraid you’ll have to delete the journal files, as they were likely corrupted when the system ran out of disk space.
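Why a full disk can leave a journal unreadable can be illustrated with a generic record format, assumed here purely for illustration (it is not Graylog’s actual on-disk layout): each message is framed with its length and a CRC32 checksum. If the disk fills up in the middle of a write, the last record is cut short, and a reader scanning the file finds the first offset where the framing no longer checks out.

```python
import struct
import zlib

HEADER = struct.Struct(">II")  # (payload length, CRC32 of payload), big-endian


def encode_record(payload: bytes) -> bytes:
    # Frame one message as: length | crc32 | payload
    return HEADER.pack(len(payload), zlib.crc32(payload)) + payload


def scan_records(data: bytes):
    """Return (valid_payloads, offset_of_first_bad_record or None if all valid)."""
    records = []
    pos = 0
    while pos < len(data):
        if pos + HEADER.size > len(data):
            return records, pos  # header itself was cut off
        length, crc = HEADER.unpack_from(data, pos)
        start = pos + HEADER.size
        payload = data[start:start + length]
        if len(payload) < length or zlib.crc32(payload) != crc:
            return records, pos  # truncated or damaged payload
        records.append(payload)
        pos = start + length
    return records, None


# Simulate the disk filling up while the third record is being written.
segment = b"".join(encode_record(m) for m in [b"first", b"second", b"third"])
truncated = segment[:-3]
good, bad_at = scan_records(truncated)
print(good, bad_at)  # [b'first', b'second'] 27
```

The sketch only shows detection; whether a real journal can safely truncate at the bad offset and keep the earlier records depends on its implementation.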
shines2
(Shaun Hines)
October 23, 2017, 7:42pm
3
Thanks for the quick reply (as always 🙂)
Deleting the journal did the trick, but I lost 5GB worth of data.
Do you have any recommendations for preventing journal corruption (other than not running out of disk space)?
Luckily, this system isn’t in production yet, but I will be using it for NIST compliance, so it’s super important that I don’t lose data.
jochen
(Jochen)
October 23, 2017, 8:34pm
4
That’s currently the only good advice I have, as running out of disk space seems to be the number one cause of a corrupted journal.
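One practical precaution along those lines is to make sure a dedicated journal partition is comfortably larger than the maximum size the journal is allowed to reach (Graylog’s `message_journal_max_size` setting, with a value such as `5gb`). The following is a minimal sketch under stated assumptions: the helper names `parse_size` and `journal_fits` are hypothetical, size suffixes are assumed to be binary (1024-based) multiples, the 20% headroom is an arbitrary choice, and the partition is assumed to hold nothing but the journal.

```python
import shutil

# Assumed unit multipliers (binary, 1024-based) for strings such as "5gb".
UNITS = {"kb": 1024, "mb": 1024 ** 2, "gb": 1024 ** 3, "tb": 1024 ** 4}


def parse_size(text: str) -> int:
    """Parse a size like '5gb' or '512mb' into bytes (hypothetical helper)."""
    text = text.strip().lower()
    for suffix, factor in UNITS.items():
        if text.endswith(suffix):
            return int(text[: -len(suffix)].strip()) * factor
    return int(text)  # no suffix: plain byte count


def journal_fits(journal_dir: str, max_size: str, headroom: float = 0.2) -> bool:
    """True if the filesystem holding journal_dir can hold the journal's
    maximum size plus a safety margin (assumes a dedicated partition)."""
    total = shutil.disk_usage(journal_dir).total
    return parse_size(max_size) * (1 + headroom) <= total
```

For a journal on its own partition, one would call something like `journal_fits("/path/to/journal", "5gb")` with the directory and size taken from the actual configuration; on a shared partition, free space rather than total capacity is the number to watch.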
If you want, you can chime in on the discussion in the following pull request:
Graylog2:master ← Graylog2:journal-disk-check-periodical (opened 02:00PM, 11 Oct 17 UTC)
Users regularly run into problems with their Graylog setups due to the disk journal filling up the whole disk and thus corrupting the journal.
This change set adds a periodical which checks the available disk space in the journal directory and if it's insufficient, creates an urgent system notification and switches the load-balancer status of the Graylog node to DEAD.
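The periodical described in the pull request can be approximated with a small external check. This is a minimal sketch and not the code from the pull request: the 10% free-space threshold, the 30-second interval, and the `notify` callback are hypothetical choices, and the functions only compute the status the PR mentions (`ALIVE` vs `DEAD`) without talking to Graylog at all.

```python
import shutil
import time
from typing import Callable


def check_journal_disk(journal_dir: str, min_free_ratio: float = 0.1) -> str:
    """Return 'DEAD' if the filesystem holding journal_dir has less than
    min_free_ratio of its capacity free, otherwise 'ALIVE'."""
    usage = shutil.disk_usage(journal_dir)
    return "DEAD" if usage.free / usage.total < min_free_ratio else "ALIVE"


def run_periodical(journal_dir: str, notify: Callable[[str], None],
                   interval_s: float = 30.0, iterations: int = 1,
                   min_free_ratio: float = 0.1) -> str:
    """Re-check the journal disk every interval_s seconds and call notify()
    whenever the status changes; returns the last status seen."""
    status = "ALIVE"
    for i in range(iterations):
        if i:
            time.sleep(interval_s)  # wait between checks, not before the first
        new_status = check_journal_disk(journal_dir, min_free_ratio)
        if new_status != status:
            notify(f"journal disk check: {status} -> {new_status}")
            status = new_status
    return status
```

Passing `print` as `notify` is enough for a quick test; the point of switching the node to `DEAD` is that a load balancer stops sending it traffic before the journal disk is completely full.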
system
(system)
Closed
November 6, 2017, 8:35pm
5
This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.