Journal not Getting processed after ES failure

shines2 · October 23, 2017, 7:23pm

I had an ES failure during the weekend (ran out of space) which I resolved today.

Once I fixed ES, Graylog started ingesting the unprocessed logs from my other 30 Graylog servers.

However, /var then filled up and Graylog stopped processing logs.

I then stopped Graylog, moved the journal to a dedicated partition, and rsynced all the files from /var/lib/graylog/journal to the new location.

I then started both nodes in the Graylog cluster, but now I have 5 million unprocessed messages.

I found the following post: After disk space issue - no out messages - help

I really don’t want to delete the journal because I’m assuming I will lose all the messages currently stored in it. Is that assumption false?

I also noticed that the journal on node2 is not being populated like it is on node1.

jochen · October 23, 2017, 7:30pm

The assumption is correct. The journal contains all messages which have been received by Graylog but which haven’t been processed and written to the outputs (i.e. Elasticsearch) yet.

Unfortunately I’m afraid you’ll have to delete the journal files as they have likely been corrupted when the system ran out of disk space.

shines2 · October 23, 2017, 7:42pm

Thanks for the quick reply (as always :)).

Deleting the journal did the trick, but I lost 5 GB worth of data.

Do you have any recommendations for preventing journal corruption (other than not running out of disk space)?

Luckily I don’t have this system in production yet, but I will be using it for NIST compliance, so it’s super important that I don’t lose data.

jochen · October 23, 2017, 8:34pm

That’s currently the only good advice I have as it seems to be the number one reason for a corrupted journal.

If you want, you can chime in on the discussion in the following pull request:

github.com/Graylog2/graylog2-server

Add periodical to check free disk space in journal directory

Graylog2:master ← Graylog2:journal-disk-check-periodical

opened 02:00PM - 11 Oct 17 UTC

joschi

+463 -8

Users regularly run into problems with their Graylog setups due to the disk journal filling up the whole disk and thus corrupting the journal. This change set adds a periodical which checks the available disk space in the journal directory and if it's insufficient, creates an urgent system notification and switches the load-balancer status of the Graylog node to DEAD.
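
The free-space check described in that pull request can be approximated outside Graylog with a small script run from cron or a monitoring agent. This is only a minimal sketch in Python, not the code from the pull request: the function name `journal_has_space` and the 5 GiB threshold are assumptions made for illustration, and the journal path is the one mentioned earlier in this thread.

```python
import shutil
import sys
from pathlib import Path

# Hypothetical threshold; choose a value that fits your journal size.
MIN_FREE_BYTES = 5 * 1024**3  # 5 GiB


def journal_has_space(journal_dir, min_free_bytes=MIN_FREE_BYTES):
    """Return True if the filesystem holding journal_dir has at least
    min_free_bytes of free space."""
    usage = shutil.disk_usage(journal_dir)
    return usage.free >= min_free_bytes


if __name__ == "__main__":
    # Default path taken from this thread; pass another path as the first argument.
    journal_dir = Path(sys.argv[1] if len(sys.argv) > 1 else "/var/lib/graylog/journal")
    if not journal_dir.exists():
        print(f"{journal_dir} does not exist")
        sys.exit(2)
    if journal_has_space(journal_dir):
        print(f"OK: enough free space for {journal_dir}")
        sys.exit(0)
    print(f"WARNING: low free space for {journal_dir}")
    sys.exit(1)
```

The script only reports the condition through its exit code; what to do with a warning (alerting, taking the node out of the load balancer, stopping inputs) is left to whatever monitoring you already run.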

