Journal not getting processed after ES failure

I had an Elasticsearch (ES) failure over the weekend (it ran out of disk space), which I resolved today.

Once I fixed ES, Graylog started ingesting the unprocessed logs from my other 30 Graylog servers.

However, /var filled up and Graylog stopped processing logs.

I then stopped Graylog, moved the journal to a dedicated partition, and rsynced all the files from /var/lib/graylog/journal to the new location.
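For reference, the move was roughly the following (the mount point /data/graylog-journal is just what I used in my environment; adjust the paths for yours):

```
# Stop the node so nothing writes to the journal during the copy
sudo systemctl stop graylog-server

# Copy the journal to the new dedicated partition, preserving permissions
sudo rsync -a /var/lib/graylog/journal/ /data/graylog-journal/

# Point Graylog at the new location in /etc/graylog/server/server.conf:
#   message_journal_dir = /data/graylog-journal

sudo systemctl start graylog-server
```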

I then started both nodes in the Graylog cluster, but now I have 5 million unprocessed messages.

I found the following post: After disk space issue - no out messages - help

I really don’t want to delete the journal because I’m assuming I would lose all the messages currently stored in it. Is that assumption false?

I also noticed that the journal on node2 is not being populated like it is on node1.

The assumption is correct. The journal contains all messages that Graylog has received but hasn’t yet processed and written to the outputs (i.e. Elasticsearch).

Unfortunately, I’m afraid you’ll have to delete the journal files, as they were likely corrupted when the system ran out of disk space.

Thanks for the quick reply (as always)!

Deleting the journal did the trick, but I lost 5 GB worth of data.

Do you have any recommendations for preventing journal corruption (other than not running out of disk space)?

Luckily I don’t have this system in production yet, but I will be using it for NIST compliance, so it’s super important that I don’t lose data.

That’s currently the only good advice I have, as running out of disk space seems to be the number one cause of a corrupted journal.
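Beyond that, you can at least cap how much disk the journal is allowed to use so it can’t fill the partition on its own. A rough example for server.conf (the values here are only illustrative; size them for your own throughput and retention needs):

```
# server.conf – journal settings (example values)
message_journal_dir = /data/graylog-journal
message_journal_max_size = 5gb
message_journal_max_age = 12h
```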

If you want, you can chime in on the discussion on the following pull request:
