No more Output. Journal issue

Hi,
In my case, a cluster of 3 Graylog nodes stopped sending messages to ES.
Loads of messages are coming in and filling up the journal, but nothing is going out to ES.
ES state is GREEN, there is plenty of disk space, there are no visible errors in the Graylog or ES logs, and it is not an SELinux issue.

This happened after rsyslog shipped about 10G of logs in a couple of hours. Even though I fed them in progressively, Graylog died a couple of times. I can’t remember the details well, and digging through the logs now will be painful, but basically: the journal filled up after the massive import (millions of messages) and started to get flushed to ES, but as it got down to about 200,000 messages left in the journal, Graylog died. At that point, restarting Graylog would put the node back in the cluster and running (but the messages left in the journal were not sent, even after 10 minutes).

Then I fed the cluster a few more gigs to finish my import, and all nodes stopped sending messages to ES.
I can see incoming messages on the inputs, but nothing on the streams (their only rule is to match on the input). The journals are growing fast…
I applied the ‘remove journal’ technique, which worked, but the millions of messages accumulated in those journals remain unprocessed.

Is there a proper way to have Graylog read those journals and index them into ES?

I haven’t tried to stop Graylog && copy the journals back to /var/lib/graylog/journal/ && start Graylog, as I don’t want to make things worse.

Thanks in advance!

Hey @Samira

The main problem here is that you need to identify where the problems are coming from. What does Elasticsearch write into its logs? High/low watermark errors?
What exactly is Graylog doing? Is the processing buffer full? The output buffer too?

In such a case you need to check every part of the system to identify where it is not running smoothly.
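
As a rough starting point, some of those checks could be scripted. The sketch below is only an illustration under stated assumptions, not an official tool: it reads Elasticsearch's `/_cat/allocation?format=json` output and compares disk usage with the default low/high watermarks (85% and 90%), and it summarises the journal backlog from Graylog's `/api/system/journal` response. The journal field names (`uncommitted_journal_entries`, `append_events_per_second`, `read_events_per_second`) are assumed from that endpoint and may differ between Graylog versions; the helper names are made up for this example.

```python
import json
import urllib.request

# Elasticsearch default disk watermarks, in percent of disk used.
# Assumption: the cluster has not overridden them in its settings.
LOW_WATERMARK = 85.0
HIGH_WATERMARK = 90.0


def fetch_json(url, headers=None):
    """GET a URL and parse the response body as JSON (no auth handling)."""
    req = urllib.request.Request(url, headers=headers or {"Accept": "application/json"})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


def disk_findings(allocation_rows):
    """Return one message per node at or above a disk watermark.

    Expects the list returned by GET /_cat/allocation?format=json.
    """
    findings = []
    for row in allocation_rows:
        raw = row.get("disk.percent")
        if raw is None:  # e.g. the UNASSIGNED row carries no disk figures
            continue
        used = float(raw)
        if used >= HIGH_WATERMARK:
            findings.append(f"{row.get('node')}: {used:.0f}% used, above high watermark")
        elif used >= LOW_WATERMARK:
            findings.append(f"{row.get('node')}: {used:.0f}% used, above low watermark")
    return findings


def journal_backlog(journal):
    """Return (uncommitted entries, append rate minus read rate).

    Field names are assumptions about the /api/system/journal response.
    """
    uncommitted = journal.get("uncommitted_journal_entries", 0)
    net_rate = journal.get("append_events_per_second", 0) - journal.get(
        "read_events_per_second", 0
    )
    return uncommitted, net_rate
```

With hypothetical host names, usage might look like `disk_findings(fetch_json("http://es1:9200/_cat/allocation?format=json"))` and `journal_backlog(fetch_json("http://graylog1:9000/api/system/journal", headers=...))`. The Graylog API needs credentials, which are left out here. A positive net rate with a growing uncommitted count points at processing or output being stuck rather than at the inputs.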
