I have a production Graylog cluster that handles an average of 2 million messages per day. It has been working like a charm, but sometimes, when a peak of messages arrives, the servers behave strangely: they start holding messages in the journal and take a long time to flush it.
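To put that volume in perspective, here is a quick back-of-the-envelope conversion to an average per-second rate. It assumes traffic is spread evenly over the day, which the peaks obviously violate, so treat it only as a baseline; the function name is just for illustration.

```python
# Back-of-the-envelope conversion of the daily volume into an average rate.
MESSAGES_PER_DAY = 2_000_000      # figure quoted above
SECONDS_PER_DAY = 24 * 60 * 60    # 86,400 seconds


def average_rate(messages_per_day: int) -> float:
    """Average messages per second if traffic were spread evenly over the day."""
    return messages_per_day / SECONDS_PER_DAY


avg = average_rate(MESSAGES_PER_DAY)
print(f"average: {avg:.1f} msg/s")  # prints "average: 23.1 msg/s"
```

So the steady-state load is modest, and the journal backlog only shows up when bursts push the rate well above this average.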
Is there any white paper, guide, or documentation on tuning the servers? The server.conf file has a lot of tuning parameters, but most of them are black boxes to me.
I’ve Googled a lot trying to find docs about tuning, but Graylog is still fairly new.
I appreciate any help with this.