I have a Graylog 2.5.1 cluster with three nodes.
After a restart everything works fine for a while, but then one of the nodes gets stuck and stops processing incoming messages: the journal grows and the process buffer stays at 100% utilization.
The only effective way I have found to unstick Graylog is to kill the process and start it again. This happens a few times per day, on random servers, and I cannot figure out why.
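So far the only diagnostic step I have sketched is capturing JVM thread dumps from the stuck node before restarting it, to see what the process buffer processor threads are blocked on. This is just a sketch for my setup: the `pgrep` pattern, output paths, and the assumption that `jstack` (from the JDK) is installed are mine, so adjust them as needed.

```shell
#!/bin/sh
# Find the Graylog server JVM PID (the match pattern assumes the server
# is started from graylog.jar; adjust it for your installation).
GL_PID=$(pgrep -f 'graylog.jar' | head -n1)

if [ -n "$GL_PID" ]; then
  # Take three dumps 10 seconds apart: a processbufferprocessor thread
  # that sits in the same stack frame in all three dumps is genuinely
  # stuck, not just momentarily busy.
  for i in 1 2 3; do
    jstack "$GL_PID" > "/tmp/graylog-threaddump-$i.txt"
    sleep 10
  done
else
  echo "no graylog-server process found" >&2
fi
```

Comparing the dumps should show whether the processors are blocked on output (e.g. waiting on Elasticsearch), on a lock, or inside a specific extractor or pipeline rule.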
The cluster processes messages 24/7.
- Graylog: 2.5.1
- Elasticsearch Version: 5.6.13
- MongoDB Version: 3.4.9
- Operating System: Ubuntu 16.04.5
What can I do to diagnose and fix this problem?
The servers have enough free memory, disk throughput, and CPU.
GRAYLOG_SERVER_JAVA_OPTS="-Djava.net.preferIPv4Stack=true -Xms8000m -Xmx8000m -XX:NewRatio=1 -server -XX:+ResizeTLAB -XX:+UseConcMarkSweepGC -XX:+CMSConcurrentMTEnabled -XX:+CMSClassUnloadingEnabled -XX:+UseParNewGC -XX:-OmitStackTraceInFastThrow"
Each server has 16G of RAM, which should be enough, and the cluster receives about 2,000 messages per second.
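To rule out garbage-collection pauses as the cause of the stalls, I could also enable GC logging by extending the JVM options above. These are standard Java 8 GC-logging flags; the log path is my assumption and would need to match the Graylog log directory on each server:

```shell
# Same options as above, with GC logging appended (Java 8 flags;
# the -Xloggc path is an assumption for this setup).
GRAYLOG_SERVER_JAVA_OPTS="-Djava.net.preferIPv4Stack=true -Xms8000m -Xmx8000m -XX:NewRatio=1 -server -XX:+ResizeTLAB -XX:+UseConcMarkSweepGC -XX:+CMSConcurrentMTEnabled -XX:+CMSClassUnloadingEnabled -XX:+UseParNewGC -XX:-OmitStackTraceInFastThrow -Xloggc:/var/log/graylog-server/gc.log -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCApplicationStoppedTime"
```

If the gc.log shows long application-stopped times around the moments a node hangs, the problem is GC-related rather than a blocked processor thread.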