Hey guys!
I have a Graylog 2.5.1 cluster with three nodes.
For some time after launch everything works fine, but then one of the nodes gets stuck and stops processing incoming messages: the journal grows and the "Process buffer" is 100% utilized.
The only effective way I have found to unstick Graylog is to kill the process and start it again. But this happens a few times per day, on random servers, and I cannot figure out why.
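When a node gets stuck, I can at least take a few thread dumps before restarting it, to see what the process buffer processor threads are doing. A minimal sketch, assuming the JDK's jstack is on the PATH, graylog-server runs as the "graylog" user, and the pgrep pattern matches my setup:

# Take a few thread dumps of the stuck graylog-server before restarting it.
PID=$(pgrep -f graylog-server)          # adjust the pattern if the process name differs
for i in 1 2 3; do
    sudo -u graylog jstack "$PID" > "/tmp/graylog-threads-$i-$(date +%s).txt"
    sleep 10
done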
Expected Behavior
Each node processes messages 24/7 without stalling.
Your Environment
- Graylog: 2.5.1
- Elasticsearch Version: 5.6.13
- MongoDB Version: 3.4.9
- Operating System: Ubuntu 16.04.5
What can I do to diagnose and fix this problem?
The servers have enough free memory, disk throughput, and CPU.
Launch options:
GRAYLOG_SERVER_JAVA_OPTS="-Djava.net.preferIPv4Stack=true -Xms8000m -Xmx8000m -XX:NewRatio=1 -server -XX:+ResizeTLAB -XX:+UseConcMarkSweepGC -XX:+CMSConcurrentMTEnabled -XX:+CMSClassUnloadingEnabled -XX:+UseParNewGC -XX:-OmitStackTraceInFastThrow"
Each server has 16 GB of RAM, which should be enough. The Graylog cluster receives about 2k messages per second.
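Since the heap runs on CMS, I am also wondering whether long GC pauses could explain the stalls. A sketch of GC logging flags I could append to rule that out, assuming HotSpot Java 8 and that /var/log/graylog-server is writable by the graylog user:

# Append GC logging to the existing options (Java 8 HotSpot flags; log path is an assumption).
GRAYLOG_SERVER_JAVA_OPTS="$GRAYLOG_SERVER_JAVA_OPTS -Xloggc:/var/log/graylog-server/gc.log -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCApplicationStoppedTime -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=5 -XX:GCLogFileSize=10M"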