Process buffer 100% and nothing helps until restart

Hey guys!
I have a Graylog 2.5.1 cluster with three nodes.
After startup everything works fine for a while, but then one of the nodes gets stuck and stops processing incoming messages: the journal grows and the “Process buffer” is 100% utilized.
The only effective way I have found to unstick Graylog is to kill it and start it again. But this happens a few times per day, on random servers, and I cannot figure out why.

Expected Behavior

Graylog processes messages 24/7 :slight_smile:

Your Environment

  • Graylog: 2.5.1
  • Elasticsearch Version: 5.6.13
  • MongoDB Version: 3.4.9
  • Operating System: Ubuntu 16.04.5

What can I do to diagnose and fix this problem?
The servers have enough free memory, disk performance, and CPU.

launch options:
GRAYLOG_SERVER_JAVA_OPTS=" -Xms8000m -Xmx8000m -XX:NewRatio=1 -server -XX:+ResizeTLAB -XX:+UseConcMarkSweepGC -XX:+CMSConcurrentMTEnabled -XX:+CMSClassUnloadingEnabled -XX:+UseParNewGC -XX:-OmitStackTraceInFastThrow"
Each server has 16 GB of RAM, which should be enough. The Graylog cluster receives about 2k messages per second.

Check your logs.
Usually Graylog has lost its connection to Elasticsearch.
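
If it helps, here is a minimal sketch of that log check in Python. It only counts WARN/ERROR lines that mention Elasticsearch, timeouts, or connection failures, so you can see whether they cluster around the time a node gets stuck. The keyword patterns and the helper names (`scan_log`, `scan_file`) are my own assumptions, not anything Graylog ships, and the path in the usage note is just where the package install usually puts the log, so adjust it to your setup.

```python
import re
from collections import Counter

# Assumed keyword patterns (hypothetical, tune them to what your log actually contains).
PATTERNS = {
    "elasticsearch": re.compile(r"elasticsearch", re.IGNORECASE),
    "timeout": re.compile(r"timed? ?out", re.IGNORECASE),
    "connection": re.compile(
        r"connect(ion)? (refused|reset|failed)|unable to connect", re.IGNORECASE
    ),
}
LEVELS = ("WARN", "ERROR")


def scan_log(lines):
    """Count WARN/ERROR lines per pattern and collect the matching lines."""
    counts = Counter()
    hits = []
    for line in lines:
        # Skip lines that are not warnings or errors.
        if not any(level in line for level in LEVELS):
            continue
        matched = [name for name, rx in PATTERNS.items() if rx.search(line)]
        if matched:
            counts.update(matched)
            hits.append(line.rstrip("\n"))
    return counts, hits


def scan_file(path):
    """Run scan_log over a log file; undecodable bytes are replaced, not fatal."""
    with open(path, encoding="utf-8", errors="replace") as fh:
        return scan_log(fh)
```

Usage would be something like `counts, hits = scan_file("/var/log/graylog-server/server.log")`, then print `counts` and the last few entries of `hits`. Run it on the node that got stuck and compare the timestamps in `hits` with the moment the process buffer filled up.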
