We have a Graylog server (2.4.6-1) running on top of Elasticsearch (5.6.10) and MongoDB (3.4.10) on an Ubuntu 16.04 (kernel 4.4.0-135-generic) EC2 instance. The Graylog server is processing messages incredibly slowly; in the time it's taken me to write this, another ~5000 messages have piled up. It hasn't stopped completely, because if I turn off the input for a while the number of unprocessed messages declines, but only at maybe ~100 messages/minute, far too slow to keep up with the logs being sent to it.
We had issues with system performance and resources in the past, so that was the first thing I checked: we're consistently sitting at ~1GB of RAM free, usually 95%+ CPU idle, and none of our disks is above 60% usage.
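For what it's worth, those numbers came from the usual tools, roughly along these lines (a sketch, not the exact invocations):

```
# Rough sketch of the resource checks mentioned above (standard Ubuntu tools)
free -h                  # ~1GB of RAM free
top -bn1 | head -n 15    # CPU 95%+ idle
df -h                    # no filesystem above 60% usage
```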
The node health in the Graylog web interface is all green, and the Graylog JVM has ~3.5 GB of heap allocated to it; Graylog tells me the node is using ~1GB of that.
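In case it matters, the heap is set through the DEB package's defaults file; ours looks roughly like this (sizes approximate, remaining GC flags left at the package defaults):

```
# /etc/default/graylog-server (approximate)
GRAYLOG_SERVER_JAVA_OPTS="-Xms3500m -Xmx3500m -XX:NewRatio=1 -server -XX:+UseConcMarkSweepGC"
```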
This issue has persisted through a full apt purge and reinstall of graylog-server, removal of the old journal files, and several restarts of both the individual services and the machine as a whole.
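Roughly what that reset looked like (a sketch, not the exact history; the journal path is the package default on our install):

```
# Sketch of the reset steps above; /var/lib/graylog-server/journal is the
# default message_journal_dir for the DEB package.
sudo systemctl stop graylog-server
sudo apt-get purge graylog-server
sudo rm -rf /var/lib/graylog-server/journal
sudo apt-get install graylog-server
sudo systemctl restart elasticsearch mongod graylog-server
# ...plus a couple of full reboots of the instance.
```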
Our input is a Syslog UDP input, and disabling the regex-based stream filtering rules doesn't help.
The Graylog log file is here. The only problems in it, as far as I can tell, relate to plugins we don't use. The Elasticsearch log file is here.