Graylog Processing Messages Super Slow

We have a Graylog server (2.4.6-1) running on top of Elasticsearch (5.6.10) and MongoDB (3.4.10) on an Ubuntu 16.04 (kernel 4.4.0-135-generic) EC2 instance. The Graylog server is processing messages incredibly slowly. In the time it’s taken me to write this, another ~5000 messages have piled up. It hasn’t stopped completely: if I turn off the input for a while, the number of unprocessed messages declines, but only at maybe ~100 messages/minute, which is far too slow to keep up with the logs being sent to it.

We’re consistently sitting at ~1GB of RAM free, and usually at 95%+ CPU idle. None of our disks are over 60% usage. We had issues with system performance and resources in the past, so that was the first thing I checked.

The node health is all green, and the JVM has ~3.5 GB allocated to it. Graylog tells me the node is using ~1GB of that.

This issue has persisted through a full apt-purge of graylog-server, removing the old journal files, and several restarts of both individual services and the machine as a whole.

Our input is a Syslog UDP input, and turning off the stream filtering regex tests doesn’t help.

The graylog log file is here. The only problems in it are plugins we don’t use, far as I can tell. The elasticsearch log file is here.

@doctor_mustard

What resources are available on the instance? Does one instance handle everything, or have you separated the services?

Most of the time, Elasticsearch can’t handle the load with the resources it has been given, and you need to give it more and change some settings.

We would need to know how many CPUs and how much RAM are available, and whether you have tuned the JVM settings for Graylog or Elasticsearch.
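
If it helps, here is a rough sketch of how those Elasticsearch numbers could be collected through its REST API (`_cluster/health` and `_nodes/stats/jvm`). It assumes Elasticsearch is reachable at `http://localhost:9200`, which is only a guess about your setup, and the helper names are made up for illustration.

```python
import json
from urllib.request import urlopen

# Assumption: Elasticsearch runs on the same host on its default HTTP port.
ES_URL = "http://localhost:9200"


def fetch_json(path, base_url=ES_URL, timeout=5):
    """GET a JSON document from the Elasticsearch REST API."""
    with urlopen(base_url + path, timeout=timeout) as resp:
        return json.load(resp)


def heap_usage(node_stats):
    """Map node name to heap-used percent from a _nodes/stats/jvm response."""
    return {
        node["name"]: node["jvm"]["mem"]["heap_used_percent"]
        for node in node_stats["nodes"].values()
    }


def health_summary(health):
    """Condense a _cluster/health response into one line."""
    return (
        "status={status} nodes={number_of_nodes} "
        "unassigned_shards={unassigned_shards}"
    ).format(**health)


if __name__ == "__main__":
    print(health_summary(fetch_json("/_cluster/health")))
    for name, pct in heap_usage(fetch_json("/_nodes/stats/jvm")).items():
        print(f"{name}: heap {pct}% used")
```

A heap that stays close to full, or a cluster status other than green, would point towards Elasticsearch rather than Graylog as the bottleneck.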

Hi Jan,
I posted this on /r/sysadmin after I learned I would have to wait so long for the account to be approved.
A nice guy helped me fix it.
If anyone finds this thread later and wants to know what worked, here’s the Reddit thread where we troubleshot it.

