Performance Decrease when Adding Stream Output

I’ve recently set up a production Graylog server after running a PoC test with the pre-built OVA. I have devices pointing to it already, and it receives between 1,000 and 2,000 messages per second on average.

Everything works fine until I add an output to our All Messages stream to forward the logs to our SIEM. Output throughput then drops from keeping pace with incoming messages to roughly 10-50 messages per second. If I remove the output and restart the Graylog service, processing jumps to upwards of 20,000 messages per second while it clears everything that was backed up. If I disable our loudest input, which reduces intake to 10-200 messages per second, the same issue occurs and processing drops to 5-20 messages per second. At no point are system resources heavily utilized. The OVA did not have this issue when we tested the same output on its All Messages stream.
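To see why the backlog builds up so quickly, here is a rough back-of-the-envelope sketch. The rates come from the post above; the one-hour window and the helper names are hypothetical, and the model ignores journal size limits and any message loss.

```python
def backlog_after(seconds: float, in_rate: float, out_rate: float) -> float:
    """Messages queued after `seconds` when intake outpaces processing."""
    return max(0.0, float((in_rate - out_rate) * seconds))


def drain_seconds(backlog: float, drain_rate: float, in_rate: float) -> float:
    """Time to clear `backlog` while new messages keep arriving at `in_rate`."""
    net = drain_rate - in_rate
    if net <= 0:
        raise ValueError("drain rate must exceed intake rate to catch up")
    return backlog / net


if __name__ == "__main__":
    # Hypothetical: one hour at ~1,500 msg/s in and ~30 msg/s out.
    queued = backlog_after(3600, in_rate=1500, out_rate=30)
    print(f"backlog after 1 h: {queued:,.0f} messages")
    # After restarting without the output, processing reaches ~20,000 msg/s.
    print(f"time to drain: {drain_seconds(queued, 20000, 1500) / 60:.1f} min")
```

Under these assumptions, an hour with the output enabled leaves roughly 5.3 million messages queued, which the unthrottled server clears in under five minutes. That is consistent with the burst seen after removing the output.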

I’m not sure what to review or where to look to find out why adding the output reduces processing performance on this server. Any help would be appreciated.

You should check the other system that receives the messages. Adding the output adds another step to the processing, and messages only count as successfully stored (and processed) once the output has sent them out successfully.

Unless the output uses UDP, the other party has to acknowledge each message …
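A simple way to reason about acknowledged delivery: if every request must be acknowledged before a sender can issue the next one, throughput is capped at roughly (messages in flight) / (round-trip time). This is a Little's-law style estimate, not a description of how Graylog's output buffer is actually implemented; the function name, latency, and batch figures below are hypothetical.

```python
def max_ack_throughput(msgs_per_request: int, concurrent_requests: int,
                       round_trip_s: float) -> float:
    """Upper bound on messages/second when every request waits for an ACK.

    Each of `concurrent_requests` senders completes at most one request
    per round trip, and each request carries `msgs_per_request` messages.
    """
    if round_trip_s <= 0:
        raise ValueError("round_trip_s must be positive")
    return msgs_per_request * concurrent_requests / round_trip_s


if __name__ == "__main__":
    # One message per request, one sender, 40 ms round trip: about 25 msg/s.
    print(max_ack_throughput(1, 1, 0.040))
    # Same latency, but 500 messages per request and 4 senders in parallel.
    print(max_ack_throughput(500, 4, 0.040))
```

In this model, idle CPU and low throughput can coexist, because the senders spend most of their time waiting on the receiver rather than doing work.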

This happens with any output we add. I had to modify settings to get the input working correctly, so I assume there are some output settings I need to change too. I just can’t find anything about how to set them up.

How are you sending the output? Which protocol, and which output type?

It’s a custom output provided by the receiving appliance’s vendor, made specifically for Graylog. This is the exact one, if it helps: https://github.com/sagarinpursue/graylog-http-plugin

This worked with the pre-built OVA. I also set up a test GELF output, and it showed the same performance issues. If I delete the output, performance does not return to normal until I restart the services.

I’ve adjusted these settings so far, if it helps:
/etc/default/elasticsearch
Additional Java OPTS
ES_HEAP_SIZE=40g
ES_JAVA_OPTS="-Xms10g -Xmx10g"

/etc/default/graylog-server
GRAYLOG_SERVER_JAVA_OPTS="-Xms8g -Xmx8g"

/etc/graylog/server/server.conf
output_batch_size = 5000
processbuffer_processors = 5
outputbuffer_processors = 5
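As a quick sanity check on JVM options like the ones above, the sketch below extracts the `-Xms`/`-Xmx` values from an options string and flags a mismatch (Elastic's documentation recommends setting both to the same value). The function names are hypothetical, the parser only understands the `k`/`m`/`g` suffixes, and it does not interpret `ES_HEAP_SIZE`; how that variable interacts with `ES_JAVA_OPTS` depends on the Elasticsearch version.

```python
import re

# Conversion factors from JVM size suffixes to gigabytes.
_UNITS = {"k": 1 / (1024 * 1024), "m": 1 / 1024, "g": 1.0}


def heap_flags_gb(java_opts: str) -> dict:
    """Return {'Xms': GB, 'Xmx': GB} for the heap flags found in `java_opts`.

    If a flag appears more than once, the last occurrence wins.
    """
    found = {}
    for flag, size, unit in re.findall(r"-(Xms|Xmx)(\d+)([kKmMgG])", java_opts):
        found[flag] = int(size) * _UNITS[unit.lower()]
    return found


def check_heap(java_opts: str) -> list:
    """List human-readable problems with the heap flags (empty if none)."""
    flags = heap_flags_gb(java_opts)
    problems = [f"missing -{f}" for f in ("Xms", "Xmx") if f not in flags]
    if not problems and flags["Xms"] != flags["Xmx"]:
        problems.append(f"-Xms ({flags['Xms']} GB) != -Xmx ({flags['Xmx']} GB)")
    return problems


if __name__ == "__main__":
    for opts in ("-Xms10g -Xmx10g", "-Xms8g -Xmx8g", "-Xms1g -Xmx4g"):
        print(opts, "->", check_heap(opts) or "ok")
```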

Configuring Elasticsearch with more than 32 GB of heap will not help you …

The heap should stay below 32 GB and at most 50% of the available system RAM, whichever is lower.
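The rule of thumb above can be expressed as a small helper. This is a sketch: the function name is hypothetical, and it assumes a 31 GB ceiling because the JVM stops using compressed object pointers at a threshold just under 32 GB, so staying a little below that is the conservative choice. The other half of RAM is left for the operating system's filesystem cache, which Elasticsearch relies on.

```python
def max_es_heap_gb(system_ram_gb: float, ceiling_gb: float = 31.0) -> float:
    """Largest Elasticsearch heap allowed by the rule of thumb:
    no more than half of system RAM, and no more than `ceiling_gb`."""
    if system_ram_gb <= 0:
        raise ValueError("system_ram_gb must be positive")
    return min(ceiling_gb, system_ram_gb / 2)


if __name__ == "__main__":
    # Hypothetical RAM sizes; the server's actual RAM isn't given in the thread.
    for ram in (16, 32, 64, 128):
        print(f"{ram:>4} GB RAM -> heap at most {max_es_heap_gb(ram):g} GB")
```

Under this rule, a 40 GB heap is never appropriate, regardless of how much RAM the machine has.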

Do you only see the problem when the output is enabled? If so, check the host that is receiving the output.
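One quick check of the receiving host, run from the Graylog server, is to time how long a TCP connection to it takes. This is a minimal sketch: the host and port are supplied on the command line, and it measures only the TCP handshake, not how long the receiving application takes to acknowledge a message, so a fast result does not rule out a slow receiver.

```python
import socket
import sys
import time


def tcp_connect_ms(host: str, port: int, timeout_s: float = 3.0) -> float:
    """Milliseconds needed to open (and immediately close) a TCP connection."""
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout_s):
        pass
    return (time.perf_counter() - start) * 1000


if __name__ == "__main__":
    if len(sys.argv) != 3:
        print(f"usage: {sys.argv[0]} HOST PORT")
    else:
        host, port = sys.argv[1], int(sys.argv[2])
        samples = []
        for _ in range(5):
            try:
                samples.append(tcp_connect_ms(host, port))
            except OSError as exc:
                print(f"connect failed: {exc}")
                break
        if samples:
            avg = sum(samples) / len(samples)
            print(f"min/avg connect time: {min(samples):.1f} / {avg:.1f} ms")
```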

I have two Graylog servers: the prebuilt OVA provided by Graylog, which we used as a test and which is still running, and a freshly built box with Graylog installed that is meant to be our production server. The prebuilt OVA does not have these problems when the output is applied; only the new installation does. That suggests the receiving server has no trouble receiving the logs.

I’ll adjust the heap back down. 40G was suggested somewhere on the forum for issues similar to the ones I was experiencing, so that’s what I used.