There is a heavy backlog on more than one node. Some nodes have a backlog of more than 2M and some have more than 10M. When I click on the nodes, I find that the process and output buffers are 100% utilized. Can we make any modifications to solve this? I have already read the thread regarding this issue but was not able to understand it.
Go to System/Outputs and check whether any outputs are configured there. If there are none, you might want to look at the batch_size
And the buffer workers.
The total number of processors should not exceed the number of available CPUs, and the batch size should be roughly the median number of messages you receive within one flush_period.
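As a rough illustration of that rule of thumb, here is a minimal Python sketch. It is not part of Graylog. The function names, the processor labels and the sample message counts are all hypothetical, and you would replace the samples with counts measured on your own nodes.

```python
import os
from statistics import median


def suggested_batch_size(messages_per_flush_period):
    """Return the median message count seen per flush period, as an int."""
    return int(median(messages_per_flush_period))


def processors_fit(processor_counts, available_cpus=None):
    """True if the sum of all configured buffer processors fits the CPU count."""
    if available_cpus is None:
        # Fall back to the CPU count of the machine running this script.
        available_cpus = os.cpu_count() or 1
    return sum(processor_counts.values()) <= available_cpus


# Hypothetical message counts observed in seven consecutive flush periods.
samples = [800, 1200, 950, 4000, 1100, 1000, 900]
print(suggested_batch_size(samples))  # 1000

# Hypothetical processor settings checked against an 8-CPU host.
settings = {"process": 5, "output": 3, "input": 2}
print(processors_fit(settings, available_cpus=8))  # False: 10 processors on 8 CPUs
```

The median is used rather than the mean so that a single burst (the 4000 above) does not inflate the suggested batch size.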
Graylog processes all outputs in sequence, and the last one is the output to Elasticsearch.
If you have outputs configured and they are not very responsive (that is, slow), they will slow down your processing.
If possible, disable all outputs and see whether this speeds your system up.
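To see why a single slow output matters, here is a toy model in Python. It only assumes what is described above, namely that each batch passes through every output one after another. It is not Graylog's actual code, and the output names and latencies are made up for illustration.

```python
def batch_time_ms(output_latencies_ms):
    """Time for one batch to pass through all outputs, one after another."""
    return sum(output_latencies_ms.values())


def throughput_per_second(batch_size, output_latencies_ms):
    """Messages per second for one batch size under the sequential model."""
    return batch_size * 1000 / batch_time_ms(output_latencies_ms)


# Hypothetical latencies per batch, in milliseconds.
with_slow_output = {"slow_forwarder": 900, "elasticsearch": 100}
elasticsearch_only = {"elasticsearch": 100}

print(throughput_per_second(1000, with_slow_output))    # 1000.0
print(throughput_per_second(1000, elasticsearch_only))  # 10000.0
```

In this model the slow output cuts throughput by a factor of ten even though Elasticsearch itself is fast, which is why disabling the extra outputs is a quick way to test whether they are the bottleneck.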
But after restarting the node, I found the following in the log.
2018-10-02T20:36:21.580Z WARN [NettyTransport] receiveBufferSize (SO_RCVBUF) for input BeatsInput{title=Beats - DmsBatchJobs ( Extra Syncer ), type=org.graylog.plugins.beats.BeatsInput, nodeId=hj3c02f5-9878-45b0-a788-8bfe9c9223d2} should be 2097152 but is 212992.
2018-10-02T20:36:21.580Z WARN [NettyTransport] receiveBufferSize (SO_RCVBUF) for input BeatsInput{title=Beats (Tc's & docker01-04 ), type=org.graylog.plugins.beats.BeatsInput, nodeId=null} should be 2097152 but is 212992.
2018-10-02T20:36:21.580Z WARN [NettyTransport] receiveBufferSize (SO_RCVBUF) for input GELFTCPInput{title=Gelf Tcp (Dms01, Dms02), type=org.graylog2.inputs.gelf.tcp.GELFTCPInput, nodeId=hj3c02f5-9878-45b0-a788-8bfe9c9223d2} should be 1048576 but is 212992.
That shows that the messages are oversized for the configured receive buffers, so you need to raise the input size of these named inputs to accept the bigger messages.
Taking a guess here, but considering that all three of those buffers are capped at 212992, it’s likely a system-level setting capping TCP buffer sizes to that limit.
If that is the case, the way to change that limit is going to vary based on your OS, the service manager you’re using to run Graylog, etc.
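One way to test that guess is sketched below in Python. It assumes a Linux host, which the thread does not confirm. On Linux the ceiling for SO_RCVBUF requests is the net.core.rmem_max sysctl, readable at /proc/sys/net/core/rmem_max. The helper names are hypothetical, and the regular expression is written only against the warning format quoted above.

```python
import re
import socket
from pathlib import Path

# Matches the NettyTransport warning lines quoted in this thread.
WARNING = re.compile(
    r"receiveBufferSize \(SO_RCVBUF\) for input \w+\{title=(?P<title>.*?), type="
    r".*? should be (?P<wanted>\d+) but is (?P<actual>\d+)\."
)


def parse_warning(line):
    """Return (input title, requested bytes, granted bytes) or None."""
    match = WARNING.search(line)
    if match is None:
        return None
    return match.group("title"), int(match.group("wanted")), int(match.group("actual"))


def kernel_rcvbuf_limit(path="/proc/sys/net/core/rmem_max"):
    """Read the Linux receive buffer ceiling; None if the file does not exist."""
    sysctl = Path(path)
    return int(sysctl.read_text().strip()) if sysctl.exists() else None


def granted_rcvbuf(requested):
    """Ask the OS for a receive buffer and return the size it reports back."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested)
        # Note: Linux reports double the value it stored, so compare loosely.
        return sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)


line = ("2018-10-02T20:36:21.580Z WARN [NettyTransport] receiveBufferSize (SO_RCVBUF) "
        "for input GELFTCPInput{title=Gelf Tcp (Dms01, Dms02), "
        "type=org.graylog2.inputs.gelf.tcp.GELFTCPInput, "
        "nodeId=hj3c02f5-9878-45b0-a788-8bfe9c9223d2} should be 1048576 but is 212992.")

print(parse_warning(line))
print("kernel limit:", kernel_rcvbuf_limit())
print("granted for 2097152:", granted_rcvbuf(2097152))
```

If the kernel limit is lower than the sizes the inputs request, that would support the system-level explanation. How to raise the limit persistently still depends on your OS and on how Graylog is started.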