Buffer utilization is 100% on all nodes with backlog


(Tafsir) #1

Hi All,

Heavy backlog on more than one node: some nodes have a backlog of more than 2M messages, and some more than 10M. When I click on a node, I see that the process and output buffers are 100% utilized. Can we make any changes to solve this? I have already read the thread regarding this issue, but I was not able to understand it.

Graylog version - 2.4.6
Elasticsearch - 5.6


(Jan Doberstein) #2

the simple answer:

check your outputs (if you have any) and give Elasticsearch more resources.


(Tafsir) #3

@jan

We already have plenty of resources.


(Tafsir) #4

@jan which outputs…?


(Tafsir) #5

@jan, how can we configure the buffers…?


(Jan Doberstein) #6

the question is - do you have any configured?

Go to System/Outputs and check. If not, you might want to look at the batch size and the buffer workers.

The total number of buffer processors should not be higher than the number of available CPUs, and the batch size should be roughly the median number of messages you receive within one flush period.
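The settings mentioned above live in Graylog's `server.conf`. A minimal sketch of the relevant section follows; the values are purely illustrative and need to be sized against your own CPU count and message rate, not copied verbatim:

```
# /etc/graylog/server/server.conf (illustrative values only)

# Buffer worker threads - their sum should not exceed the
# number of CPU cores available to the Graylog JVM.
inputbuffer_processors = 2
processbuffer_processors = 5
outputbuffer_processors = 3

# Batch size for the Elasticsearch output: roughly the median
# number of messages arriving within one flush interval.
output_batch_size = 500

# Flush interval (seconds) for the Elasticsearch output.
output_flush_interval = 1
```

After changing these values the Graylog service has to be restarted for them to take effect.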


(Tafsir) #7

@jan Thanks for the reply.

In System/Outputs I found some entries, but what do I have to do with them?


(Jan Doberstein) #8

Graylog processes all outputs in sequence, and the last one is the output to Elasticsearch.

If you have outputs configured and they are not very responsive (meaning slow), this will slow down your processing.
If possible, disable all outputs and see if this helps to speed your system up.


(Tafsir) #9

@jan
sorry, but there is no disable button.


(Jan Doberstein) #10

the outputs are always bound to a stream - so, unfortunately, you need to check your streams to see if one of the outputs is configured on them.


(Tafsir) #11

@jan Disabled all the outputs but still no luck. The backlog is still very high, and the process buffer and output buffer utilization is still 100%.


(Jan Doberstein) #12

you might need to restart Graylog after you have disabled the outputs. (Just try it on a single node first.)


(Tafsir) #13

Still no luck, @jan.

But after restarting the node, I found something in the log.

2018-10-02T20:36:21.580Z WARN  [NettyTransport] receiveBufferSize (SO_RCVBUF) for input BeatsInput{title=Beats - DmsBatchJobs ( Extra Syncer ), type=org.graylog.plugins.beats.BeatsInput, nodeId=hj3c02f5-9878-45b0-a788-8bfe9c9223d2} should be 2097152 but is 212992.
2018-10-02T20:36:21.580Z WARN  [NettyTransport] receiveBufferSize (SO_RCVBUF) for input BeatsInput{title=Beats (Tc's & docker01-04 ), type=org.graylog.plugins.beats.BeatsInput, nodeId=null} should be 2097152 but is 212992.
2018-10-02T20:36:21.580Z WARN  [NettyTransport] receiveBufferSize (SO_RCVBUF) for input GELFTCPInput{title=Gelf Tcp (Dms01, Dms02), type=org.graylog2.inputs.gelf.tcp.GELFTCPInput, nodeId=hj3c02f5-9878-45b0-a788-8bfe9c9223d2} should be 1048576 but is 212992.

What do the numbers mean…?


(Jan Doberstein) #14

that shows that the messages are oversized for the configured receive buffers. So you need to raise the receive buffer size of those named inputs to accept the bigger messages.


(Tafsir) #16

How can we raise the input size, @jan?
Sorry, I am a little bit confused, that's why I am asking. Also, how can we find the exact input to change?


(Jan Doberstein) #17

just read the message - there are two Beats inputs and one GELF TCP input. Open those inputs and you can edit the receive buffer size.


(Tafsir) #18

@jan

It’s already set to 2097152. So why do the logs say it should be 2097152 but is 212992?


(Ryan Tanay) #19

Taking a guess here, but considering that all three of those buffers are capped at 212992, a system-level setting is likely capping TCP receive buffer sizes to that limit.

If that is the case, the way to change that limit is going to vary based on your OS, the service manager you’re using to run Graylog, etc.
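On a Linux host, the kernel parameter that typically caps socket receive buffers at exactly 212992 bytes is `net.core.rmem_max`. A sketch of how to check and raise it (assumption: systemd-based Linux; the file name `99-graylog.conf` is just an illustrative choice):

```shell
# Show the current kernel cap on socket receive buffers.
# 212992 is a common Linux default, matching the value in the warning.
sysctl net.core.rmem_max

# Raise the cap so the 2 MiB buffers Graylog requests can be granted.
sudo sysctl -w net.core.rmem_max=2097152

# Persist the setting across reboots.
echo 'net.core.rmem_max = 2097152' | sudo tee /etc/sysctl.d/99-graylog.conf
```

After raising the limit, restart Graylog so the inputs re-request their buffers; the SO_RCVBUF warnings should then disappear.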


(system) #20

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.