Buffer utilization is 100% on all nodes with backlog


(Tafsir) #1

Hi All,

Heavy backlog on more than one node: some nodes have a backlog of more than 2M messages, and some more than 10M. When I click on a node, I see that the process and output buffers are 100% utilized. Can we make any changes to solve this? I have already read the thread regarding this issue, but I was not able to understand it.

Graylog version - 2.4.6
Elasticsearch - 5.6


(Jan Doberstein) #2

the simple answer:

check your outputs (if you have any) and give Elasticsearch more resources.


(Tafsir) #3

@jan

We already have plenty of resources.


(Tafsir) #4

@jan which outputs…?


(Tafsir) #5

@jan, how can we configure the buffers…?


(Jan Doberstein) #6

the question is - do you have any configured?

Go to System/Outputs and check. If not, you might want to look at the batch size and the buffer workers.

The total number of buffer processors should not be higher than the number of available CPUs, and the batch size should be roughly the median number of messages you receive within one flush period.
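The settings mentioned above live in Graylog's `server.conf`. A minimal sketch of the relevant section follows; the values are purely illustrative and need to be sized against your own CPU count and message rate, not copied verbatim:

```
# /etc/graylog/server/server.conf (illustrative values only)

# Buffer worker threads - their sum should not exceed the
# number of CPU cores available to the Graylog JVM.
inputbuffer_processors = 2
processbuffer_processors = 5
outputbuffer_processors = 3

# Batch size for the Elasticsearch output: roughly the median
# number of messages arriving within one flush interval.
output_batch_size = 500

# Flush interval (seconds) for the Elasticsearch output.
output_flush_interval = 1
```

After changing these values the Graylog service has to be restarted for them to take effect.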


(Tafsir) #7

@jan Thanks for the reply.

In System/Outputs I found some entries, but what do I have to do with them?


(Jan Doberstein) #8

Graylog processes all outputs in sequence, and the last one is the output to Elasticsearch.

If you have outputs configured and they are not very responsive (meaning slow), this will slow down your processing.
If possible, disable all outputs and see if this helps to speed your system up.


(Tafsir) #9

@jan
sorry, but there is no disable button.


(Jan Doberstein) #10

the outputs are always bound to a stream - so, unfortunately, you need to check your streams to see if one of the outputs is configured on them.


(Tafsir) #11

@jan Disabled all the outputs but still no luck. The backlog is still very high, and the process buffer and output buffer utilization is still 100%.


(Jan Doberstein) #12

you might need to restart Graylog after you have disabled the outputs. (Just try it on a single node first.)


(Tafsir) #13

Still no luck, @jan.

But after restarting the node, I found something in the log.

2018-10-02T20:36:21.580Z WARN  [NettyTransport] receiveBufferSize (SO_RCVBUF) for input BeatsInput{title=Beats - DmsBatchJobs ( Extra Syncer ), type=org.graylog.plugins.beats.BeatsInput, nodeId=hj3c02f5-9878-45b0-a788-8bfe9c9223d2} should be 2097152 but is 212992.
2018-10-02T20:36:21.580Z WARN  [NettyTransport] receiveBufferSize (SO_RCVBUF) for input BeatsInput{title=Beats (Tc's & docker01-04 ), type=org.graylog.plugins.beats.BeatsInput, nodeId=null} should be 2097152 but is 212992.
2018-10-02T20:36:21.580Z WARN  [NettyTransport] receiveBufferSize (SO_RCVBUF) for input GELFTCPInput{title=Gelf Tcp (Dms01, Dms02), type=org.graylog2.inputs.gelf.tcp.GELFTCPInput, nodeId=hj3c02f5-9878-45b0-a788-8bfe9c9223d2} should be 1048576 but is 212992.

What do the numbers mean…?


(Jan Doberstein) #14

that shows that the messages are oversized for the configured receive buffers. So you need to raise the receive buffer size of those named inputs to accept the bigger messages.


(Tafsir) #16

How can we raise the input size, @jan?
Sorry, I am a little bit confused, that's why I am asking. Also, how can we find the exact input to change?


(Jan Doberstein) #17

just read the message - there are two Beats inputs and one GELF TCP input. Open those inputs and you can edit the receive buffer size.


(Tafsir) #18

@jan

It’s already set to 2097152. So why do the logs say it should be 2097152 but is 212992?


(Ryan Tanay) #19

Taking a guess here, but considering that all three of those buffers are capped at 212992, a system-level setting is likely capping TCP receive buffer sizes to that limit.

If that is the case, the way to change that limit is going to vary based on your OS, the service manager you’re using to run Graylog, etc.
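On a Linux host, the kernel parameter that typically caps socket receive buffers at exactly 212992 bytes is `net.core.rmem_max`. A sketch of how to check and raise it (assumption: systemd-based Linux; the file name `99-graylog.conf` is just an illustrative choice):

```shell
# Show the current kernel cap on socket receive buffers.
# 212992 is a common Linux default, matching the value in the warning.
sysctl net.core.rmem_max

# Raise the cap so the 2 MiB buffers Graylog requests can be granted.
sudo sysctl -w net.core.rmem_max=2097152

# Persist the setting across reboots.
echo 'net.core.rmem_max = 2097152' | sudo tee /etc/sysctl.d/99-graylog.conf
```

After raising the limit, restart Graylog so the inputs re-request their buffers; the SO_RCVBUF warnings should then disappear.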


(system) #20

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.