Graylog Process Buffer Full

Bashere1 · July 5, 2017, 6:22pm

I have been running a decent-sized Graylog instance (25,000 msg/s) for the last few months without incident. As of last night, the process buffers started filling up. Eventually the deflector appears to die and stops processing messages altogether. The Elasticsearch cluster is green and doesn’t appear to be having performance issues. The messages don’t even appear to reach the output buffer, so they are stacking up in the journal.

I tried the default settings as well as the following to keep the process buffers from filling up.

processbuffer_processors = 8
output_batch_size = 100
ring_size = 262144

Any guidance on how I can dig in to see what is causing the process buffers to fill up would be helpful. The logs are not pointing me anywhere currently.
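One way to confirm the "stacking up in the journal" symptom from outside the web interface is to sample the journal backlog a few times and check whether it only ever grows. The sketch below is a rough illustration, not a tested monitoring tool. It assumes the node's REST API exposes a journal summary at `/system/journal` with an `uncommitted_journal_entries` field; both the path and the field name are assumptions to verify against the API browser of your Graylog version. `fetch_uncommitted_entries` and `journal_is_backing_up` are hypothetical helper names.

```python
import json
import urllib.request
from typing import Sequence


def fetch_uncommitted_entries(base_url: str, auth_header: str) -> int:
    """Read the journal backlog of one node.

    Assumption: the endpoint path and the JSON field name below match
    your Graylog version; check them in the API browser first.
    """
    req = urllib.request.Request(
        base_url.rstrip("/") + "/system/journal",
        headers={"Accept": "application/json", "Authorization": auth_header},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return int(json.load(resp)["uncommitted_journal_entries"])


def journal_is_backing_up(samples: Sequence[int], min_growth: int = 1) -> bool:
    """Return True if every sample exceeds the previous one by at least min_growth.

    A backlog that grows on every sample suggests messages are being written
    to the journal faster than the processors are reading them.
    """
    if len(samples) < 2:
        return False
    return all(b - a >= min_growth for a, b in zip(samples, samples[1:]))
```

Collecting, say, five samples a minute apart and passing them to `journal_is_backing_up` separates a short spike from a steady stall. It does not say why processing stalled, only that it did.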

Thanks,

jochen · July 5, 2017, 8:21pm

Please upload and share the logs of your Graylog and Elasticsearch nodes.

http://docs.graylog.org/en/2.2/pages/configuration/file_location.html

Bashere1 · July 5, 2017, 8:32pm

I appreciate the reply. I was able to figure out what the issue was. There was a bad extractor causing the processor buffer pool to fill up and crash. The logs did not indicate an issue from what I was able to see.

I do have a couple of follow-up questions though.

Are there metrics exposed through the API for extractor performance?
Why can a single bad extractor brick an entire Graylog cluster? It almost seems like this is a bug that should be fixed.

Thanks,

jochen · July 6, 2017, 8:04am

Yes, you can also send these metrics to various other systems.

If it was a cluster, the other Graylog nodes would still have worked.

Anyway, we didn’t add timeouts to the extractors until now on purpose, because sometimes complex (and thus long running) extractions are necessary. It’s up to you to monitor the health of your Graylog nodes.
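As a rough illustration of monitoring extractor health through the API, the sketch below ranks timer metrics by mean execution time so that a runaway extractor stands out. It is a sketch under stated assumptions: that a metrics namespace can be fetched from `/system/metrics/namespace/<namespace>`, that the response carries a `metrics` list, and that each timer entry has `full_name`, `type`, and `metric.time.mean` fields. The namespace that holds extractor timers is not given in this thread, so it is passed in as a parameter. `fetch_namespace` and `slowest_timers` are hypothetical helper names.

```python
import json
import urllib.request
from typing import Any, Dict, List, Tuple


def fetch_namespace(base_url: str, namespace: str, auth_header: str) -> Dict[str, Any]:
    """Fetch all metrics under a namespace (endpoint path is an assumption)."""
    req = urllib.request.Request(
        base_url.rstrip("/") + "/system/metrics/namespace/" + namespace,
        headers={"Accept": "application/json", "Authorization": auth_header},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


def slowest_timers(payload: Dict[str, Any], top: int = 5) -> List[Tuple[str, float]]:
    """Return up to `top` (metric name, mean time) pairs, slowest first.

    Assumed payload shape: {"metrics": [{"full_name": ..., "type": "timer",
    "metric": {"time": {"mean": ...}}}, ...]}. Non-timer entries and timers
    without a mean value are skipped.
    """
    timers: List[Tuple[str, float]] = []
    for entry in payload.get("metrics", []):
        if entry.get("type") != "timer":
            continue
        mean = entry.get("metric", {}).get("time", {}).get("mean")
        if mean is None:
            continue
        timers.append((entry["full_name"], float(mean)))
    timers.sort(key=lambda item: item[1], reverse=True)
    return timers[:top]
```

Running this on a schedule and alerting when the slowest mean crosses a threshold of your choosing is one way to catch a bad extractor before the process buffer fills. The right threshold depends on the extractors in use.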

Bashere1 · July 6, 2017, 2:19pm

Awesome, thank you for the link to the plugin.

> If it was a cluster, the other Graylog nodes would still have worked.

I would have thought so as well. However, since the extractor was configured on a global input, all of the Graylog nodes were affected. The Filebeat agents that send the data are automatically load-balanced across all of the Graylog nodes.

> Anyway, we didn’t add timeouts to the extractors until now on purpose, because sometimes complex (and thus long running) extractions are necessary.

Is there a specific version of Graylog I need to be running in order to leverage the timeout setting? Is there any documentation on this timeout setting?

jan · July 6, 2017, 3:37pm

> Is there a specific version of Graylog I need to be running in order to leverage the timeout setting? Is there any documentation on this timeout setting?

that is not build and is not in planing to build.

Bashere1 · July 7, 2017, 6:53pm

> that is not build and is not in planing to build.

Are you saying this is not planned or not in a current build?

I would like to see some way to prevent this in the future besides alerts set up around extractor metrics.

system · July 21, 2017, 6:54pm

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.
