Graylog doesn't process messages from journal

entrop · March 11, 2020, 1:42pm

Hi!

After upgrading from 3.0 to 3.2, I'm facing issues with random nodes in my Graylog cluster. Some nodes stop processing messages and just keep flushing them to the journal.

Some words about the setup:
56 Graylog nodes sitting behind 4 LVS balancers write logs to a huge Elasticsearch cluster (about 200 TB of data, 360 data nodes).

In most cases the nodes can be “repaired” by just restarting them, but this is an ugly workaround that isn’t reliable at all.

I had a similar issue with previous versions, which I solved by increasing the “http.max_content_length” setting in Elasticsearch to 500mb.
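For anyone wanting to sanity-check that setting, here is a rough sketch in Python. It assumes a bulk request is roughly the batch size multiplied by the average message size, which ignores bulk metadata overhead. The function names, the batch size, and the message sizes are all hypothetical and not taken from my configs. Only the 500mb value comes from the setup above.

```python
def parse_size(text: str) -> int:
    """Convert a size string such as '500mb' into bytes (1024-based units)."""
    units = {"kb": 1024, "mb": 1024 ** 2, "gb": 1024 ** 3, "b": 1}
    text = text.strip().lower()
    # Check two-letter suffixes before the bare 'b' suffix.
    for suffix in ("kb", "mb", "gb", "b"):
        if text.endswith(suffix):
            return int(text[: -len(suffix)]) * units[suffix]
    return int(text)  # no suffix: treat the value as plain bytes


def bulk_fits(batch_size: int, avg_message_bytes: int, max_content_length: str) -> bool:
    """Estimate whether one bulk request stays under http.max_content_length.

    Assumption: request size ~= batch_size * avg_message_bytes.
    """
    estimated = batch_size * avg_message_bytes
    return estimated <= parse_size(max_content_length)


if __name__ == "__main__":
    # Hypothetical numbers: 100000 messages of about 10 KiB each per bulk request.
    print(bulk_fits(100_000, 10 * 1024, "500mb"))
```

This is only an estimate. It can show that a given batch size is clearly over or under the limit, but it cannot explain why a node stops reading from the journal.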

The only lines related to the problem I was able to discover:
debug log fragment: https://pastebin.com/6RDwsLwC
graylog configs: https://pastebin.com/zftVSvMy

I'm hosting the logs and configs on Pastebin because of the upload limits here.

entrop · March 11, 2020, 1:43pm

jmap (part 1): https://pastebin.com/c7QjXD3D
jmap (part 2): https://pastebin.com/TswQ8L1m

entrop · March 11, 2020, 1:59pm

And this is the Java profile from the problematic node:
profile: https://sendeyo.com/up/d/c63d75f7ad

system · March 25, 2020, 2:00pm

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.
