Graylog servers with full journal

Cato · October 12, 2020, 7:41am

Hi,
I tried to post about this a while ago and got no answers, so I’ll try again.
We have a Graylog cluster on physical servers (quite good specs) whose journals fill up every 1 to 3 days, which slows down the entire cluster.
We have tried expanding it from 6 to 9 nodes and upgrading to the latest Java 8, but it still happens at the same rate as before.
The servers run Debian 10 with SSDs, bonded 10Gb NICs and Java 8u251. The backend is 50 Elasticsearch nodes, also on Debian 10 with bonded 10Gb NICs.
Versions: Graylog 3.2.5, MongoDB 4.0.17 and Elasticsearch 6.8.9.

The current workaround is a set of Nagios checks that restart Graylog when the journal reaches 500K unprocessed messages. This solves about 95% of the problem (sometimes the check fails because it can’t get the journal value from the API).

Does anyone have any clue what causes this and how to fix it?

Ponet · October 12, 2020, 1:26pm

The first thing I’d do is check the message processing buffers. What is the utilisation like across them all?
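
As a rough illustration of that comparison, the sketch below takes per-node buffer statistics and lists the buffers that are close to full. The input shape (a `buffers` object with `input`, `process` and `output` entries, each carrying `utilization_percent`) is an assumption modelled on what I believe Graylog's `/api/system/buffers` endpoint returns; the function name, node names and the 90% threshold are made up.

```python
def busiest_buffers(nodes, threshold=90.0):
    """Return (node, buffer, percent) tuples for buffers at or above `threshold`.

    `nodes` maps a node name to that node's buffer statistics, assumed to
    look like {"buffers": {"process": {"utilization_percent": 100.0}, ...}}.
    Results are sorted with the fullest buffer first.
    """
    hot = []
    for node, payload in nodes.items():
        for name, stats in payload.get("buffers", {}).items():
            percent = float(stats.get("utilization_percent", 0.0))
            if percent >= threshold:
                hot.append((node, name, percent))
    return sorted(hot, key=lambda item: item[2], reverse=True)


# Hypothetical readings from two nodes.
sample = {
    "node-1": {"buffers": {"input": {"utilization_percent": 5.0},
                           "process": {"utilization_percent": 100.0},
                           "output": {"utilization_percent": 0.0}}},
    "node-2": {"buffers": {"input": {"utilization_percent": 1.0},
                           "process": {"utilization_percent": 12.0},
                           "output": {"utilization_percent": 0.0}}},
}
print(busiest_buffers(sample))
```

A full process buffer with an empty output buffer would suggest the node is stuck in message processing, while a full output buffer would point more towards the Elasticsearch side.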

Cato · October 12, 2020, 1:37pm

The nodes are load-balanced via an F5, so utilization is very even (until the journal on one of the servers suddenly starts growing).

system · October 26, 2020, 1:37pm

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.
