Graylog leader node loss

The Graylog leader node keeps dropping out of the cluster. It is still processing logs, but it is sometimes missing from the UI, and an alert is triggered every 5 minutes.

Graylog:
OS: Ubuntu 22.04
RAM: 48
CPU: 32
Nodes: 5
Elasticsearch:
RAM: 128
CPU: 16
Nodes: 9

The only error I am getting in the graylog-server journal is java.net.SocketTimeoutException: Connect timed out. There are no other errors in any logs.

This is the configuration on the leader node:

http_bind_address = 192.168.133.147:9000

http_publish_uri = http://192.168.133.147:9000/

http_external_uri = http://example-graylog01.example.com/

Every node is configured with its own correct addresses.
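For anyone who wants to double-check the addresses on each node, here is a minimal sketch that parses the three settings and compares the bind address with the publish URI. The function names and the SAMPLE text are only illustrative (SAMPLE is copied from the values above), and a mismatch is not necessarily an error, for example when a node sits behind NAT or a proxy.

```python
from urllib.parse import urlsplit

# Sample settings copied from the post; on a real node you would read
# the text of the Graylog server configuration file instead.
SAMPLE = """
http_bind_address = 192.168.133.147:9000
http_publish_uri = http://192.168.133.147:9000/
http_external_uri = http://example-graylog01.example.com/
"""


def parse_conf(text):
    """Parse 'key = value' lines, skipping blank lines and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def bind_endpoint(settings):
    """Return (host, port) the node listens on."""
    host, _, port = settings["http_bind_address"].rpartition(":")
    return host, int(port)


def publish_endpoint(settings):
    """Return (host, port) that other nodes use to reach this node."""
    uri = urlsplit(settings["http_publish_uri"])
    # Fall back to the scheme's default port if the URI does not name one.
    port = uri.port or (443 if uri.scheme == "https" else 80)
    return uri.hostname, port


def endpoints_match(settings):
    """True when the publish URI points at the bind address."""
    return bind_endpoint(settings) == publish_endpoint(settings)
```

Calling endpoints_match(parse_conf(SAMPLE)) on each node's own settings shows whether the address advertised to the cluster is the one the node actually listens on.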

I have tried changing those three settings, but nothing worked. Ping and telnet succeed from every node towards the leader node. What do you think I am missing?

This Graylog cluster is about a year and a half old, but this error only appeared 2-3 months ago. There are no errors in Elasticsearch or in the MongoDB replica set. I have also tried ping and telnet from the Elasticsearch and MongoDB hosts, and those succeeded too.
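Since a one-off ping or telnet can succeed while connections still time out intermittently, a repeated probe may be more telling. Below is a rough sketch, not a definitive tool: probe and watch are hypothetical helper names, and the interval, round count and 5 second timeout are arbitrary assumptions.

```python
import socket
import time


def probe(host, port, timeout=5.0):
    """Try one TCP connect; return (succeeded, seconds_taken)."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            ok = True
    except OSError:
        # Covers connect timeouts, refused connections and unreachable hosts.
        ok = False
    return ok, time.monotonic() - start


def watch(host, port, interval=10.0, rounds=30, timeout=5.0):
    """Probe repeatedly, print one line per attempt, return the failure count."""
    failures = 0
    for _ in range(rounds):
        ok, elapsed = probe(host, port, timeout)
        if not ok:
            failures += 1
        status = "ok" if ok else "FAILED"
        print(f"{time.strftime('%H:%M:%S')} {host}:{port} {status} {elapsed:.3f}s")
        time.sleep(interval)
    return failures
```

Running watch("192.168.133.147", 9000) from each of the other nodes for a while, and lining the failures up with the alert times, would show whether the timeouts come from the network path or from the leader being slow to accept connections.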

Here are some screenshots:

Hello @dsamadashvili,

At the point where the error is thrown, what does resource utilisation look like on the leader node?

This can occur when either the leader is overwhelmed or there is an issue with the leader contacting MongoDB.
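If it helps to capture this at the moment the error appears, here is a small sketch that takes an OS-level snapshot on a Linux host. The function names are made up for illustration, and it only covers load average and available memory; it says nothing about JVM heap or MongoDB latency, which would need to be checked separately.

```python
import os


def parse_meminfo(text):
    """Parse /proc/meminfo style text into {name: integer value}."""
    values = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def snapshot(meminfo_path="/proc/meminfo"):
    """Return 1-minute load per CPU and the percentage of memory available."""
    with open(meminfo_path) as fh:
        mem = parse_meminfo(fh.read())
    load1, _load5, _load15 = os.getloadavg()
    cpus = os.cpu_count() or 1
    return {
        "load1_per_cpu": load1 / cpus,
        "mem_available_pct": 100.0 * mem["MemAvailable"] / mem["MemTotal"],
    }
```

Logging snapshot() on the leader every few seconds and comparing the timestamps with the SocketTimeoutException entries in the journal should show whether the node is under pressure when the error occurs.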
