Graylog Nodes "Drop" After Bringing Up Node on WAN

Cameron · August 16, 2022, 5:21pm

Has anyone experienced something like this, and if so, what did you do to resolve it?

1. Describe your incident:

Bringing up additional Graylog nodes at a second physical location causes all nodes to “drop” from the cluster entirely and processing halts. Stopping the graylog service on the new node allows the others to resume.

2. Describe your environment:

Three node cluster at Site1 and attempting to bring up nodes at Site2.
Two physical network locations separated by SD-WAN.
Two subnets under the same broadcast domain.

Package Version:

4.3.5+32fa802 (Debian 11.0.16 on Linux 5.10.0-16-amd64)

tmacgbay · August 17, 2022, 2:43pm

Hello && welcome!

Not a lot to go on other than the Graylog version… Here are some tips on how to make your question clearer here and here.

Have you looked in the Graylog logs? What do the server.conf files look like? The tips I posted show how to get the server.conf data without all the comments… make sure to obfuscate where needed… they also show how to use the </> forum tool when posting code/logs to make it easily readable…
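If it helps, something along these lines should print a config file with the comment and blank lines stripped out. It's only a rough sketch: `show_conf` is a name made up for this example, and the path in the usage note assumes the Debian package layout, so adjust it for your install.

```shell
# show_conf: print a config file without comment lines or blank lines.
# (hypothetical helper name, not something that ships with Graylog)
show_conf() {
    # Drop lines that are empty or whose first non-blank character is '#'.
    grep -Ev '^[[:space:]]*(#|$)' "$1"
}
```

Usage would be something like `show_conf /etc/graylog/server/server.conf`. Mask values such as `password_secret` and `root_password_sha2` before pasting the output here.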

gsmith · August 17, 2022, 10:28pm

Hello and Welcome @Cameron

Adding on to what @tmacgbay suggested.

By chance, do both Elasticsearch clusters have the same name?
Also, do you have any discovery settings enabled in the Graylog config file?

Cameron · August 18, 2022, 2:46pm

I’m happy to report that I’ve found the issue and everything appears to be working now!

It turns out that our WAN link was just laggy enough to cause a timeout for the Master Node. After a bit of digging I found the stale_master_timeout setting in server.conf, bumped it up to a few seconds instead of two, and bam - problem solved!
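For anyone who wants to check their own nodes, here is a small sketch for seeing what the setting is currently set to. `current_timeout` is a made-up helper name, the config path in the usage note assumes the Debian package layout, and the value in the comment is only an illustration, not necessarily the number to use. As far as I can tell the setting is in milliseconds, but verify that against the comments in your own server.conf.

```shell
# current_timeout: print the active stale_master_timeout line from a config
# file, or a note if the setting is absent or commented out.
# (hypothetical helper name, for illustration only)
current_timeout() {
    grep -E '^[[:space:]]*stale_master_timeout[[:space:]]*=' "$1" \
        || echo "stale_master_timeout not set (default applies)"
}

# An edited line in server.conf might look like this (illustrative value,
# assuming the unit is milliseconds):
#   stale_master_timeout = 10000
```

Run it as `current_timeout /etc/graylog/server/server.conf` on each node, and restart the Graylog service after changing the value so it takes effect.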

Hopefully this post can help someone else in the future!

system · September 1, 2022, 2:47pm

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.
