I have a Graylog cluster with 3 nodes.
I have an intermittent issue: every minute, one of the nodes (here the master) disappears from System/Nodes and then reappears.
In the config, only one node is configured with the master role.
Each node-id is unique.
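For anyone who wants to double-check the same thing on their side, here is a rough sketch of how the master setting can be verified across the nodes. It assumes each node's config file is a plain `key = value` properties file that uses the `is_master` setting named in the notification below. The node names and config contents in the example are made up for illustration.

```python
def parse_properties(text):
    """Parse simple 'key = value' lines, skipping blank lines and # comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def master_nodes(configs):
    """Return the names of the nodes whose config sets is_master = true."""
    return [
        name
        for name, text in configs.items()
        if parse_properties(text).get("is_master", "").lower() == "true"
    ]


if __name__ == "__main__":
    # Hypothetical config contents, one entry per node (not my real files).
    example = {
        "node1": "is_master = true\n",
        "node2": "is_master = false\n",
        "node3": "is_master = false\n",
    }
    masters = master_nodes(example)
    print("masters:", masters)
    if len(masters) != 1:
        print("expected exactly one master node")
```

With exactly one name in the result, the master role itself should not be the cause.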
In the dashboard, when we lose the node, we get this message:
There was no master Graylog server node detected in the cluster. (triggered a few seconds ago)
Certain operations of Graylog server require the presence of a master node, but no such master was started. Please ensure that one of your Graylog server nodes contains the setting is_master = true in its configuration and that it is running. Until this is resolved index cycling will not be able to run, which means that the index retention mechanism is also not running, leading to increased index sizes. Certain maintenance functions as well as a variety of web interface pages (e.g. Dashboards) are unavailable.
In the Graylog log file, we have:
WARN [NodePingThread] Did not find meta info of this node. Re-registering.
In the past, this NodePingThread warning was resolved with an ntpd resync, but this time that does not work.
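To see how regular the re-registration really is, the intervals between the warnings can be measured from the log. This is a minimal sketch, assuming each log line starts with an ISO 8601 timestamp (for example `2019-03-01T10:00:00.123+01:00`) followed by the level and `[NodePingThread]`. That prefix format is an assumption about the logging pattern, so the regular expression may need adjusting. The log path is passed as an argument.

```python
import re
import sys
from datetime import datetime

# Assumed line layout: "<ISO timestamp> WARN  [NodePingThread] Did not find meta info ..."
PATTERN = re.compile(
    r"^(\S+)\s+WARN\s+\[NodePingThread\] Did not find meta info of this node"
)


def reregister_intervals(lines):
    """Return the seconds elapsed between consecutive re-register warnings."""
    stamps = []
    for line in lines:
        match = PATTERN.match(line)
        if match:
            stamps.append(datetime.fromisoformat(match.group(1)))
    return [(later - earlier).total_seconds()
            for earlier, later in zip(stamps, stamps[1:])]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python intervals.py <path to the Graylog server log>
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        for seconds in reregister_intervals(handle):
            print(f"{seconds:.1f} s since previous warning")
```

Very regular intervals would point to a fixed timeout, while irregular ones would point more towards load or network problems.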
Do you have any idea?
The impacts are:
- New logs are stored in the disk buffer;
- Processing stops and we have no outgoing traffic.
When we restart the service on this node, it works for N minutes and then the problem reappears…