I’m currently setting up a new Graylog 2.2 cluster with four nodes. One node is causing trouble for the whole cluster.
Node 1 is the master
Node 2 causes the trouble whenever it is running
Node 3 works fine
Node 4 works fine
If I start Node 2, the cluster loses its master server, or at least its cluster state. I can only say this from the master's point of view, because I access the web interface only on the master node.
On the overview page I get this message:
“There was no master Graylog server node detected in the cluster.”
In the logfile of the master node I see the following message every second:
2017-04-05T10:25:57.802+02:00 WARN [NodePingThread] Did not find meta info of this node. Re-registering.
The logfile of the second node shows no errors or anything suspicious.
I tried setting a new node-id on node 2, to register it as a new node, but this didn’t help.
I checked the configs three times. The master flag is only set on node 1.
I posted the logfiles on Gist - Graylog Logfiles
How did you install and configure these Graylog nodes?
Are all nodes using the same MongoDB database?
Are all node IDs of the Graylog nodes unique?
I installed the RPM Packages (graylog-server-2.2.0-11) on RHEL7 Systems.
The MongoDB database is the same on all four nodes. It is a replica set, where all nodes are running without any problems.
Yes, every Graylog node has its own unique ID, which is a UUID.
I diffed the configs of all four nodes. The only differences are is_master, which is true only on node 1, and elasticsearch_network_host, which is the IP of the respective server.
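For anyone who wants to repeat that comparison without eyeballing diff output, here is a minimal Python sketch. It assumes each node's server.conf has been copied locally and read as text, and that the files use plain `key = value` lines with `#` comments. The helper names parse_conf and diff_confs are made up for this example and are not part of Graylog.

```python
def parse_conf(text):
    """Parse simple 'key = value' lines, skipping blank lines and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:  # ignore lines without an '=' sign
            settings[key.strip()] = value.strip()
    return settings


def diff_confs(confs):
    """Return {key: {node: value}} for every key whose value differs between nodes.

    `confs` maps a node name to its parsed settings. A key missing on a node
    is reported with the value None for that node.
    """
    all_keys = set()
    for settings in confs.values():
        all_keys.update(settings)

    differing = {}
    for key in sorted(all_keys):
        values = {node: settings.get(key) for node, settings in confs.items()}
        if len(set(values.values())) > 1:
            differing[key] = values
    return differing
```

On a healthy cluster like the one described here, the result should contain only is_master and elasticsearch_network_host. Any other key that shows up is worth a second look.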
Please upgrade to the latest stable release in the 2.2.x line (Graylog 2.2.3) to rule out any bugs which have been fixed since Graylog 2.2.0.
Additionally, please post the complete logs of all Graylog nodes.
After the update I saw a few log messages "[NodePingThread] Did not find meta info of this node. Re-registering.", this time on node 2, but the rest of the cluster was OK. I did a clean cluster restart (stop all, then start all) and now it is working normally.
I will keep an eye on it for the next few days.
Thanks so far.
After the weekend I had the same messages on node 2, but I found the solution.
If you get messages like:
[NodePingThread] Did not find meta info of this node. Re-registering.
Check your server clock! The time on all systems should be in sync!
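A quick way to spot the drifting node is to collect one epoch timestamp per host at roughly the same moment (for example by running `date +%s` on each) and compare them. The Python sketch below only does the comparison. The function names and the one-second tolerance are assumptions for illustration, not Graylog settings, and the delay between collecting the readings adds some error, so treat small differences with caution.

```python
def max_clock_skew(epoch_by_node):
    """Return (skew_seconds, earliest_node, latest_node) for the given readings.

    `epoch_by_node` maps a node name to the epoch seconds reported by that node.
    """
    earliest = min(epoch_by_node, key=epoch_by_node.get)
    latest = max(epoch_by_node, key=epoch_by_node.get)
    skew = epoch_by_node[latest] - epoch_by_node[earliest]
    return skew, earliest, latest


def nodes_out_of_sync(epoch_by_node, reference, tolerance_seconds=1.0):
    """List nodes whose clock differs from the reference node by more than the tolerance."""
    ref_time = epoch_by_node[reference]
    return sorted(
        node
        for node, reading in epoch_by_node.items()
        if abs(reading - ref_time) > tolerance_seconds
    )
```

Calling nodes_out_of_sync(readings, "node1") with one reading per node lists the hosts whose clocks are off relative to the master. For the actual fix, check the time synchronization service (NTP or chrony) on every host.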
Maybe this helps someone else, too.