I’m currently troubleshooting a 3-node cluster in which one node won’t connect to the cluster.
Running Graylog 4.3.3 on Debian 10. Problems started after upgrading (using apt-get) from 4.2.x to 4.3.3.
Time is synced, IDs are unique, and the firewall rules have been validated. I’ve confirmed that the MongoDB URI (mongodb_uri) is identical in the Graylog server config on every node.
I can run 'mongo --norc --quiet --host=graylognode#:27017 <<< "db.getMongo()"' across the cluster, from each node against each of the other nodes, and get a successful connection every time.
At one point I was fighting Debian’s use of 127.0.1.1 for the hostname. I tested with 127.0.1.1 in the MongoDB bind addresses; I’m currently testing with it removed from the bindings and with the 127.0.1.1 entry taken out of /etc/hosts. One thing I’m seeing in the logs on the failed node: it first discovers the master node by name, then seems to discover it by IP and logs that the canonical name doesn’t match. Just after it discovers the node by IP, it throws:
Server graylognode1:27017 is no longer a member of the replica set. Removing from client view of cluster.
Last note: on the primary, if I run rs.isMaster() from the ‘mongo’ shell (not ‘mongosh’), I get:
graylogreplica01:PRIMARY> rs.isMaster()
{
    "hosts" : [
        "10.x.y.x(node1 IP):27017",
        "graylognode2:27017",
        "graylognode3:27017"
    ],
    "setName" : "graylogreplica01",
    "setVersion" : 3,
    "ismaster" : true,
    "secondary" : false,
    "primary" : "10.x.y.x(node1 IP)",
    "me" : "10.x.y.x(node1 IP)",
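Putting the log message and this output together, my working theory is that the driver Graylog uses drops any seed address that doesn’t appear verbatim in the primary’s "hosts" list. This is an assumption about how MongoDB drivers handle replica set discovery, not something I’ve confirmed in the driver source. Because node1 is listed by IP, graylognode1:27017 from the URI never matches. Below is a small sketch of that comparison in plain JavaScript (runnable with Node.js). seedsMissingFromReplicaSet is a hypothetical helper, and the seed list is an assumption: I’m taking it to be the three hostnames from my mongodb_uri.

```javascript
// Hypothetical helper: list the seed addresses (host:port) from a connection
// URI that do not appear in the "hosts" array reported by the primary.
// Assumption: a driver matches servers by literal host:port strings, so a
// hostname seed does not match the same server when it is listed by IP.
function seedsMissingFromReplicaSet(seeds, rsHosts) {
  // Lower-case both sides, since hostnames compare case-insensitively.
  const known = new Set(rsHosts.map((h) => h.toLowerCase()));
  return seeds.filter((seed) => !known.has(seed.toLowerCase()));
}

// "hosts" from rs.isMaster() above (node1's real IP is redacted).
const rsHosts = ["10.x.y.x:27017", "graylognode2:27017", "graylognode3:27017"];
// Assumed seed list from mongodb_uri.
const seeds = ["graylognode1:27017", "graylognode2:27017", "graylognode3:27017"];

console.log(seedsMissingFromReplicaSet(seeds, rsHosts)); // [ 'graylognode1:27017' ]
```

If that theory holds, the fix is on the replica set side: node1’s member entry should use the same name the URI uses.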
How do I change node1’s replica set member entry so it uses the hostname rather than the IP? I’ve found some references that seem to explain it, but when I try to run rs.conf() against the graylog db, I get "errmsg" : "not authorized on admin to execute command"
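For reference, the approach those references describe (MongoDB’s procedure for changing a member’s hostname) is roughly this: fetch the config with rs.conf() on the primary, change the member’s "host" field, and pass the config back to rs.reconfig(). Below is a minimal sketch of the edit step. It assumes node1 is the member currently registered as "10.x.y.x:27017" and that graylognode1 resolves to node1’s real address (not 127.0.1.1) on every node. setMemberHost is my own hypothetical helper, not a mongo shell built-in. The sample config is made up, except for the set name, version, and hosts shown in the rs.isMaster() output.

```javascript
// Hypothetical helper, not part of the mongo shell: point the member that is
// currently registered as `oldHost` at `newHost`. It edits the config object
// in place (instead of deep-copying it) so any BSON-typed fields returned by
// rs.conf() are left untouched.
function setMemberHost(cfg, oldHost, newHost) {
  const member = cfg.members.find((m) => m.host === oldHost);
  if (!member) {
    throw new Error("no replica set member with host " + oldHost);
  }
  member.host = newHost;
  return cfg;
}

// Made-up stand-in for rs.conf() output (member _id values are assumptions).
const cfg = {
  _id: "graylogreplica01",
  version: 3,
  members: [
    { _id: 0, host: "10.x.y.x:27017" },
    { _id: 1, host: "graylognode2:27017" },
    { _id: 2, host: "graylognode3:27017" },
  ],
};

setMemberHost(cfg, "10.x.y.x:27017", "graylognode1:27017");
console.log(cfg.members.map((m) => m.host).join(", "));
// graylognode1:27017, graylognode2:27017, graylognode3:27017
```

In the mongo shell on the primary, this would presumably be cfg = rs.conf(); setMemberHost(cfg, "10.x.y.x:27017", "graylognode1:27017"); rs.reconfig(cfg). My guess about the "not authorized" error is that the user in my Graylog URI only has rights on the graylog database. rs.conf() and rs.reconfig() need cluster-level privileges on admin (for example the built-in clusterAdmin role), so I’d need to connect as such a user with -u, -p and --authenticationDatabase admin.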