Only 1 of 3 nodes shows Graylog configuration after upgrade

Hello,

After more than a year of running a 3-node Graylog cluster with a 3-node MongoDB replica set, I have what looks like a MongoDB (or Graylog) problem.
Graylog runs on physical machines with Oracle Linux.
I was running 3.2.4 and upgraded nodes 1 and 2 to the latest version. After that, all configuration was gone on the upgraded nodes. It is possible that the MongoDB replica set was not in an ideal state before the upgrade, but I did not see any error pointing to that.
The log files from starting Graylog on nodes 1 and 2 show no errors, not even warnings. But the whole configuration is empty: no streams, no inputs, no users. The Nodes page in the web interface shows 2 nodes in the cluster, node 1 and node 2.
On node 3, the Nodes page shows only node 3.

Funnily enough, it shows Elasticsearch as OK, and it even shows a very small number of messages in the All messages stream.

Luckily, I still have one node running 3.2.4, and its whole configuration is still OK. Node 3 was not the cluster master, so I had to restart it and make it the master. That went well: after the restart it still has the whole configuration.

I suspect there is some problem in the MongoDB configuration. All 3 nodes connect to the MongoDB replica set with:

(I changed the IPs a bit from the real ones.)

mongodb_uri = mongodb://192.158.20.100/graylog,192.158.20.101/graylog,192.158.20.102/graylog?replicaSet=reproduk

I tried to do some MongoDB investigation. If I log in to node 1, it shows me these 3 databases:

reproduk:PRIMARY> show dbs
graylog      0.029GB
graylog,192  0.002GB
local        0.312GB

This graylog,192 entry in particular looks very suspicious to me; I haven't noticed it before.
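One way to see where a database named graylog,192 could come from is to split the old URI the way a MongoDB driver would. The sketch below is a simplified, hypothetical parser (not Graylog's or any driver's actual code). It assumes the splitting rules the MongoDB Java driver's connection-string parsing follows, as I understand them: the host list ends at the first '/', the namespace ends at '?', and the namespace is split into database and collection at its first '.'.

```python
from urllib.parse import unquote


def parse_mongodb_uri(uri: str) -> dict:
    """Hypothetical, simplified parser mimicking how a driver splits a mongodb:// URI.

    Assumed rules: hosts end at the first '/', the namespace ends at '?',
    and the namespace is split into database and collection at its first '.'.
    """
    prefix = "mongodb://"
    if not uri.startswith(prefix):
        raise ValueError("not a mongodb:// URI")
    rest = uri[len(prefix):]

    # Everything before the first '/' is treated as the host list.
    hosts_part, _, rest = rest.partition("/")
    # Everything before '?' is the namespace (database[.collection]).
    ns_part, _, options_part = rest.partition("?")
    ns_part = unquote(ns_part)
    # The first '.' separates database from collection.
    database, _, collection = ns_part.partition(".")

    options = dict(opt.split("=", 1) for opt in options_part.split("&") if "=" in opt)
    return {
        "hosts": hosts_part.split(","),
        "database": database or None,
        "collection": collection or None,
        "options": options,
    }


old_uri = ("mongodb://192.158.20.100/graylog,192.158.20.101/graylog,"
           "192.158.20.102/graylog?replicaSet=reproduk")
parsed = parse_mongodb_uri(old_uri)
print(parsed["hosts"])
print(parsed["database"])
print(parsed["collection"])
```

Under those assumptions, only 192.158.20.100 is listed as a host, and the database name comes out as graylog,192, which would match the extra database that show dbs reports on node 1.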

If I run the same command on node 3, the only node still running OK, I get an error:

reproduk:SECONDARY> show dbs
2020-09-21T14:27:51.126+0200 E QUERY    [thread1] Error: listDatabases failed:{ "ok" : 0, "errmsg" : "not master and slaveOk=false", "code" : 13435 } :
_getErrorWithCode@src/mongo/shell/utils.js:25:13
Mongo.prototype.getDBs@src/mongo/shell/mongo.js:62:1
shellHelper.show@src/mongo/shell/utils.js:769:19
shellHelper@src/mongo/shell/utils.js:659:15
@(shellhelp2):1:1


But if I run commands like rs.conf() or rs.status(), I get practically the same, working result on both node 1 and node 3:


reproduk:SECONDARY> rs.status()
{
        "set" : "reproduk",
        "date" : ISODate("2020-09-21T12:37:04.748Z"),
        "myState" : 2,
        "term" : NumberLong(65),
        "syncingTo" : "192.158.20.100:27017",
        "heartbeatIntervalMillis" : NumberLong(2000),
        "members" : [
                {
                        "_id" : 0,
                        "name" : "192.158.20.100:27017",
                        "health" : 1,
                        "state" : 1,
                        "stateStr" : "PRIMARY",
                        "uptime" : 2625358,
                        "optime" : {
                                "ts" : Timestamp(1600691823, 20),
                                "t" : NumberLong(65)
                        },
                        "optimeDate" : ISODate("2020-09-21T12:37:03Z"),
                        "lastHeartbeat" : ISODate("2020-09-21T12:37:03.658Z"),
                        "lastHeartbeatRecv" : ISODate("2020-09-21T12:37:03.128Z"),
                        "pingMs" : NumberLong(0),
      ...
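To compare rs.status() across nodes without reading the whole document each time, a small helper can pull out the set name, the primary, and any unhealthy members. This is a sketch that assumes the rs.status() output has already been converted into a Python dict (the ISODate/Timestamp/NumberLong wrappers printed by the shell are not valid JSON, so they are left out here). summarize_rs_status is a hypothetical name, and the sample data contains only the fields and the single member visible in the post.

```python
def summarize_rs_status(status: dict) -> dict:
    """Summarize rs.status() output: set name, primary, member states, unhealthy members."""
    members = status.get("members", [])
    states = {m["name"]: m["stateStr"] for m in members}
    primaries = [name for name, state in states.items() if state == "PRIMARY"]
    # health is 1 for members that are up and 0 for members that cannot be reached
    unhealthy = [m["name"] for m in members if m.get("health") != 1]
    return {
        "set": status.get("set"),
        # Exactly one PRIMARY is expected in a healthy replica set.
        "primary": primaries[0] if len(primaries) == 1 else None,
        "states": states,
        "unhealthy": unhealthy,
        "syncing_to": status.get("syncingTo"),
    }


# Sample built only from the fields shown above (the other members are elided there too).
status = {
    "set": "reproduk",
    "myState": 2,
    "syncingTo": "192.158.20.100:27017",
    "members": [
        {"_id": 0, "name": "192.158.20.100:27017", "health": 1,
         "state": 1, "stateStr": "PRIMARY"},
    ],
}

summary = summarize_rs_status(status)
print(summary["set"], summary["primary"], summary["unhealthy"])
```

Running the same helper on the output from each node makes it quick to confirm that all nodes agree on which member is PRIMARY.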

Any pointers on how I could continue debugging?

Maybe I should drop this graylog,192 database?

I am a little cautious about doing anything in MongoDB, because I don't want to make things worse on node 3, the only running node. I made several MongoDB backups and also a content pack “backup” from node 3 to keep the Graylog configuration saved.

Thanks

I know very little about how MongoDB hangs together, but my one-node system has this:

> show dbs
admin    0.078GB
config   0.078GB
graylog  0.203GB
local    0.078GB

Thanks for the answer, Jon.

I also got an answer from the MongoDB community, and it solved my problem.
Apparently the mongodb_uri I had been using for years is wrong and should never have worked :) I tried the new mongodb_uri on a non-working node, and right after the restart it connected to the cluster; everything seems to be working…

I would have to check and test with an older version, but it could be that somewhat older versions of Graylog somehow knew how to read the previous, “incorrect” mongodb_uri correctly.

The correct mongodb_uri should be:


mongodb_uri = mongodb://192.158.20.100:27017,192.158.20.101:27017,192.158.20.102:27017/graylog?replicaSet=reproduk&retryWrites=true&w=majority
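The difference from the old URI is structural: all host:port pairs come first, separated by commas, then a single '/', the database name exactly once, and the options after '?'. A minimal sketch that assembles the URI from its parts makes that layout explicit (build_mongodb_uri is a hypothetical helper, not part of Graylog or any driver):

```python
from urllib.parse import urlencode


def build_mongodb_uri(hosts, database, options):
    """Build a mongodb:// URI: comma-separated host:port list, then /database, then ?options."""
    host_list = ",".join(f"{host}:{port}" for host, port in hosts)
    # urlencode keeps the dict's insertion order and joins pairs with '&'.
    query = urlencode(options)
    uri = f"mongodb://{host_list}/{database}"
    return f"{uri}?{query}" if query else uri


uri = build_mongodb_uri(
    hosts=[("192.158.20.100", 27017), ("192.158.20.101", 27017), ("192.158.20.102", 27017)],
    database="graylog",
    options={"replicaSet": "reproduk", "retryWrites": "true", "w": "majority"},
)
print(f"mongodb_uri = {uri}")
```

Because the database name now appears only once, after the full host list, there is no IP address left in the namespace part for a parser to misread.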