Problems with mongodb replication

micsnare · February 6, 2020, 8:43am

Hi everyone,

we’re currently running a Graylog cluster (3 master/graylog-server/mongo-db nodes und 3 elasticsearch datanodes) and facing an issue with the mongodb replica set.
One of the mongod services on one particular host is continuously dying, and having checked the log we’re seeing the following message:
Timestamp(1579543862, 2) is before their earliest timestamp: Timestamp(1580905960, 11) for 1min until: 2020-02-05T15:50:29.644+0100",
“2020-02-05T15:49:29.644+0100 I REPL [replication-0] could not find member to sync from”,
“2020-02-05T15:50:29.655+0100 I REPL [rsBackgroundSync] sync source candidate: blabla.bla.com:27017”,
“2020-02-05T15:50:29.655+0100 I REPL [replication-0] We are too stale to use blabla.bla.com:27017 as a sync source. Blacklisting this sync source because our last fetched timestamp: Timestamp(1579543862, 2) is before their earliest timestamp: Timestamp(1580905978, 28) for 1min until: 2020-02-05T15:51:29.655+0100”,
“2020-02-05T15:50:29.655+0100 I REPL [replication-0] sync source candidate: blabla.bla.com:27017”,
“2020-02-05T15:50:29.655+0100 I REPL [replication-0] We are too stale to use blabla.bla.com:27017 as a sync source. Blacklisting this sync source because our last fetched timestamp: Timestamp(1579543862, 2) is before their earliest timestamp: Timestamp(1580905960, 11) for 1min until: 2020-02-05T15:51:29.655+0100”,
“2020-02-05T15:50:29.655+0100 I REPL [replication-0] could not find member to sync from”

this doesn’t seem to look too good.
has anyone ever had a similar problem with mongodb replica sets and how to fix it? we have found some posts on stackoverflow that indicated it would need to be re-installed/setup again…

Thankful for any advice here,
thanks,
theresa

jan · February 6, 2020, 11:18am

he @micsnare

Did you have NTPD running on all servers that keep the time in sync? I have often seen this when the time differs on members of the replica set.

In addition when the network connection is flaky or saturated, but mostly time was the problem. You might want to check that.

micsnare · February 6, 2020, 12:18pm

Servus @jan

Thanks for your immediate reply. I think you’re right, for some reason the ntp service wasn’t running, and therefore the time didn’t get synced.
However, I have now started the ntpd on all nodes, but it seems mongodb is still in a recovering state.
In the mongo shell it says “recovering”

rs-graylog:RECOVERING>

                "2020-02-06T13:12:01.534+0100 I NETWORK  [listener] connection accepted from 10.17.7.1:37294 #776 (10 connections now open)",
            "2020-02-06T13:12:01.534+0100 I NETWORK  [conn776] end connection 10.17.7.1:37294 (9 connections now open)",
            "2020-02-06T13:12:29.558+0100 I NETWORK  [listener] connection accepted from 10.18.7.1:37498 #777 (10 connections now open)",
            "2020-02-06T13:12:29.560+0100 I NETWORK  [conn777] end connection 10.18.7.1:37498 (9 connections now open)",
            "2020-02-06T13:12:42.821+0100 I REPL     [rsBackgroundSync] sync source candidate: server3.blabla.com:27017",
            "2020-02-06T13:12:42.822+0100 I REPL     [replication-0] We are too stale to use server3.blabla.com:27017 as a sync source. Blacklisting this sync source because our last fetched timestamp: Timestamp(1579543862, 2) is before their earliest timestamp: Timestamp(1580907197, 1889) for 1min until: 2020-02-06T13:13:42.822+0100",
            "2020-02-06T13:12:42.822+0100 I REPL     [replication-0] sync source candidate: server2.blabla.com:27017",
            "2020-02-06T13:12:42.822+0100 I REPL     [replication-0] We are too stale to use server2.blabla.com:27017 as a sync source. Blacklisting this sync source because our last fetched timestamp: Timestamp(1579543862, 2) is before their earliest timestamp: Timestamp(1580906717, 975) for 1min until: 2020-02-06T13:13:42.822+0100",
            "2020-02-06T13:12:42.822+0100 I REPL     [replication-0] could not find member to sync from",
            "2020-02-06T13:13:01.528+0100 I NETWORK  [listener] connection accepted from 10.17.7.1:57436 #778 (10 connections now open)",
            "2020-02-06T13:13:01.528+0100 I NETWORK  [conn778] end connection 10.17.7.1:57436 (9 connections now open)",
            "2020-02-06T13:13:29.556+0100 I NETWORK  [listener] connection accepted from 10.18.7.1:57688 #779 (10 connections now open)",
            "2020-02-06T13:13:29.556+0100 I NETWORK  [conn779] end connection 10.18.7.1:57688 (9 connections now open)",
            "2020-02-06T13:13:42.831+0100 I REPL     [rsBackgroundSync] sync source candidate: server3.blabla.com:27017",
            "2020-02-06T13:13:42.831+0100 I REPL     [replication-0] We are too stale to use server3.blabla.com:27017 as a sync source. Blacklisting this sync source because our last fetched timestamp: Timestamp(1579543862, 2) is before their earliest timestamp: Timestamp(1580907197, 1889) for 1min until: 2020-02-06T13:14:42.831+0100",
            "2020-02-06T13:13:42.831+0100 I REPL     [replication-0] sync source candidate: server2.blabla.com:27017",
            "2020-02-06T13:13:42.832+0100 I REPL     [replication-0] We are too stale to use server2.blabla.com:27017 as a sync source. Blacklisting this sync source because our last fetched timestamp: Timestamp(1579543862, 2) is before their earliest timestamp: Timestamp(1580906717, 975) for 1min until: 2020-02-06T13:14:42.832+0100",
            "2020-02-06T13:13:42.832+0100 I REPL     [replication-0] could not find member to sync from",
            "2020-02-06T13:14:01.528+0100 I NETWORK  [listener] connection accepted from 10.17.7.1:49416 #780 (10 connections now open)",
            "2020-02-06T13:14:01.528+0100 I NETWORK  [conn780] end connection 10.17.7.1:49416 (9 connections now open)",
            "2020-02-06T13:14:03.631+0100 I NETWORK  [LogicalSessionCacheRefresh] Starting new replica set monitor for rs-graylog/server1.blabla.com:27017,server2.blabla.com:27017,server3.blabla.com:27017",
            "2020-02-06T13:14:03.634+0100 I NETWORK  [LogicalSessionCacheRefresh] Starting new replica set monitor for rs-graylog/server1.blabla.com:27017,server2.blabla.com:27017,server3.blabla.com:27017",
            "2020-02-06T13:14:03.634+0100 I NETWORK  [LogicalSessionCacheRefresh] Starting new replica set monitor for rs-graylog/server1.blabla.com:27017,server2.blabla.com:27017,server3.blabla.com:27017",
            "2020-02-06T13:14:25.404+0100 I NETWORK  [listener] connection accepted from 127.0.0.1:51316 #781 (10 connections now open)",
            "2020-02-06T13:14:25.405+0100 I NETWORK  [conn781] received client metadata from 127.0.0.1:51316 conn781: { application: { name: \"MongoDB Shell\" }, driver: { name: \"MongoDB Internal Client\", version: \"4.0.12\" }, os: { type: \"Linux\", name: \"Red Hat Enterprise Linux Server release 7.7 (Maipo)\", architecture: \"x86_64\", version: \"Kernel 3.10.0-1062.el7.x86_64\" } }",
            "2020-02-06T13:14:29.555+0100 I NETWORK  [listener] connection accepted from 10.18.7.1:49578 #782 (11 connections now open)",
            "2020-02-06T13:14:29.555+0100 I NETWORK  [conn782] end connection 10.18.7.1:49578 (10 connections now open)"
    ],
    "ok" : 1,
    "operationTime" : Timestamp(1579543862, 2),
    "$clusterTime" : {
            "clusterTime" : Timestamp(1580991270, 7),
            "signature" : {
                    "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
                    "keyId" : NumberLong(0)
            }
    }

}

Do you think this is a lost caused, or will it eventually finish recovering?
Would the restart of the mongod process help?

Lg,
theresa

jan · February 7, 2020, 2:20pm

@micsnare

I hope that you already have restarted the server/mongodb.

The adjustmend by ntpd that is done slowly can cause the problems. It is better to stop MongoDB, sync the time (maybe restart the server) and start MongoDB again.

system · February 21, 2020, 2:20pm

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Master node disappearing intermittently Graylog Central (peer support)	6	880	May 25, 2021
Graylog cluster issue Graylog Central (peer support)	7	1595	June 7, 2021
Timestamp in graylog does not match logs being ingested Graylog Central (peer support) timestamp	4	391	May 31, 2023
Graylog Gluster Node 1 error Graylog Central (peer support)	3	703	September 25, 2018
Graylog.cluster_config high latency in Production stage Graylog Central (peer support)	2	272	August 15, 2022

Problems with mongodb replication

Related topics