Multi-node Podman Cluster

1. Describe your incident:
I am expanding a 2-node setup into a 6-node Graylog/Graylog DataNode environment. Initially, one node ran Graylog, Graylog DataNode, and MongoDB, and the other ran Graylog and Graylog DataNode. I added 3 dedicated nodes for Graylog DataNode, removed Graylog DataNode from the nodes hosting Graylog, and then added another Graylog host. The final state is 3 nodes with Graylog DataNode and 3 nodes with Graylog, one of which also hosts MongoDB.

All of this runs in Podman with a bridge network.

I am attempting to migrate from a single MongoDB node to a MongoDB replicaset for configuration redundancy. To make the initial migration easier, I am first creating the replicaset without authentication (these nodes aren’t publicly available, sit in a separated network within a separated network, and have host firewalls limiting inbound communication).

I’ve built the MongoDB replicaset and can confirm that the replicaset reports healthy and the primary and secondaries are elected without issue.

My procedure for migration is as follows:

  1. Shut down all Graylog and Graylog Datanode components, leaving MongoDB running
  2. Connect to MongoDB container, perform a mongodump
  3. Transfer mongodump to primary MongoDB node and perform a mongorestore with --drop and --preserveUUID specified
  4. Give everything a few minutes to synchronize between the 3 nodes and check rs.status() to validate everything is still healthy. Only about 50 MB worth of data, so not a ton to sync between the 3 nodes.
  5. Update the mongodburi on the Graylog and Graylog Datanode nodes:

mongodb://$FQDN hostname1$:27017,$FQDN hostname2$:27017,$FQDN hostname3$:27017/graylog?replicaset=$REPLICASET NAME$

  6. Bring all Graylog and Graylog Datanode components back online
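Steps 2 and 3 above can be sketched roughly as follows. This is a minimal illustration, not the exact commands I ran: the host names, replicaset name `rs0`, and archive path are hypothetical placeholders, and the commands would need to be run inside (or against) the MongoDB containers. It only builds the command lines so they can be reviewed before execution. Note that `--preserveUUID` is only accepted together with `--drop`.

```python
import shlex
from typing import List


def dump_command(source_uri: str, archive_path: str) -> List[str]:
    """Build a mongodump call that writes one gzip-compressed archive.

    The database to dump is taken from the path of source_uri.
    """
    return ["mongodump", f"--uri={source_uri}",
            f"--archive={archive_path}", "--gzip"]


def restore_command(target_uri: str, archive_path: str,
                    database: str = "graylog") -> List[str]:
    """Build the matching mongorestore call.

    --drop removes each collection before restoring it, and
    --preserveUUID (which requires --drop) keeps the collection UUIDs.
    """
    return ["mongorestore", f"--uri={target_uri}",
            f"--archive={archive_path}", "--gzip",
            f"--nsInclude={database}.*", "--drop", "--preserveUUID"]


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    standalone = "mongodb://mongo-old.example.internal:27017/graylog"
    replica_set = ("mongodb://mongo1.example.internal:27017,"
                   "mongo2.example.internal:27017,"
                   "mongo3.example.internal:27017/?replicaSet=rs0")
    archive = "/tmp/graylog.archive.gz"

    print(shlex.join(dump_command(standalone, archive)))
    print(shlex.join(restore_command(replica_set, archive)))
```

Connecting with the full replicaset URI lets mongorestore find the primary itself, which avoids restoring against a secondary by mistake.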

The Graylog DataNode components come back online without issue, and their logs report that they all reach a healthy, AVAILABLE state, but the Graylog components never come online. They report connecting to MongoDB without issue but then fail with this error:

2025-07-23T17:32:44.282939859-07:00 00:32:44.279 [main] ERROR org.graylog2.bootstrap.CmdLineTool - Startup error:
2025-07-23T17:32:44.282939859-07:00 java.lang.IllegalStateException: Expected to be healthy after starting. The following services are not running: {FAILED=[PreflightJerseyService [FAILED]]}
2025-07-23T17:32:44.282939859-07:00 at com.google.common.util.concurrent.ServiceManager$ServiceManagerState.checkHealthy(ServiceManager.java:764) ~[graylog.jar:?]
2025-07-23T17:32:44.282939859-07:00 at com.google.common.util.concurrent.ServiceManager$ServiceManagerState.awaitHealthy(ServiceManager.java:585) ~[graylog.jar:?]
2025-07-23T17:32:44.282939859-07:00 at com.google.common.util.concurrent.ServiceManager.awaitHealthy(ServiceManager.java:299) ~[graylog.jar:?]
2025-07-23T17:32:44.282939859-07:00 at org.graylog2.bootstrap.ServerBootstrap.doRunWithPreflightInjector(ServerBootstrap.java:221) ~[graylog.jar:?]
2025-07-23T17:32:44.282939859-07:00 at org.graylog2.bootstrap.ServerBootstrap.runPreflightWeb(ServerBootstrap.java:197) ~[graylog.jar:?]
2025-07-23T17:32:44.282939859-07:00 at org.graylog2.bootstrap.ServerBootstrap.runPreFlightChecks(ServerBootstrap.java:177) ~[graylog.jar:?]
2025-07-23T17:32:44.282939859-07:00 at org.graylog2.bootstrap.ServerBootstrap.beforeInjectorCreation(ServerBootstrap.java:150) ~[graylog.jar:?]
2025-07-23T17:32:44.282939859-07:00 at org.graylog2.bootstrap.CmdLineTool.doRun(CmdLineTool.java:362) ~[graylog.jar:?]
2025-07-23T17:32:44.282939859-07:00 at org.graylog2.bootstrap.CmdLineTool.run(CmdLineTool.java:287) [graylog.jar:?]
2025-07-23T17:32:44.282939859-07:00 at org.graylog2.bootstrap.Main.main(Main.java:57) [graylog.jar:?]
2025-07-23T17:32:44.282939859-07:00 Suppressed: com.google.common.util.concurrent.ServiceManager$FailedService: PreflightJerseyService [FAILED]
2025-07-23T17:32:44.282939859-07:00 Caused by: java.lang.IllegalStateException: Initial password should be automatically present in the DB, this is an inconsistent state. Please report the problem to Graylog.
2025-07-23T17:32:44.282939859-07:00 at org.graylog2.bootstrap.preflight.PreflightConfigServiceImpl.lambda$getPreflightPassword$2(PreflightConfigServiceImpl.java:69) ~[graylog.jar:?]
2025-07-23T17:32:44.282939859-07:00 at java.base/java.util.Optional.orElseThrow(Unknown Source) ~[?:?]
2025-07-23T17:32:44.282939859-07:00 at org.graylog2.bootstrap.preflight.PreflightConfigServiceImpl.getPreflightPassword(PreflightConfigServiceImpl.java:69) ~[graylog.jar:?]
2025-07-23T17:32:44.282939859-07:00 at org.graylog2.bootstrap.preflight.PreflightJerseyService.createBasicAuthFilter(PreflightJerseyService.java:202) ~[graylog.jar:?]
2025-07-23T17:32:44.282939859-07:00 at org.graylog2.bootstrap.preflight.PreflightJerseyService.buildResourceConfig(PreflightJerseyService.java:196) ~[graylog.jar:?]
2025-07-23T17:32:44.282939859-07:00 at org.graylog2.bootstrap.preflight.PreflightJerseyService.setUp(PreflightJerseyService.java:222) ~[graylog.jar:?]
2025-07-23T17:32:44.282939859-07:00 at org.graylog2.bootstrap.preflight.PreflightJerseyService.startUpApi(PreflightJerseyService.java:125) ~[graylog.jar:?]
2025-07-23T17:32:44.282939859-07:00 at org.graylog2.bootstrap.preflight.PreflightJerseyService.startUp(PreflightJerseyService.java:95) ~[graylog.jar:?]
2025-07-23T17:32:44.282939859-07:00 at com.google.common.util.concurrent.AbstractIdleService$DelegateService.lambda$doStart$0(AbstractIdleService.java:63) ~[graylog.jar:?]
2025-07-23T17:32:44.282939859-07:00 at com.google.common.util.concurrent.Callables.lambda$threadRenaming$1(Callables.java:104) ~[graylog.jar:?]
2025-07-23T17:32:44.282939859-07:00 at java.base/java.lang.Thread.run(Unknown Source) ~[?:?]
2025-07-23T17:32:44.282995853-07:00 Exception in thread "main" 2025-07-23T17:32:44.283068551-07:00 java.lang.IllegalStateException: Expected to be healthy after starting. The following services are not running: {FAILED=[PreflightJerseyService [FAILED]]}
2025-07-23T17:32:44.283077334-07:00 at com.google.common.util.concurrent.ServiceManager$ServiceManagerState.checkHealthy(ServiceManager.java:764)
2025-07-23T17:32:44.283096651-07:00 at com.google.common.util.concurrent.ServiceManager$ServiceManagerState.awaitHealthy(ServiceManager.java:585)
2025-07-23T17:32:44.283102727-07:00 at com.google.common.util.concurrent.ServiceManager.awaitHealthy(ServiceManager.java:299)
2025-07-23T17:32:44.283109972-07:00 at org.graylog2.bootstrap.ServerBootstrap.doRunWithPreflightInjector(ServerBootstrap.java:221)
2025-07-23T17:32:44.283118214-07:00 at org.graylog2.bootstrap.ServerBootstrap.runPreflightWeb(ServerBootstrap.java:197)
2025-07-23T17:32:44.283134228-07:00 at org.graylog2.bootstrap.ServerBootstrap.runPreFlightChecks(ServerBootstrap.java:177)
2025-07-23T17:32:44.283140196-07:00 at org.graylog2.bootstrap.ServerBootstrap.beforeInjectorCreation(ServerBootstrap.java:150)
2025-07-23T17:32:44.283145917-07:00 at org.graylog2.bootstrap.CmdLineTool.doRun(CmdLineTool.java:362)
2025-07-23T17:32:44.283153164-07:00 at org.graylog2.bootstrap.CmdLineTool.run(CmdLineTool.java:287)
2025-07-23T17:32:44.283160402-07:00 at org.graylog2.bootstrap.Main.main(Main.java:57)
2025-07-23T17:32:44.283189365-07:00 Suppressed: com.google.common.util.concurrent.ServiceManager$FailedService: PreflightJerseyService [FAILED]
2025-07-23T17:32:44.283195446-07:00 Caused by: java.lang.IllegalStateException: Initial password should be automatically present in the DB, this is an inconsistent state. Please report the problem to Graylog.
2025-07-23T17:32:44.283210719-07:00 at org.graylog2.bootstrap.preflight.PreflightConfigServiceImpl.lambda$getPreflightPassword$2(PreflightConfigServiceImpl.java:69)
2025-07-23T17:32:44.283216678-07:00 at java.base/java.util.Optional.orElseThrow(Unknown Source)
2025-07-23T17:32:44.283254663-07:00 at org.graylog2.bootstrap.preflight.PreflightConfigServiceImpl.getPreflightPassword(PreflightConfigServiceImpl.java:69)
2025-07-23T17:32:44.283254663-07:00 at org.graylog2.bootstrap.preflight.PreflightJerseyService.createBasicAuthFilter(PreflightJerseyService.java:202)
2025-07-23T17:32:44.283275263-07:00 at org.graylog2.bootstrap.preflight.PreflightJerseyService.buildResourceConfig(PreflightJerseyService.java:196)
2025-07-23T17:32:44.283279055-07:00 at org.graylog2.bootstrap.preflight.PreflightJerseyService.setUp(PreflightJerseyService.java:222)
2025-07-23T17:32:44.283289424-07:00 at org.graylog2.bootstrap.preflight.PreflightJerseyService.startUpApi(PreflightJerseyService.java:125)
2025-07-23T17:32:44.283304392-07:00 at org.graylog2.bootstrap.preflight.PreflightJerseyService.startUp(PreflightJerseyService.java:95)
2025-07-23T17:32:44.283310321-07:00 at com.google.common.util.concurrent.AbstractIdleService$DelegateService.lambda$doStart$0(AbstractIdleService.java:63)
2025-07-23T17:32:44.283324022-07:00 at com.google.common.util.concurrent.Callables.lambda$threadRenaming$1(Callables.java:104)
2025-07-23T17:32:44.283329194-07:00 at java.base/java.lang.Thread.run(Unknown Source)

Searching for the error suggests that the probable cause is a connectivity issue, but I can’t figure out where that connectivity is failing. I can see successful connections in the MongoDB logs on all nodes, and the MongoDB containers are on the same hosts as the Graylog containers, which all report successful connections.

This only occurs when using the MongoDB replicaset. When I revert to the standalone MongoDB node, everything comes back online without issue.

I also see that it mentions “initial password should be automatically present in the DB, this is an inconsistent state” in the error message, but my google-fu was failing me there. It has me wondering if my backup and restore method for the MongoDB data is incorrect and would appreciate any thoughts on that.
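One way to test the suspicion about the restore would be to compare per-collection document counts between the standalone node and the replicaset while Graylog is stopped. The sketch below assumes the `pymongo` package is available and that the database is named `graylog`. Both function names are made up for illustration. It makes no assumption about which collection holds the initial password; it only reports where the two sides differ.

```python
from typing import Dict, List, Optional, Tuple


def collection_counts(uri: str, db_name: str = "graylog") -> Dict[str, int]:
    """Return {collection name: document count} for one deployment.

    Requires pymongo (an assumption); it is imported lazily so that
    diff_counts() below stays usable without it.
    """
    from pymongo import MongoClient

    client = MongoClient(uri)
    try:
        db = client[db_name]
        return {name: db[name].count_documents({})
                for name in db.list_collection_names()}
    finally:
        client.close()


def diff_counts(
    source: Dict[str, int], target: Dict[str, int]
) -> List[Tuple[str, Optional[int], Optional[int]]]:
    """List (collection, source_count, target_count) for every mismatch.

    A collection missing on one side is reported with None for that side.
    """
    mismatches = []
    for name in sorted(set(source) | set(target)):
        left, right = source.get(name), target.get(name)
        if left != right:
            mismatches.append((name, left, right))
    return mismatches
```

An empty result from `diff_counts(collection_counts(old_uri), collection_counts(new_uri))` would suggest the restore copied every collection, which would point away from the dump/restore step as the cause.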

2. Describe your environment:

  • OS Information:
    Oracle Linux 9, fully patched. Running Podman v5.4.0
  • Package Version:
    Running the newest Graylog versions: v6.3.1
  • Service logs, configurations, and environment variables:

MongoDB URI used by all Graylog components:

mongodb://$FQDN hostname1$:27017,$FQDN hostname2$:27017,$FQDN hostname3$:27017/graylog?replicaset=$REPLICASET NAME$
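For reference, the shape of that URI can be expressed as a small helper. This is only a sketch: the host names and replicaset name in the usage lines are hypothetical stand-ins for the placeholders above, and the helper uses the `replicaSet` spelling from the MongoDB documentation.

```python
from typing import Sequence


def replica_set_uri(hosts: Sequence[str], replica_set: str,
                    database: str = "graylog", port: int = 27017) -> str:
    """Build a mongodb:// URI with a comma-separated seed list."""
    if not hosts:
        raise ValueError("at least one host is required")
    if not replica_set:
        raise ValueError("a replica set name is required")
    seeds = ",".join(f"{host}:{port}" for host in hosts)
    return f"mongodb://{seeds}/{database}?replicaSet={replica_set}"


if __name__ == "__main__":
    # Hypothetical host names and replica set name.
    print(replica_set_uri(
        ["mongo1.example.internal",
         "mongo2.example.internal",
         "mongo3.example.internal"],
        "rs0",
    ))
```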

Standalone MongoDB node URI used before replicaset migration:

mongodb://$FQDN hostname$:27017/graylog

Both the root SHA2 password and password secret are specified in the configuration file.

3. What steps have you already taken to try and solve the problem?

I’ve played around with the podman DNS, DNS Search, and Network Alias properties in case there was internal validation within Graylog that was failing (I noticed that the Graylog nodes were removing one of the MongoDB nodes because its DNS name didn’t match the FQDN). I used the MongoDB documentation to build the mongodb_uri value and referred to this documentation for my backup and restore procedure:

I’m specifically looking to get rid of the node that contains the standalone MongoDB container and would prefer to avoid any operations that change it from standalone to replicaset ready so as not to destroy my fail safe procedure.
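On the DNS point: in replicaset mode, MongoDB drivers replace the seed list from the URI with the host names the members advertise in the replicaset configuration, so a seed whose name differs from its configured name gets dropped. The sketch below checks for that kind of mismatch. It assumes the member names are copied by hand from `rs.conf().members[].host` in mongosh. The function names are hypothetical, and IPv6 literals and `mongodb+srv://` URIs are not handled.

```python
from typing import Iterable, List

DEFAULT_PORT = 27017


def seed_hosts(uri: str) -> List[str]:
    """Extract lower-cased host:port seeds from a mongodb:// URI."""
    prefix = "mongodb://"
    if not uri.startswith(prefix):
        raise ValueError("expected a mongodb:// URI")
    # Keep only the authority part: drop the path, options, and credentials.
    authority = uri[len(prefix):].split("/", 1)[0].split("?", 1)[0]
    authority = authority.rsplit("@", 1)[-1]
    seeds = []
    for entry in authority.split(","):
        entry = entry.strip().lower()
        if not entry:
            continue
        seeds.append(entry if ":" in entry else f"{entry}:{DEFAULT_PORT}")
    return seeds


def seeds_not_in_config(uri: str, member_hosts: Iterable[str]) -> List[str]:
    """Return seeds from the URI that no replica set member advertises."""
    members = {host.strip().lower() for host in member_hosts}
    return [seed for seed in seed_hosts(uri) if seed not in members]
```

Any name this returns is one a driver would be expected to discard after its first successful handshake, so every member name in the configuration also has to resolve from inside the Graylog containers.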

4. How can the community help?

Would appreciate any clarity on the MongoDB backup and restore in case there is a special part of the procedure I’m missing. And any insights as to the error I’m receiving on my Graylog nodes would be very helpful.
