Graylog not starting after upgrade from 3.3 to 4.0

I’ve gotten Graylog to the point where it opens the connection to MongoDB and initializes InputBufferImpl. However, it just hangs there and does not move on.

This is a standalone Graylog instance with AWS Elasticsearch 7.10. The mongod version info is below:

db version v4.0.24
git version: 9df1b3a80f39cf7e7ccd6264a207518426a524f6
OpenSSL version: OpenSSL 1.0.2k-fips  26 Jan 2017
allocator: tcmalloc
modules: none
build environment:
    distmod: amazon2
    distarch: x86_64
    target_arch: x86_64

Logging info:

2021-05-04T14:10:06.888Z DEBUG [OffsetIndex] Loaded index file /var/lib/graylog-server/journal/messagejournal-0/00000000000000000000.index with maxEntries = 131072, maxIndexSize = 1048576, entries = 131072, lastOffset = 0, file position = 1048576
2021-05-04T14:10:06.890Z WARN [Log] Found a corrupted index file, /var/lib/graylog-server/journal/messagejournal-0/00000000000000000000.index, deleting and rebuilding index…
2021-05-04T14:10:06.903Z INFO [Log] Recovering unflushed segment 0 in log messagejournal-0.
2021-05-04T14:10:06.905Z INFO [Log] Completed load of log messagejournal-0 with log end offset 0
2021-05-04T14:10:06.917Z INFO [LogManager] Logs loading complete.
2021-05-04T14:10:06.920Z INFO [KafkaJournal] Initialized Kafka based journal at /var/lib/graylog-server/journal
2021-05-04T14:10:06.942Z INFO [cluster] Cluster created with settings {hosts=[localhost:27017], mode=SINGLE, requiredClusterType=UNKNOWN, serverSelectionTimeout='30000 ms', maxWaitQueueSize=5000}
2021-05-04T14:10:06.987Z INFO [cluster] Cluster description not yet available. Waiting for 30000 ms before timing out
2021-05-04T14:10:07.015Z INFO [connection] Opened connection [connectionId{localValue:1, serverValue:8}] to localhost:27017
2021-05-04T14:10:07.019Z INFO [cluster] Monitor thread successfully connected to server with description ServerDescription{address=localhost:27017, type=STANDALONE, state=CONNECTED, ok=true, version=ServerVersion{versionList=[4, 0, 24]}, minWireVersion=0, maxWireVersion=7, maxDocumentSize=16777216, logicalSessionTimeoutMinutes=30, roundTripTimeNanos=2781257}
2021-05-04T14:10:07.037Z INFO [connection] Opened connection [connectionId{localValue:2, serverValue:9}] to localhost:27017
2021-05-04T14:10:07.251Z INFO [InputBufferImpl] Initialized InputBufferImpl with ring size <65536> and wait strategy , running 2 parallel message handlers.

@Gabe

Hello and welcome,

Maybe I can help you with this problem. I have a couple of questions to ask you.
When you say “not starting”, I take it you mean the Graylog service?
Have you checked all your logs (e.g., Elasticsearch, MongoDB, Graylog) for anything stating ERROR, WARNING, stopped, etc.? The logs you provided don’t show why Graylog is not starting.
After restarting the Graylog service, did you tail Graylog’s log file?

tail -f /var/log/graylog-server/server.log
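
If the log is long, filtering it for the WARN/ERROR levels mentioned above can be quicker than reading it by eye. Below is a minimal Python sketch, assuming the default package log path /var/log/graylog-server/server.log and the `timestamp LEVEL [Logger] message` layout seen in the log excerpt above; the helper names `find_problems` and `scan_file` are hypothetical. It complements `tail -f` (which follows the log live) rather than replacing it.

```python
import re
from typing import Iterable, List

# Assumed line layout, taken from the log excerpt in this thread:
#   2021-05-04T14:10:06.890Z WARN [Log] Found a corrupted index file, ...
LINE_RE = re.compile(r"^\S+\s+(?P<level>[A-Z]+)\s+\[(?P<logger>[^\]]+)\]\s+(?P<msg>.*)$")

# Levels worth a closer look when Graylog hangs without an obvious error.
INTERESTING = {"WARN", "ERROR", "FATAL"}


def find_problems(lines: Iterable[str]) -> List[str]:
    """Return the lines whose log level is WARN, ERROR or FATAL.

    Continuation lines (e.g. Java stack-trace frames) do not match the
    pattern and are skipped, so check the full log around each hit.
    """
    hits = []
    for line in lines:
        text = line.rstrip("\n")
        match = LINE_RE.match(text)
        if match and match.group("level") in INTERESTING:
            hits.append(text)
    return hits


def scan_file(path: str = "/var/log/graylog-server/server.log") -> List[str]:
    """Read a Graylog server log and return its WARN/ERROR/FATAL lines."""
    with open(path, encoding="utf-8", errors="replace") as log:
        return find_problems(log)
```

Usage would be something like `for line in scan_file(): print(line)`, run as a user that can read the log file.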

Just for a brief overview: I’m assuming the Elasticsearch and MongoDB services show an active status, meaning the services are started and no ERRORs are shown?
What is shown when you execute the following commands?

systemctl status graylog-server
systemctl status elasticsearch
systemctl status mongod
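
For a quick yes/no on the same three services, `systemctl is-active <unit>` prints just the unit's state. Here is a minimal Python sketch wrapping it, assuming a systemd host and the unit names used in the commands above; `service_states` and `not_running` are hypothetical helpers. When Elasticsearch is a managed AWS domain, as in this thread, there is no local elasticsearch unit to check, so it would be left out of the list.

```python
import subprocess
from typing import Callable, Dict, Iterable, List


def systemctl_is_active(unit: str) -> str:
    """Ask systemd for a unit's state, e.g. "active", "inactive" or "failed"."""
    # `systemctl is-active` exits non-zero for any state other than "active",
    # so check=False stops that from raising; the state is read from stdout.
    result = subprocess.run(
        ["systemctl", "is-active", unit],
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout.strip() or "unknown"


def service_states(
    units: Iterable[str],
    probe: Callable[[str], str] = systemctl_is_active,
) -> Dict[str, str]:
    """Map each unit name to the state reported by `probe`."""
    return {unit: probe(unit) for unit in units}


def not_running(states: Dict[str, str]) -> List[str]:
    """Return the units whose state is anything other than "active"."""
    return [unit for unit, state in states.items() if state != "active"]
```

For example, `not_running(service_states(["graylog-server", "mongod"]))` returns an empty list when both units are up. Note that a unit can report "active" while the application inside it is stuck, which matches the symptom described in this thread, so this only rules out the simpler failure modes.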

Did you create a user with a password for Graylog in your MongoDB?
Do you have a firewall or SELinux/AppArmor enabled?
Have you tried upgrading MongoDB to version 4.2, or did you upgrade MongoDB to version 4.0?

I’ve currently rolled back to 3.3.12 and have it mostly online. 4.0.6 was hanging at the point shown in the logs I posted. It gives no error, only the one warning about the journal index, which it recovers from. MongoDB, ES, and Graylog are all in an active running state, but Graylog just hangs and stays in that state until I stop or restart it.

My versions are all compatible with 4.0.6, and I cannot find any issues with those services; they work when I roll back to 3.3.12 (of course, I also have to roll ES back to 6.8). ES runs within AWS, so I don’t manage the ES service directly; Graylog is on an EC2 instance. Even with debug logging on, I haven’t been able to find anything in the logs beyond what I posted.

I did a mongodump and dropped the graylog database to try to start fresh. That didn’t seem to help, though it did recreate the graylog db. I restored the dump when I rolled back to 3.3, which brought back my LDAP auth. I didn’t expect LDAP to work in GL4, as it’s an enterprise add-on, but I would expect it to throw an error in the logs somewhere if that were part of the issue. In any case, blowing away the db should have resolved it.

There were no other changes to the system, so SELinux shouldn’t be an issue unless going from 3.3 to 4.0 requires a modification I haven’t seen documented.

The second blockquote (the logging info) is the tail of the logs. I went back through the logs several times searching for errors and warnings, and the warning I posted is the only one.

@Gabe

Hello,

Thank you for the additional details on the issue, much appreciated.

A couple more questions. Since we’ve established that all services are running and Graylog basically just hangs, have you checked the permissions on files and folders?
Did you upgrade Java at any point during the upgrade to GL 4.0?
Did you create a user with a password for Graylog in your MongoDB?
Are you using HTTPS/TLS?
Do you have any old plugins? If so, are they compatible with the newer version of Graylog?

By chance, do you have this setting?

http_enable_cors = true
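
To check whether that setting is present without reading through the whole file, here is a minimal Python sketch. It assumes the default package location /etc/graylog/server/server.conf and handles only plain `key = value` lines, a deliberately small subset of the Java properties format the file uses; `parse_conf`, `cors_enabled` and `load_conf` are hypothetical helper names.

```python
from typing import Dict, Iterable


def parse_conf(lines: Iterable[str]) -> Dict[str, str]:
    """Parse simple `key = value` lines, skipping blanks and #/! comments.

    Only a small subset of the Java properties format is handled:
    no line continuations, no escapes, no ':' or whitespace separators.
    """
    settings = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        # Split on the first '=' only, so values may themselves contain '='.
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def cors_enabled(settings: Dict[str, str]) -> bool:
    """True only if http_enable_cors is explicitly set to true.

    A missing key is reported as False here; Graylog's own default
    for the setting is not assumed.
    """
    return settings.get("http_enable_cors", "").lower() == "true"


def load_conf(path: str = "/etc/graylog/server/server.conf") -> Dict[str, str]:
    """Read and parse a Graylog server.conf (assumed default path)."""
    with open(path, encoding="utf-8") as conf:
        return parse_conf(conf)
```

Running `cors_enabled(load_conf())` as a user that can read the file answers the question directly; a commented-out `#http_enable_cors = true` line is treated as not set.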

What I’m trying to find out is whether there is a permission issue or a secure-connection issue.
But since there are no errors or warnings in any log files, and the only logs are the ones shown above, I’m not sure. It might be a configuration error.

As for LDAP, that is probably because of the breaking changes described here:

LDAP and Active Directory configuration changes

Sorry I can’t be more help; maybe someone here has had this problem before.