Error starting after upgrade from 3.3.9 to 4.0.1

I tried upgrading the following from 3.3.9 to 4.0.1, but Graylog is throwing errors in its startup logs.

All hosts run CentOS 7 (current).
3 Graylog nodes
5 Elasticsearch nodes, version info:
"name" : "elastic1",
"cluster_name" : "graylog",
"cluster_uuid" : "L…jcQ",
"version" : {
  "number" : "6.8.13",
  "build_flavor" : "default",
  "build_type" : "rpm",
  "build_hash" : "be13c69",
  "build_date" : "2020-10-16T09:09:46.555371Z",
  "build_snapshot" : false,
  "lucene_version" : "7.7.3",
  "minimum_wire_compatibility_version" : "5.6.0",
  "minimum_index_compatibility_version" : "5.0.0"

MongoDB runs on the three Graylog servers as a cluster:
mongod --version
db version v3.6.21
git version: 1cd2db51dce4b16f4bc97a75056269df0dc0bddb
OpenSSL version: OpenSSL 1.0.1e-fips 11 Feb 2013
allocator: tcmalloc
modules: none
build environment:
distmod: rhel70
distarch: x86_64
target_arch: x86_64

Link to the error log.

I can’t figure out how to post a formatted log as code here.

What you pasted before works; it had the errors. It looks like your Graylog auth configuration in Mongo isn’t valid. Is mongod running? I can’t tell whether Mongo isn’t running or graylog-server is trying to run a post-upgrade migration task that fails, after which graylog-server stops. What’s the output of the following?

sudo systemctl status mongod
sudo systemctl status graylog-server
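A minimal sketch of the same check that prints just the state of each unit, which can be handy when comparing several hosts. It assumes the units are named `mongod` and `graylog-server`, as in the commands above; `unit_state` is a hypothetical helper name, and `systemctl status` still gives more detail (recent log lines, exit codes).

```shell
#!/usr/bin/env bash
# Sketch: print the systemd state of the MongoDB and Graylog units.
# Unit names are taken from the commands above; adjust them if yours differ.

# unit_state (hypothetical helper) prints "active", "inactive", "failed", etc.,
# or "unknown" when systemctl is unavailable or prints nothing.
unit_state() {
  local state=""
  if command -v systemctl >/dev/null 2>&1; then
    # is-active exits non-zero for non-active units; only the text is kept
    state=$(systemctl is-active "$1" 2>/dev/null)
  fi
  echo "${state:-unknown}"
}

for unit in mongod graylog-server; do
  printf '%s: %s\n' "$unit" "$(unit_state "$unit")"
done
```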

The tail of the log is showing a duplicate key error. I think I’ve seen something similar here in another thread, trying to find it.
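To pull just the duplicate key errors out of a long startup log, something like the following sketch may help. It assumes the default RPM log location `/var/log/graylog-server/server.log` and relies on MongoDB reporting these errors with its `E11000 duplicate key error` message; `dup_key_errors` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: show MongoDB duplicate key errors from the Graylog server log.
# Assumption: default log path for the RPM package; change it if needed.
GRAYLOG_LOG="/var/log/graylog-server/server.log"

# dup_key_errors (hypothetical helper) filters stdin for lines mentioning
# MongoDB's E11000 code or the phrase "duplicate key", case-insensitively.
dup_key_errors() {
  grep -Ei 'E11000|duplicate key'
}

if [ -r "$GRAYLOG_LOG" ]; then
  # Look only at the last 500 lines, usually enough for the latest startup
  tail -n 500 "$GRAYLOG_LOG" | dup_key_errors
fi
```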

It looks like you’re using LDAP, is that right? You may know this already, but that became an enterprise-only feature in 4.x, and after the upgrade the LDAP auth configuration is deactivated until you review it, because the way it works has changed. It’s inconvenient, but if Mongo is running you could manually remove the configuration from it to (hopefully) let the post-upgrade migration task run by graylog-server complete. You’d then need to rebuild the configuration once everything’s running again.

@aaronsachs do you have any information on removing the Graylog auth config from Mongo manually? My search is coming up empty, but I was sure I’d seen it somewhere.

Oh, you’ve said you rolled back. Is that everything? If so, maybe just remove the non-local auth configs before you try again.

I am. I read about it being disabled until it was updated with more config settings. I rolled back.
I can try again and remove the LDAP auth from the Graylog settings beforehand.

I rolled everything back except the elastic nodes.

Yeah, if you rolled back to a pre-upgrade state, I think that’s a valid approach. It’s a pain, especially if you have a lot configured, but I would remove everything except the admin user to minimize the chance of a conflict. I think you should be fine even with ES already upgraded, since the issue was in Mongo.
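Before a second attempt, it may be worth taking a fresh MongoDB dump so the pre-upgrade state can be restored without another full rollback. A minimal sketch, assuming Graylog uses the default database name `graylog` and that `mongodump` can reach the local `mongod` without credentials (add `--username`/`--password` or `--uri` otherwise). The backup directory, the `backup_dir` helper, and the `--run` switch are illustrative choices; without `--run` the script only prints the command.

```shell
#!/usr/bin/env bash
# Sketch: dump the Graylog MongoDB database before retrying the upgrade.
# Assumptions: database name "graylog", local mongod, no auth required.
DB_NAME="graylog"
BACKUP_ROOT="/var/backups"   # illustrative location

# backup_dir (hypothetical helper): timestamped directory under $1
backup_dir() {
  printf '%s/graylog-mongo-%s\n' "$1" "$(date +%Y%m%d-%H%M%S)"
}

target=$(backup_dir "$BACKUP_ROOT")
if [ "${1:-}" = "--run" ]; then
  mongodump --db "$DB_NAME" --out "$target"
else
  # Dry run by default: only show what would be executed
  echo "would run: mongodump --db $DB_NAME --out $target"
fi
```

Restoring would go through `mongorestore` (for example with `--drop`, so existing collections are replaced); check its documentation for your MongoDB version before relying on it.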

Thanks. I’ll give it a try now.


If that works, I’d be curious to learn more about your pre-upgrade auth config. If you can share it in sanitized form, we can review it and maybe see what’s going on. I wonder if the post-upgrade migration task was failing because you had 2 different auth configs with the same group name or something, so when it tried to turn those groups into teams, the team name (the unique key) was already taken.
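The group-name hypothesis can be checked before upgrading by comparing the group names from each auth configuration. A rough sketch, assuming you export each config’s group names to a plain text file, one name per line; the file names and the `colliding_names` helper are hypothetical, and this does not reproduce Graylog’s actual migration logic, it only lists names that appear in both sets.

```shell
#!/usr/bin/env bash
# Sketch: list group names that appear in both of two auth configurations.
# Input: two text files (hypothetical exports), one group name per line.

# colliding_names prints names present in both files, one per line.
# LC_ALL=C keeps sort and comm on the same collation order.
colliding_names() {
  LC_ALL=C comm -12 <(LC_ALL=C sort -u "$1") <(LC_ALL=C sort -u "$2")
}

# Usage: ./colliding.sh ad-groups.txt ldap-groups.txt
if [ "$#" -eq 2 ]; then
  colliding_names "$1" "$2"
fi
```

Any name it prints is a candidate for the kind of duplicate key error seen in the log, if the migration really maps groups to teams by name.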

Anything you have would be appreciated by the devs I’m sure.

yum remove graylog-enterprise-plugins graylog-enterprise-integrations-plugins

That seems to have brought the system online. I never used Enterprise, since we ingest 40+ GB a day.
I shouldn’t have copied the install command that included those two packages from the install page.

Maybe I should have used this instead?
sudo yum update && sudo yum install graylog-server graylog-integrations-plugins

Edit: I also removed the LDAP/AD settings before the upgrade.
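Since the startup problem went away once the two enterprise plugin packages were removed, it may help to check which Graylog packages are installed before upgrading the other nodes. A small sketch using `rpm -qa` on the CentOS hosts; `enterprise_pkgs` is a hypothetical helper that simply keeps package names starting with `graylog-enterprise`, the prefix shared by the two packages removed above.

```shell
#!/usr/bin/env bash
# Sketch: report installed Graylog enterprise packages on an RPM-based host.

# enterprise_pkgs (hypothetical helper) keeps only package names on stdin
# that start with "graylog-enterprise"; never fails on an empty match.
enterprise_pkgs() {
  grep -E '^graylog-enterprise' || true
}

if command -v rpm >/dev/null 2>&1; then
  found=$(rpm -qa 'graylog*' | enterprise_pkgs)
  if [ -n "$found" ]; then
    echo "Enterprise packages installed:"
    echo "$found"
  else
    echo "No graylog-enterprise packages installed."
  fi
fi
```

Removing whatever it lists is what got this particular system starting again; that is an observation from this thread, not a general fix.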

Hm, so you removed the graylog-enterprise-plugins package, which would have removed LDAP support. Was this before or after you did the upgrade?

After. I did the upgrade and it failed on first startup; then I removed the two enterprise plugins and started it.

Interesting. Had you removed all the LDAP related configs from Graylog before you did the upgrade? And it still failed to start until you removed enterprise plugins?

Correct. I went into the web interface and cleared and then disabled the AD integration settings before saving.

I also deleted all of the AD users.

Well, that’s good information. It seems there’s still some kind of post-upgrade issue related to Mongo, but with the plugins removed, graylog-server bypasses the logic that triggers the error, so it’s able to start successfully. I’m glad you got it working, and since you seem unconcerned about losing the enterprise features, I’d say you’re good to go. At more than 5 GB a day, if you ever decided to purchase Enterprise, you’d have enterprise support, and I’m sure they’d be able to help you resolve the issue at that point.

Glad it’s working now! Nice job.


Thanks for the help. Since I don’t use Enterprise because of the volume, am I missing out on anything else with the plugins removed?

Here’s the feature comparison. You’ll need to review and assess it yourself.

You weren’t able to use the enterprise features on the free license with a 40 GB/day volume anyway, so these features wouldn’t have been active even with a valid free Enterprise (<=5 GB/day) license.

