Problems after reverting an upgrade : MongoDB flapping


#1

I attempted an upgrade on an Amazon AMI Graylog 2.0.2 cluster (3 web servers, 4 Elasticsearch nodes), encountered problems after installing 2.3.2 on the web servers, and decided to revert.

We upgraded by doing the following:

wget https://packages.graylog2.org/releases/graylog-omnibus/ubuntu/graylog_2.3.2-3_amd64.deb
sudo graylog-ctl stop
sudo dpkg -G -i graylog_2.3.2-3_amd64.deb
sudo graylog-ctl backup-etcd
sudo graylog-ctl reconfigure
sudo reboot

After encountering problems with etcd refusing to start on any of the three web cluster members, we decided to revert to 2.0.2.

We performed the revert by doing the following on each server:

wget https://packages.graylog2.org/releases/graylog-omnibus/ubuntu/graylog_2.0.2-1_amd64.deb
sudo graylog-ctl stop  # if this has issues, kill the runsvdir parent process for graylog, then try again
sudo dpkg -i graylog_2.0.2-1_amd64.
sudo su
rm -r /var/opt/graylog/data/etcd/*
exit
sudo graylog-ctl reconfigure

After resolving the initial startup issues and confirming that the master web server was good to go, I did the same work on the supporting web server members.

This time, mongodb would not stay up: it starts, crashes, and restarts in a loop (it flaps).

The log output (/var/log/graylog/mongodb/current) repeats the following in a loop:

2018-01-02_17:44:29.24670 2018-01-02T10:44:29.245-0700 I CONTROL  [initandlisten] MongoDB starting : pid=15210 port=27017 dbpath=/var/opt/graylog/data/mongodb 64-bit host=web2
2018-01-02_17:44:29.24672 2018-01-02T10:44:29.245-0700 I CONTROL  [initandlisten] db version v3.2.5
2018-01-02_17:44:29.24840 2018-01-02T10:44:29.245-0700 I CONTROL  [initandlisten] git version: 34e65e5383f7ea1726332cb175b73077ec4a1b02
2018-01-02_17:44:29.24841 2018-01-02T10:44:29.245-0700 I CONTROL  [initandlisten] allocator: tcmalloc
2018-01-02_17:44:29.24945 2018-01-02T10:44:29.245-0700 I CONTROL  [initandlisten] modules: none
2018-01-02_17:44:29.24997 2018-01-02T10:44:29.245-0700 I CONTROL  [initandlisten] build environment:
2018-01-02_17:44:29.25097 2018-01-02T10:44:29.245-0700 I CONTROL  [initandlisten]     distarch: x86_64
2018-01-02_17:44:29.25119 2018-01-02T10:44:29.245-0700 I CONTROL  [initandlisten]     target_arch: x86_64
2018-01-02_17:44:29.25210 2018-01-02T10:44:29.245-0700 I CONTROL  [initandlisten] options: { storage: { dbPath: "/var/opt/graylog/data/mongodb", mmapv1: { smallFiles: true } } }
2018-01-02_17:44:29.26510 2018-01-02T10:44:29.265-0700 I -        [initandlisten] Detected data files in /var/opt/graylog/data/mongodb created by the 'wiredTiger' storage engine, so setting the active storage engine to 'wiredTiger'.
2018-01-02_17:44:29.26564 2018-01-02T10:44:29.265-0700 I STORAGE  [initandlisten] wiredtiger_open config: create,cache_size=3G,session_max=20000,eviction=(threads_max=4),config_base=false,statistics=(fast),log=(enabled=true,archive=true,path=journal,compressor=snappy),file_manager=(close_idle_time=100000),checkpoint=(wait=60,log_size=2GB),statistics_log=(wait=0),
2018-01-02_17:44:29.44082 2018-01-02T10:44:29.440-0700 I -        [initandlisten] Assertion: 13111:wrong type for field (ns) 10 != 2
2018-01-02_17:44:29.44777 2018-01-02T10:44:29.447-0700 I CONTROL  [initandlisten]
2018-01-02_17:44:29.44824  0x12fa0b2 0x12a5418 0x1290cb8 0x1290d6c 0xfc43ce 0xfcecfa 0x1067038 0xf8d9c8 0x95d2b6 0x9626bd 0x7f0a5ba5bec5 0x959c59
2018-01-02_17:44:29.44898 ----- BEGIN BACKTRACE -----
2018-01-02_17:44:29.45057 {"backtrace":[{"b":"400000","o":"EFA0B2","s":"_ZN5mongo15printStackTraceERSo"},{"b":"400000","o":"EA5418","s":"_ZN5mongo10logContextEPKc"},{"b":"400000","o":"E90CB8","s":"_ZN5mongo11msgassertedEiPKc"},{"b":"400000","o":"E90D6C"},{"b":"400000","o":"BC43CE","s":"_ZN5mongo9KVCatalog4initEPNS_16OperationContextE"},{"b":"400000","o":"BCECFA","s":"_ZN5mongo15KVStorageEngineC1EPNS_8KVEngineERKNS_22KVStorageEngineOptionsE"},{"b":"400000","o":"C67038"},{"b":"400000","o":"B8D9C8","s":"_ZN5mongo20ServiceContextMongoD29initializeGlobalStorageEngineEv"},{"b":"400000","o":"55D2B6","s":"_ZN5mongo13initAndListenEi"},{"b":"400000","o":"5626BD","s":"main"},{"b":"7F0A5BA3A000","o":"21EC5","s":"__libc_start_main"},{"b":"400000","o":"559C59"}],"processInfo":{ "mongodbVersion" : "3.2.5", "gitVersion" : "34e65e5383f7ea1726332cb175b73077ec4a1b02", "compiledModules" : [], "uname" : { "sysname" : "Linux", "release" : "3.13.0-86-generic", "version" : "#131-Ubuntu SMP Thu May 12 23:33:13 UTC 2016", "machine" : "x86_64" }, "somap" : [ { "elfType" : 2, "b" : "400000" }, { "b" : "7FFECAA9A000", "elfType" : 3 }, { "b" : "7F0A5C73D000", "path" : "/lib/x86_64-linux-gnu/librt.so.1", "elfType" : 3 }, { "b" : "7F0A5C539000", "path" : "/lib/x86_64-linux-gnu/libdl.so.2", "elfType" : 3 }, { "b" : "7F0A5C233000", "path" : "/lib/x86_64-linux-gnu/libm.so.6", "elfType" : 3 }, { "b" : "7F0A5C01D000", "path" : "/lib/x86_64-linux-gnu/libgcc_s.so.1", "elfType" : 3 }, { "b" : "7F0A5BDFF000", "path" : "/lib/x86_64-linux-gnu/libpthread.so.0", "elfType" : 3 }, { "b" : "7F0A5BA3A000", "path" : "/lib/x86_64-linux-gnu/libc.so.6", "elfType" : 3 }, { "b" : "7F0A5C945000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3 } ] }}
2018-01-02_17:44:29.45120  mongod(_ZN5mongo15printStackTraceERSo+0x32) [0x12fa0b2]
2018-01-02_17:44:29.45207  mongod(_ZN5mongo10logContextEPKc+0x138) [0x12a5418]
2018-01-02_17:44:29.45279  mongod(_ZN5mongo11msgassertedEiPKc+0x88) [0x1290cb8]
2018-01-02_17:44:29.45419  mongod(+0xE90D6C) [0x1290d6c]
2018-01-02_17:44:29.45487  mongod(_ZN5mongo9KVCatalog4initEPNS_16OperationContextE+0x31E) [0xfc43ce]
2018-01-02_17:44:29.45634  mongod(_ZN5mongo15KVStorageEngineC1EPNS_8KVEngineERKNS_22KVStorageEngineOptionsE+0x51A) [0xfcecfa]
2018-01-02_17:44:29.45669  mongod(+0xC67038) [0x1067038]
2018-01-02_17:44:29.45767  mongod(_ZN5mongo20ServiceContextMongoD29initializeGlobalStorageEngineEv+0x598) [0xf8d9c8]
2018-01-02_17:44:29.45812  mongod(_ZN5mongo13initAndListenEi+0x376) [0x95d2b6]
2018-01-02_17:44:29.45917  mongod(main+0x15D) [0x9626bd]
2018-01-02_17:44:29.45943  libc.so.6(__libc_start_main+0xF5) [0x7f0a5ba5bec5]
2018-01-02_17:44:29.46053  mongod(+0x559C59) [0x959c59]
2018-01-02_17:44:29.46123 -----  END BACKTRACE  -----
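
Since the same startup-and-crash sequence repeats, it can help to reduce the log to how many start attempts it contains and which assertion ends each one. The following is only a minimal sketch: the default path is the log file quoted above, and the function name summarize_mongod_log is made up for illustration.

```shell
# Summarise a flapping mongod log: number of start attempts and the
# distinct assertion messages. The function name is hypothetical; the
# default path is the log file mentioned in this post.
summarize_mongod_log() {
    local log="${1:-/var/log/graylog/mongodb/current}"
    # Every start attempt logs one "MongoDB starting" line.
    printf 'starts: %s\n' "$(grep -c 'MongoDB starting' "$log")"
    # Print each distinct assertion once, without the timestamps.
    grep -o 'Assertion: .*' "$log" | sort -u
}

# Usage (on the affected server):
#   summarize_mongod_log
#   summarize_mongod_log /path/to/a/copied/log
```

If every restart ends on the same assertion, the problem is most likely in the data files that mongod reads at startup and not something intermittent.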

#2

At this point, the strangest thing is that the master server is running mongodb just fine. It went through the same upgrade and revert steps, but it’s running (fortunately).

I don’t know enough about mongodb to know what I need to do to it, but I wonder if it’s due to the changes to MongoDB from the upgrade?


#3

Yet, that doesn’t make sense - why did the master web server come right up after the same revert?


#4

Also, I realized I completely forgot about the note in the documentation under “Upgrading within Amazon”:

“Downgrade is not supported, so make sure to make a backup or snapshot before upgrading.”

Well, I see my first mistake.
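
For anyone else in this position, a file-level backup before the upgrade could look roughly like the sketch below. It is an assumption-laden example and not an official procedure: the function name backup_data_dir and the archive naming are invented here, and /var/opt/graylog/data is simply the parent of the etcd and mongodb directories mentioned earlier in this thread. A volume snapshot would serve the same purpose.

```shell
# Archive a data directory into a timestamped tarball and print its path.
# Function name and archive naming are hypothetical.
backup_data_dir() {
    local src="$1" dest_dir="$2"
    local archive
    archive="$dest_dir/graylog-data-$(date +%Y%m%d-%H%M%S).tar.gz"
    # -C keeps the paths inside the archive relative to the parent directory.
    tar -czf "$archive" -C "$(dirname "$src")" "$(basename "$src")" \
        && printf '%s\n' "$archive"
}

# Usage, from a root shell, with services stopped so files do not change
# while tar is reading them:
#   graylog-ctl stop
#   backup_data_dir /var/opt/graylog/data /root
```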


#5

I found some info on this error here : https://docs.mongodb.com/manual/reference/operator/query/type/#op._S_type

Still trying to understand what to do.
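
The numbers in the assertion appear to be BSON type codes from the table on that page, where 2 is a string and 10 is null. The sketch below translates the message into type names. The function names are made up, and only a subset of the type table is included. My reading, which I have not verified against the MongoDB source, is that the first number is the type that was found and the second is the type that was expected.

```shell
# Map a BSON type number to its name (subset of the table on the $type page).
bson_type_name() {
    case "$1" in
        1)  echo "double" ;;
        2)  echo "string" ;;
        3)  echo "object" ;;
        4)  echo "array" ;;
        5)  echo "binData" ;;
        7)  echo "objectId" ;;
        8)  echo "bool" ;;
        9)  echo "date" ;;
        10) echo "null" ;;
        16) echo "int" ;;
        17) echo "timestamp" ;;
        18) echo "long" ;;
        *)  echo "unknown($1)" ;;
    esac
}

# Rewrite "wrong type for field (ns) 10 != 2" with type names.
explain_wrong_type() {
    local parsed
    # Extract the field name and the two type numbers.
    parsed=$(printf '%s\n' "$1" | sed -n \
        's/.*wrong type for field (\([^)]*\)) \([0-9][0-9]*\) != \([0-9][0-9]*\).*/\1 \2 \3/p')
    [ -n "$parsed" ] || return 1
    # Split into positional parameters: field, first type, second type.
    set -- $parsed
    printf 'field %s: %s != %s\n' "$1" "$(bson_type_name "$2")" "$(bson_type_name "$3")"
}

# Usage:
#   explain_wrong_type 'Assertion: 13111:wrong type for field (ns) 10 != 2'
```

Read that way, the ns field held null somewhere a string was expected.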


#6

It appears that the only BSON file I can find (under /var/opt/graylog/data/mongodb) that might produce the error is very basic:

{ 
	"storage" : 
	{ 
		"engine" : "wiredTiger", 
		"options" : 
			{ 
				"directoryPerDB" : false, 
				"directoryForIndexes" : false 
			} 
	} 
}

1 objects found

(Jan Doberstein) #7

The documentation covers this: you should not skip over multiple versions when upgrading. Upgrade one version after another.

http://docs.graylog.org/en/2.4/pages/upgrade.html


#8

Given that note was added as of 2.3, I’m not surprised I completely missed it. I’ve been running varying versions of Graylog since 2.0, so I never saw it.

So if I do the upgrade from 2.0.2 to 2.1 to 2.2 to 2.3, that should go more smoothly.

Thanks for that!
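
To make the sequence above explicit, here is a small sketch that lists the intermediate minor releases between two versions. It assumes both versions share the same major number and that minor versions are contiguous (2.0, 2.1, 2.2, ...). The function name upgrade_path is made up, and the sketch does not pick patch releases or download any packages.

```shell
# Print each minor release to step through, one per line.
# Assumes the same major version and contiguous minor versions.
upgrade_path() {
    local from="$1" to="$2"
    local major="${from%%.*}"
    # Strip the major part, then everything after the minor part.
    local from_minor="${from#*.}"; from_minor="${from_minor%%.*}"
    local to_minor="${to#*.}";     to_minor="${to_minor%%.*}"
    local m=$((from_minor + 1))
    while [ "$m" -le "$to_minor" ]; do
        printf '%s.%s\n' "$major" "$m"
        m=$((m + 1))
    done
}

# Usage:
#   upgrade_path 2.0.2 2.3.2
```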


#9

Now, in regards to the mongodb problem - given the error, would changing the version of mongo available in mongo’s bin folder potentially help it?

I saw at least one troubleshooting thread recommend changing from 3.2.4 to 3.0 in an attempt to resolve their issues.

Since I upgraded via graylog-ctl and then “reverted” via an install of 2.0.2 over the top of the 2.3.2 upgrade, could it have left the old mongo version?
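
One way to check which mongod binary is actually being started is to read the version it reports at startup. In the log from my first post that line is "db version v3.2.5". A minimal sketch, with a made-up function name and that same log path as the default:

```shell
# Print the distinct "db version" values reported in a mongod log.
# Function name is hypothetical; default path is the log quoted earlier.
mongod_versions_in_log() {
    local log="${1:-/var/log/graylog/mongodb/current}"
    grep -o 'db version v[0-9][0-9.]*' "$log" | sort -u
}

# Usage (on each web server, to compare the master with the others):
#   mongod_versions_in_log
```

This only tells you which binary ran. It says nothing about which version last wrote the data files.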

