I'm building out a Graylog cluster in AWS with 3 web servers and 3 datanodes, using the Graylog 2.2.2 AMI. The 3 web servers work for a while before reporting “Journal metrics unavailable.” They drop in and out of reporting, and at some point within 2 days one of the servers goes down completely and I have to run “graylog-ctl reconfigure-as-server” to bring it back up.
Any host showing “Journal metrics unavailable” also tends not to receive the logs sent to it. I'm still not sure where to find the right logs to troubleshoot this, as the main server log (“graylog-server”) doesn't show anything telling.
Security groups have been checked and rechecked, but given how long everything works before this behavior starts, the cause seems to be something else.