Fortunately it happened on the test system and not on the main server.
I performed the same operation on two other Graylog systems identical to the one that had the bad outcome, and in those cases the problem did not appear.
In the failed case, restarting the service manually brings everything back to working order.
What could have caused it?
I'm reluctant to update the main server before figuring out how to fix the problem, should it re-emerge.
I would rather not have to resort to manually restarting the service.
There could be a couple of different reasons why it failed to start.
I need to ask a couple of questions.
When Elasticsearch failed to start, what did the log files show before and after the failed restart?
vi /var/log/elasticsearch/elasticsearch.log
When this issue occurred, was journalctl checked by any chance? journalctl is used to view the systemd logs.
root# journalctl
Correct me if I'm wrong, but after you restarted Elasticsearch it started working again?
If so, it might be something with the system and not with Graylog; not 100% sure, I would need to see more data. If this is correct, it doesn't seem to be a configuration issue.
A complete scan of any log files that could possibly show an interruption of the Elasticsearch service would help.
I'm not sure how this upgrade/update was executed; more details would be needed.
Checklist:
1. Ensure you have Elasticsearch starting up on reboot: systemctl enable elasticsearch
2. When upgrades are applied, it is suggested that Elasticsearch starts first; wait until the service is fully operational, then start the MongoDB service. Once both of those services are running fine, the next step is to start the Graylog service. These steps have never failed me.
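The ordered start-up above can be sketched as a short script. The unit names (elasticsearch, mongod, graylog-server) and the localhost:9200 health endpoint are assumptions based on a default install; adjust them to your system.

```shell
#!/bin/sh
# Start Elasticsearch first.
sudo systemctl start elasticsearch

# Wait until the cluster answers on its HTTP port before moving on
# (port 9200 on localhost is the default and may differ on your setup).
until curl -s http://localhost:9200/_cluster/health >/dev/null; do
  sleep 2
done

# Then MongoDB, then Graylog.
sudo systemctl start mongod
sudo systemctl start graylog-server
```

The point of the `until` loop is to avoid starting Graylog while Elasticsearch is still initializing.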
Thanks for your answer.
I’m trying to analyze the logs you indicated.
To answer some of your questions: I can tell you that restarting the elasticsearch service makes everything work again.
As for how the update was done: having set up the various repositories, I simply ran apt-get update, apt-get upgrade, and apt-get dist-upgrade from the terminal, choosing not to replace the server.conf file.
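For clarity, the update sequence described would have looked like this (a sketch of standard apt usage, not a transcript of the actual session):

```shell
# Refresh the package lists, then apply upgrades.
sudo apt-get update
sudo apt-get upgrade
sudo apt-get dist-upgrade
# When dpkg prompts about the modified server.conf,
# answer to keep the currently-installed local version.
```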
I performed the same procedure on two other identical virtual machines without running into this problem.
Not sure that will help. We need to know exactly why that service failed. The reason there is a timeout is that systemd tries to start or restart the service; when those attempts are exhausted and systemd is still unable to start the service, it times out.
So increasing the timeout may only delay the inevitable, which is the error we're seeing.
What is needed is more logs to identify the issue.
Executing systemctl status elasticsearch -l may show more details on this issue.
Also, using journalctl to show the system logs is helpful in troubleshooting services.
Examples:
journalctl -ef
Jump to the end of the journal (-e) and enable follow mode (-f).
journalctl -u elasticsearch
This will display all messages generated by, and about, elasticsearch.service.
I have done that also, but now when I upgrade/update Graylog I make it a point to shut down the Graylog service, run the updates, then start Graylog back up and tail -f the log files. A few times I caught problems/issues that manifested during the start-up process. If Graylog is the only service that requires updates, I do not stop the service; this would also depend on what version is going to be installed.
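That workflow can be sketched as follows. The log path below is the Debian-package default for Graylog and is an assumption; check where your install writes its server log.

```shell
# Stop Graylog before applying the updates.
sudo systemctl stop graylog-server

# Apply the updates.
sudo apt-get update && sudo apt-get upgrade

# Start Graylog again and watch the log for start-up problems.
sudo systemctl start graylog-server
sudo tail -f /var/log/graylog-server/server.log
```

Following the log right after start-up is what makes issues visible while they are still easy to correlate with the update.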
With Elasticsearch I have a tendency to check the release notes for that version to ensure an easy upgrade process. Sometimes it may require a restart of the ES service. This also depends on what ES version is being updated to.