Fortunately it happened on the test system and not on the main server.
I performed the same operation on two other Graylog systems identical to the one that had the bad outcome, and in those cases the problem did not appear.
In the failed case, restarting the service manually brings everything back to working order.
What could have caused it?
I'm reluctant to update the main server before figuring out how to fix the problem, should it re-emerge.
I would rather not have to resort to manually restarting the service.
There could be a couple of different reasons why it failed to start.
I need to ask a couple of questions.
When Elasticsearch failed to start, what did the log files show before and after the failed restart?
vi /var/log/elasticsearch/elasticsearch.log
When this issue occurred, was journalctl checked by any chance? journalctl is used to view the systemd logs.
root# journalctl
Correct me if I'm wrong, but after you restarted Elasticsearch it started working again?
If so, it might be something with the system and not with Graylog; not 100% sure, I would need to see more data. If this is correct, it doesn't seem to be a configuration issue.
A complete scan of any log files that could possibly show an interruption of the Elasticsearch service would help.
I'm not sure how this upgrade/update was executed; more details would be needed.
Checklist:
1. Ensure you have Elasticsearch starting up on reboot: systemctl enable elasticsearch
2. When upgrades are applied, it is suggested that Elasticsearch starts first; wait until the service is fully operational, then start the MongoDB service. Once both of those services are running fine, the next step is to start the Graylog service. These steps have never failed me.
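The ordered start-up above can be sketched as a short script. The unit names (elasticsearch, mongod, graylog-server) and the localhost:9200 health endpoint are assumptions based on a default install; adjust them to your system.

```shell
#!/bin/sh
# Start Elasticsearch first.
sudo systemctl start elasticsearch

# Wait until the cluster answers on its HTTP port before moving on
# (port 9200 on localhost is the default and may differ on your setup).
until curl -s http://localhost:9200/_cluster/health >/dev/null; do
  sleep 2
done

# Then MongoDB, then Graylog.
sudo systemctl start mongod
sudo systemctl start graylog-server
```

The point of the `until` loop is to avoid starting Graylog while Elasticsearch is still initializing.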
Thanks for your answer.
I’m trying to analyze the logs you indicated.
To answer some of your questions: I can tell you that restarting the elasticsearch service makes everything work again.
As for how the update was done: having set up the various repositories, I simply ran apt-get update, apt-get upgrade, and apt-get dist-upgrade from the terminal, choosing not to replace the server.conf file.
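For clarity, the update sequence described would have looked like this (a sketch of standard apt usage, not a transcript of the actual session):

```shell
# Refresh the package lists, then apply upgrades.
sudo apt-get update
sudo apt-get upgrade
sudo apt-get dist-upgrade
# When dpkg prompts about the modified server.conf,
# answer to keep the currently-installed local version.
```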
I performed the same procedure on two other identical virtual machines without running into this problem.
Not sure that will help. We need to know exactly why that service failed. The reason there is a timeout is that systemd tries to start or restart the service; when those attempts are exhausted and systemd is still unable to start the service, it times out.
So increasing the timeout may only delay the inevitable, which is the error we're seeing.
What is needed is more logs to identify the issue.
Executing systemctl status elasticsearch -l may show more details on this issue.
Also, using journalctl to show the system logs is helpful in troubleshooting services.
Examples:
journalctl -ef
Jump to the end of the journal (-e) and enable follow mode (-f).
journalctl -u elasticsearch
This will display all messages generated by, and about, elasticsearch.service.
I have done that also, but now when I upgrade/update Graylog I make it a point to shut down the Graylog service, run the updates, then start Graylog back up and tail -f the log files. A few times I caught problems/issues that manifested during the start-up process. If Graylog is the only service that requires updates, I do not stop the service; this would also depend on what version is going to be installed.
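That workflow can be sketched as follows. The log path below is the Debian-package default for Graylog and is an assumption; check where your install writes its server log.

```shell
# Stop Graylog before applying the updates.
sudo systemctl stop graylog-server

# Apply the updates.
sudo apt-get update && sudo apt-get upgrade

# Start Graylog again and watch the log for start-up problems.
sudo systemctl start graylog-server
sudo tail -f /var/log/graylog-server/server.log
```

Following the log right after start-up is what makes issues visible while they are still easy to correlate with the update.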
With Elasticsearch I have a tendency to check the release notes for that version to ensure an easy upgrade process. Sometimes it may require a restart of the ES service. This also depends on what ES version is being updated to.