GrayLog server stopped

tivrobo · September 4, 2018, 2:54pm

Hello, I have a new Graylog server appliance, and it randomly stops. Please help!

Graylog logs: https://pastebin.com/PztEMnwj

Elastic logs: https://pastebin.com/VY6zGxWk

derPhlipsi · September 4, 2018, 3:32pm

Well, your Elasticsearch logs do not give any information at all.
Is there any further information that you can gather? (Tip: https://www.gilesorr.com/blog/elasticsearch-startup.html)

Your Graylog logs show two main problems:

2018-09-04_10:23:53.20345 WARN  [LdapConnector] Unable to iterate over user's groups, unable to perform group mapping. Graylog does not support LDAP referrals at the moment. Please see http://docs.graylog.org/en/2.4/pages/users_and_roles/external_auth.html#troubleshooting for more information.

This issue can be solved, but it is not an issue that would stop Graylog afaik. See this info from the Graylog Docs:

Graylog Docs -> Users and Roles -> External authentication -> LDAP/AD -> Troubleshooting

These issues may be resolved by either managing the groups manually, or configuring the LDAP connection to work against the global catalog. The first solution means simply that the LDAP group settings must not be set, and the groups are managed locally. The global catalog solution requires using the 3268/TCP, or 3269/TCP (TLS) port of eligible Active Directory server. The downside is that using the global catalog service consumes slightly more server resources.
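
As a rough illustration of the global catalog option quoted above, the sketch below picks the port depending on whether TLS is wanted. The helper name global_catalog_uri and the host dc01.example.com are made up for illustration. Only the two port numbers come from the docs, and the ldap:// and ldaps:// schemes are the usual convention, not something confirmed for Graylog's own settings form.

```python
# Hypothetical helper, not part of Graylog: it only keeps the two global
# catalog ports from the docs (3268 plain, 3269 TLS) straight.
GC_PORT_PLAIN = 3268
GC_PORT_TLS = 3269


def global_catalog_uri(host: str, use_tls: bool = False) -> str:
    """Build an LDAP URI that targets the AD global catalog on `host`."""
    scheme = "ldaps" if use_tls else "ldap"
    port = GC_PORT_TLS if use_tls else GC_PORT_PLAIN
    return f"{scheme}://{host}:{port}"


if __name__ == "__main__":
    # dc01.example.com is a placeholder, not a host from this thread.
    print(global_catalog_uri("dc01.example.com"))
    print(global_catalog_uri("dc01.example.com", use_tls=True))
```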

The second issue:

2018-09-04_10:25:09.40178 ERROR [NodeChecker] Error executing NodesInfo!

Your Graylog seems to lose its connection to Elasticsearch, or can’t establish one in the first place. Further messages confirm this:

2018-09-04_10:25:10.53588 ERROR [Messages] Caught exception during bulk indexing: io.searchbox.client.config.exception.CouldNotConnectException: Could not connect to http://10.128.1.25:9200, retrying (attempt #1).
2018-09-04_10:25:10.54723 ERROR [Messages] Caught exception during bulk indexing: io.searchbox.client.config.exception.CouldNotConnectException: Could not connect to http://10.128.1.25:9200, retrying (attempt #2).
2018-09-04_10:25:10.55931 ERROR [Messages] Caught exception during bulk indexing: io.searchbox.client.config.exception.CouldNotConnectException: Could not connect to http://10.128.1.25:9200, retrying (attempt #3).
2018-09-04_10:25:10.57526 ERROR [Messages] Caught exception during bulk indexing: io.searchbox.client.config.exception.CouldNotConnectException: Could not connect to http://10.128.1.25:9200, retrying (attempt #4).
2018-09-04_10:25:10.59950 ERROR [Messages] Caught exception during bulk indexing: io.searchbox.client.config.exception.CouldNotConnectException: Could not connect to http://10.128.1.25:9200, retrying (attempt #5).
2018-09-04_10:25:10.63982 ERROR [Messages] Caught exception during bulk indexing: io.searchbox.client.config.exception.CouldNotConnectException: Could not connect to http://10.128.1.25:9200, retrying (attempt #6).
2018-09-04_10:25:10.71182 ERROR [Messages] Caught exception during bulk indexing: io.searchbox.client.config.exception.CouldNotConnectException: Could not connect to http://10.128.1.25:9200, retrying (attempt #7).
2018-09-04_10:25:10.84849 ERROR [Messages] Caught exception during bulk indexing: io.searchbox.client.config.exception.CouldNotConnectException: Could not connect to http://10.128.1.25:9200, retrying (attempt #8).
2018-09-04_10:25:11.11366 ERROR [Messages] Caught exception during bulk indexing: io.searchbox.client.config.exception.CouldNotConnectException: Could not connect to http://10.128.1.25:9200, retrying (attempt #9).
2018-09-04_10:25:11.63486 ERROR [Messages] Caught exception during bulk indexing: io.searchbox.client.config.exception.CouldNotConnectException: Could not connect to http://10.128.1.25:9200, retrying (attempt #10).
2018-09-04_10:25:12.66794 ERROR [Messages] Caught exception during bulk indexing: io.searchbox.client.config.exception.CouldNotConnectException: Could not connect to http://10.128.1.25:9200, retrying (attempt #11).
2018-09-04_10:25:14.72477 ERROR [Messages] Caught exception during bulk indexing: io.searchbox.client.config.exception.CouldNotConnectException: Could not connect to http://10.128.1.25:9200, retrying (attempt #12).
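
The timestamps on the lines above show the gaps between attempts growing from roughly 0.01 s to roughly 2 s, about doubling each time, which looks like a retry backoff. A minimal sketch for measuring that from a saved log file follows. The function name retry_gaps is hypothetical, and the regular expression only assumes the line layout quoted above.

```python
import re
from datetime import datetime
from typing import List, Tuple

# Matches the timestamp and attempt number in lines shaped like the ones quoted above.
RETRY_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2}_\d{2}:\d{2}:\d{2}\.\d+) ERROR \[Messages\] "
    r".*retrying \(attempt #(?P<attempt>\d+)\)"
)


def retry_gaps(log_text: str) -> List[Tuple[int, float]]:
    """Return (attempt, seconds since the previous attempt) for every retry after the first."""
    stamps = []
    for line in log_text.splitlines():
        m = RETRY_RE.match(line.strip())
        if not m:
            continue
        # %f accepts at most 6 fractional digits, so trim anything longer.
        head, frac = m.group("ts").split(".")
        when = datetime.strptime(f"{head}.{frac[:6]}", "%Y-%m-%d_%H:%M:%S.%f")
        stamps.append((int(m.group("attempt")), when))
    return [
        (attempt, (when - prev_when).total_seconds())
        for (_, prev_when), (attempt, when) in zip(stamps, stamps[1:])
    ]
```

Feeding it the text of the Graylog log yields one pair per retry, which makes it easy to see how long Graylog kept trying before giving up.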

I think it is obvious that your problem is within Elasticsearch, since Graylog is not throwing any other error and is able to perform a graceful shutdown. To troubleshoot that, you’ll need to provide a little bit more info.
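
To tell whether Elasticsearch is reachable at all at the moment Graylog stops, a plain TCP probe is often enough. This is only a sketch: the function name can_connect and the 3 second timeout are arbitrary choices, and it checks that the port accepts connections, not that Elasticsearch is healthy.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

Calling can_connect("10.128.1.25", 9200) from the Graylog host would test the address from the CouldNotConnectException lines.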

Also information about your server platform, your Graylog and Elasticsearch Version etc. would be helpful.

Greetings,
Philipp

tivrobo · September 4, 2018, 4:45pm

Hi @derPhlipsi!

Thanks for replying here!

So I need to stop the Elasticsearch service, start it with the -d parameter, and wait in the console for the crash, right?

This is a VMware VM deployed from the latest Graylog appliance, graylog-2.4.6-1.ova, with 8 GB RAM and 4 vCPUs.

Elasticsearch:

number : 5.6.3
build_hash : 1a2f265
build_date : 2017-10-06T20:33:39.012Z
build_snapshot : false
lucene_version : 6.6.1

Graylog

Hostname: a-graylog-inf.mydomain
Node ID: e12f9e7b-ee3b-4757-ad32-80d779987309
Version: 2.4.6+ceaa7e4, codename Wildwuchs
JVM: PID 1264, Oracle Corporation 1.8.0_172 on Linux 4.4.0-134-generic
Time: 2018-09-04 12:44:04 -04:00

derPhlipsi · September 4, 2018, 7:12pm

Hey @tivrobo,

Well, maybe. It depends on whether the pastebin you provided earlier contains the full Elasticsearch log. If Elasticsearch runs fine for a while and then crashes, the normal log file should be enough, but if the output above is all the log file contains, using the debug flag (-d) is probably the best way of finding issues.

In a quick Google search for “elasticsearch 5.6.3 crashes” I found this:

Maybe doing this could also give you more information:

Checking /tmp/hs_err_*.log files reveals JVM errors with SIGSEGV happening about 2 hours before Elasticsearch service crashes:
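
A minimal sketch of that check, assuming the JVM writes its fatal error reports to /tmp as in the quote. The function name find_sigsegv_reports is hypothetical, and on a given system the reports may land in another directory.

```python
import glob
import os
from typing import List


def find_sigsegv_reports(directory: str = "/tmp") -> List[str]:
    """Return paths of hs_err_*.log files in `directory` that mention SIGSEGV."""
    hits = []
    for path in sorted(glob.glob(os.path.join(directory, "hs_err_*.log"))):
        try:
            with open(path, "r", errors="replace") as fh:
                if any("SIGSEGV" in line for line in fh):
                    hits.append(path)
        except OSError:
            # Unreadable file (permissions, or removed in the meantime): skip it.
            continue
    return hits


if __name__ == "__main__":
    for report in find_sigsegv_reports():
        print(report)
```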

Hope it helps

Greetings,
Philipp

tivrobo · September 5, 2018, 11:37am

@derPhlipsi

I turned on debug logging with this command:
curl -X PUT "localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d'{"transient":{"logger._root":"DEBUG"}}'
Now I'm waiting for the crash.
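
For reference, the same request can be built from Python's standard library, which also makes it easy to turn the setting back off afterwards. This is a sketch: the function name root_logger_request is hypothetical, and the idea that sending null resets a transient setting is an assumption about Elasticsearch 5.x worth verifying against its documentation.

```python
import json
import urllib.request
from typing import Optional


def root_logger_request(
    level: Optional[str], base_url: str = "http://localhost:9200"
) -> urllib.request.Request:
    """Build the PUT /_cluster/settings request that sets logger._root.

    Passing None serializes to JSON null, which is assumed to reset the setting.
    """
    body = json.dumps({"transient": {"logger._root": level}}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/_cluster/settings",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
```

Passing the result to urllib.request.urlopen sends it, for example urllib.request.urlopen(root_logger_request("DEBUG")). DEBUG on the root logger is very verbose, so it is worth sending the None variant once the crash has been captured.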

tivrobo · September 6, 2018, 10:46am

Okay, it happened today:

https://pastebin.com/tHwqfpME

jan · September 6, 2018, 11:25am

This just shows the shutdown, not the reason for it …

tivrobo · September 12, 2018, 2:01pm

OK, I’ve decided to redeploy my installation…
If the problem persists, I will dig deeper to troubleshoot it.
Thanks, all, for the help!

system · September 26, 2018, 2:01pm

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.
