GrayLog server stopped

Hello, I have a new Graylog server appliance, and it randomly stops. Please help!

Graylog logs: https://pastebin.com/PztEMnwj

Elastic logs: https://pastebin.com/VY6zGxWk

Hey @tivrobo,

Well, your Elasticsearch logs do not give any information at all.
Is there any further information that you can gather? (Tip: https://www.gilesorr.com/blog/elasticsearch-startup.html)

Your Graylog logs show two main problems:

2018-09-04_10:23:53.20345 WARN  [LdapConnector] Unable to iterate over user's groups, unable to perform group mapping. Graylog does not support LDAP referrals at the moment. Please see http://docs.graylog.org/en/2.4/pages/users_and_roles/external_auth.html#troubleshooting for more information.

This issue can be solved, but it is not an issue that would stop Graylog afaik. See this info from the Graylog Docs:

Graylog Docs -> Users and Roles -> External authentication -> LDAP/AD -> Troubleshooting

These issues may be resolved by either managing the groups manually, or configuring the LDAP connection to work against the global catalog. The first solution means simply that the LDAP group settings must not be set, and the groups are managed locally. The global catalog solution requires using the 3268/TCP, or 3269/TCP (TLS) port of eligible Active Directory server. The downside is that using the global catalog service consumes slightly more server resources.

The second issue:

2018-09-04_10:25:09.40178 ERROR [NodeChecker] Error executing NodesInfo!

Your Graylog seems to lose the connection to Elasticsearch, or can’t establish one in the first place. These further messages point to that:

2018-09-04_10:25:10.53588 ERROR [Messages] Caught exception during bulk indexing: io.searchbox.client.config.exception.CouldNotConnectException: Could not connect to http://10.128.1.25:9200, retrying (attempt #1).
2018-09-04_10:25:10.54723 ERROR [Messages] Caught exception during bulk indexing: io.searchbox.client.config.exception.CouldNotConnectException: Could not connect to http://10.128.1.25:9200, retrying (attempt #2).
2018-09-04_10:25:10.55931 ERROR [Messages] Caught exception during bulk indexing: io.searchbox.client.config.exception.CouldNotConnectException: Could not connect to http://10.128.1.25:9200, retrying (attempt #3).
2018-09-04_10:25:10.57526 ERROR [Messages] Caught exception during bulk indexing: io.searchbox.client.config.exception.CouldNotConnectException: Could not connect to http://10.128.1.25:9200, retrying (attempt #4).
2018-09-04_10:25:10.59950 ERROR [Messages] Caught exception during bulk indexing: io.searchbox.client.config.exception.CouldNotConnectException: Could not connect to http://10.128.1.25:9200, retrying (attempt #5).
2018-09-04_10:25:10.63982 ERROR [Messages] Caught exception during bulk indexing: io.searchbox.client.config.exception.CouldNotConnectException: Could not connect to http://10.128.1.25:9200, retrying (attempt #6).
2018-09-04_10:25:10.71182 ERROR [Messages] Caught exception during bulk indexing: io.searchbox.client.config.exception.CouldNotConnectException: Could not connect to http://10.128.1.25:9200, retrying (attempt #7).
2018-09-04_10:25:10.84849 ERROR [Messages] Caught exception during bulk indexing: io.searchbox.client.config.exception.CouldNotConnectException: Could not connect to http://10.128.1.25:9200, retrying (attempt #8).
2018-09-04_10:25:11.11366 ERROR [Messages] Caught exception during bulk indexing: io.searchbox.client.config.exception.CouldNotConnectException: Could not connect to http://10.128.1.25:9200, retrying (attempt #9).
2018-09-04_10:25:11.63486 ERROR [Messages] Caught exception during bulk indexing: io.searchbox.client.config.exception.CouldNotConnectException: Could not connect to http://10.128.1.25:9200, retrying (attempt #10).
2018-09-04_10:25:12.66794 ERROR [Messages] Caught exception during bulk indexing: io.searchbox.client.config.exception.CouldNotConnectException: Could not connect to http://10.128.1.25:9200, retrying (attempt #11).
2018-09-04_10:25:14.72477 ERROR [Messages] Caught exception during bulk indexing: io.searchbox.client.config.exception.CouldNotConnectException: Could not connect to http://10.128.1.25:9200, retrying (attempt #12).

I think it is obvious that your problem is within Elasticsearch, since Graylog is not throwing any other error and is able to perform a graceful shutdown. To troubleshoot that, you’ll need to provide a little bit more info.

Information about your server platform and your Graylog and Elasticsearch versions would also be helpful.
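
One way to gather that information, as a rough sketch: it assumes `curl` is installed on the Graylog node and that Elasticsearch listens on the address from the log lines above (`http://10.128.1.25:9200`). The helper name `check_es` is made up for this illustration. The root endpoint returns the version details, and `_cluster/health` reports the cluster status.

```shell
#!/bin/sh
# Sketch: probe an Elasticsearch HTTP endpoint and report whether it answers.
# The default address is taken from the Graylog log lines in this thread.
ES_URL="${ES_URL:-http://10.128.1.25:9200}"

# check_es is a hypothetical helper name, not part of Graylog or Elasticsearch.
check_es() {
  url="$1"
  # --fail: non-zero exit on HTTP errors; --max-time: do not hang on a dead host.
  if curl --silent --fail --max-time 5 "$url/" >/dev/null 2>&1; then
    echo "reachable"
  else
    echo "unreachable"
  fi
}

# Usage (run these by hand on the Graylog node):
#   check_es "$ES_URL"
#   curl -s "$ES_URL/"                         # version and build details
#   curl -s "$ES_URL/_cluster/health?pretty"   # cluster status
```

If `check_es` reports "unreachable" while the Elasticsearch service is supposedly running, that would match the `CouldNotConnectException` messages above.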

Greetings,
Philipp

Hi @derPhlipsi!

Thanks for replying here!

So I need to stop the Elasticsearch service, start it with the -d parameter, and wait in the console for the crash, right?

This is a VMware VM from the latest Graylog appliance, graylog-2.4.6-1.ova, with 8 GB of RAM and 4 vCPUs.

Elasticsearch:

number : 5.6.3
build_hash : 1a2f265
build_date : 2017-10-06T20:33:39.012Z
build_snapshot : false
lucene_version : 6.6.1

Graylog:

Hostname: a-graylog-inf.mydomain
Node ID: e12f9e7b-ee3b-4757-ad32-80d779987309
Version: 2.4.6+ceaa7e4, codename Wildwuchs
JVM: PID 1264, Oracle Corporation 1.8.0_172 on Linux 4.4.0-134-generic
Time: 2018-09-04 12:44:04 -04:00

Hey @tivrobo,

Well, maybe. It depends on whether the pastebin you provided earlier contains the full Elasticsearch log. If Elasticsearch runs fine for a while and then crashes, the normal log file should be enough. But if the output above is all the log file contains, running Elasticsearch in the foreground from a console is probably the best way of finding issues. Note that -d daemonizes Elasticsearch rather than enabling debug output, so leave it off for this.

In a quick Google search for “elasticsearch 5.6.3 crashes” I found this:

Maybe doing this could also give you more information:

Checking /tmp/hs_err_*.log files reveals JVM errors with SIGSEGV happening about 2 hours before Elasticsearch service crashes:
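
A minimal sketch of that check, assuming the JVM crash reports end up in /tmp as in the quote above (the JVM may also write them to the working directory of the Elasticsearch process). The function name `list_jvm_crash_logs` is only illustrative.

```shell
#!/bin/sh
# Sketch: list JVM fatal-error reports (hs_err_*.log) that mention SIGSEGV.
# list_jvm_crash_logs is a hypothetical helper name.
list_jvm_crash_logs() {
  dir="${1:-/tmp}"
  for f in "$dir"/hs_err_*.log; do
    # If the glob matched nothing, the pattern itself is returned; skip it.
    [ -e "$f" ] || continue
    if grep -q 'SIGSEGV' "$f"; then
      echo "$f"
    fi
  done
}

# Usage:
#   list_jvm_crash_logs /tmp
#   ls -l /tmp/hs_err_*.log   # compare timestamps with the time of the crash
```

Comparing the timestamps of any files found with the time Graylog logged its first connection error shows whether a JVM crash preceded the outage.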

Hope it helps :)

Greetings,
Philipp

@derPhlipsi

I turned on debug logging with this command:
curl -X PUT "localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d'{"transient":{"logger._root":"DEBUG"}}'
Now I'll wait for the crash.
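
For completeness, a hedged sketch of how the same setting could be toggled and later reset. It assumes Elasticsearch 5.x on `localhost:9200`, where assigning `null` to a transient setting should restore its default. The helper names `logger_payload` and `set_root_logger` are invented for this example. Transient settings do not survive a full cluster restart, so on a single-node appliance the DEBUG level would need to be set again after Elasticsearch comes back up.

```shell
#!/bin/sh
# Sketch: build and send the cluster-settings payload for the root logger.
# ES_URL and both function names are assumptions made for this example.
ES_URL="${ES_URL:-http://localhost:9200}"

# Print the JSON body: a level such as DEBUG, or "reset" to restore the default.
logger_payload() {
  if [ "$1" = "reset" ]; then
    printf '{"transient":{"logger._root":null}}'
  else
    printf '{"transient":{"logger._root":"%s"}}' "$1"
  fi
}

# Send the payload to the cluster settings API (same call as the command above).
set_root_logger() {
  curl -s -X PUT "$ES_URL/_cluster/settings" \
    -H 'Content-Type: application/json' \
    -d "$(logger_payload "$1")"
}

# Usage:
#   set_root_logger DEBUG                      # verbose logging while waiting
#   set_root_logger reset                      # back to the default level
#   curl -s "$ES_URL/_cluster/settings?pretty" # show what is currently set
```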

Okay, it happened today :)

https://pastebin.com/tHwqfpME

This just shows the shutdown, with no reason given for it …

OK, I’ve decided to redeploy my installation…
If the problem persists, I will dig deeper to troubleshoot it.
Thanks, all, for the help!
