Graylog Open 6.1.1 - Data Node won't start after disk full condition

jamesfreeman959 · October 23, 2024, 7:37pm

Hi all,

I’m running a simple Graylog setup on Docker using Docker Compose. The current version of Graylog is 6.1.1. This morning the server hosting the Docker containers ran out of disk space. Although I have fixed this, I cannot get the data node to start. MongoDB seems to have started fine, and the Graylog server seems OK except that it won’t start because it can’t connect to its only data node.

I found this error in the logs of the data node:

[TOO_MANY_REQUESTS/12/disk usage exceeded flood-stage watermark, index has read-only-allow-delete block];

The fix for this seems to be to use curl to send a PUT request to the data node on port 9200. However, it was when trying to do this that I realised that when the data node attempts to start, it fails to start listening on ports 9200 and 9300 - it is only listening on port 8299. The log on the data node is very noisy, but I’ve filtered out all the INFO level events and there seem to be broadly 4-5 issues on the failed startup attempts:

[2024-10-23T11:31:45,409][WARN ][o.o.t.OutboundHandler    ] [datanode] send message failed [channel: Netty4TcpChannel{localAddress=/127.0.0.1:41588, remoteAddress=localhost/127.0.0.1:9300}]
javax.net.ssl.SSLHandshakeException: PKIX path validation failed: java.security.cert.CertPathValidatorException: validity check failed
[2024-10-23T11:31:45,430][ERROR][o.o.s.s.t.SecuritySSLNettyTransport] [datanode] Exception during establishing a SSL connection: javax.net.ssl.SSLHandshakeException: Received fatal alert: certificate_unknown
javax.net.ssl.SSLHandshakeException: Received fatal alert: certificate_unknown
[2024-10-23T19:20:55,830][ERROR][o.o.i.i.ManagedIndexCoordinator] [datanode] get managed-index failed: NoShardAvailableActionException[No shard available for [org.opensearch.action.get.MultiGetShardRequest@22480500]]
[2024-10-23T19:21:03,263][ERROR][o.o.s.a.s.SinkProvider   ] [datanode] Default endpoint could not be created, auditlog will not work properly.
[2024-10-23T19:21:05,750][ERROR][o.o.i.i.ManagedIndexCoordinator] [datanode] Failed to get ISM policies with templates: Failed to execute phase [query], all shards failed
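For reference, the PUT request mentioned above usually amounts to setting `index.blocks.read_only_allow_delete` to `null` through the index settings API. Below is a minimal Python sketch that only builds such a request and does not send it. The URL, the credentials, and the use of basic auth are all assumptions for illustration; a Graylog data node may well expect a different authentication method, and nothing can be sent at all until port 9200 is actually listening.

```python
import base64
import json
import urllib.request


def build_unblock_request(base_url, username, password):
    """Build (but do not send) a PUT that clears the read-only-allow-delete block."""
    # Setting the value to null (None in Python) removes the block from all indices.
    body = json.dumps({"index.blocks.read_only_allow_delete": None}).encode("utf-8")
    # Basic auth is an assumption here; the data node may require something else.
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/_all/_settings",
        data=body,
        method="PUT",
        headers={
            "Content-Type": "application/json",
            "Authorization": "Basic " + token,
        },
    )


if __name__ == "__main__":
    # Hypothetical URL and credentials, for illustration only.
    req = build_unblock_request("https://localhost:9200", "admin", "changeme")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Actually sending it would mean passing the request to `urllib.request.urlopen` with an SSL context that trusts the data node's certificate, which is left out here because it depends on how the certificates were provisioned.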

I am very much lost on how to fix this and any pointers would be appreciated. There seem to be a lot of SSL-related errors in the logs, some of which relate to attempts to connect to port 9300. I’m conscious that 9200 is the API port and 9300 is the transport port, so I wonder if these errors are the reason that neither port is listening. However, I can’t figure out what has gone wrong: no config has changed, and all passwords provided to the containers via the Docker Compose environment are the same.
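One thing that may be worth ruling out: the `validity check failed` part of the PKIX error generally points at a certificate's validity window (expired or not yet valid) rather than at a password or trust problem. The sketch below is only an illustration of that check. It assumes the `notBefore` and `notAfter` dates have already been read from the certificate by some other means (for example `openssl x509 -noout -dates`), and the function name and sample dates are hypothetical.

```python
import ssl
import time


def validity_status(not_before, not_after, now=None):
    """Classify a certificate's validity window relative to `now` (epoch seconds)."""
    now = time.time() if now is None else now
    # cert_time_to_seconds parses dates such as "Oct 23 11:31:45 2024 GMT".
    start = ssl.cert_time_to_seconds(not_before)
    end = ssl.cert_time_to_seconds(not_after)
    if now < start:
        return "not yet valid"
    if now > end:
        return "expired"
    return "valid"


if __name__ == "__main__":
    # Hypothetical dates, for illustration only.
    print(validity_status("Jan 15 00:00:00 2024 GMT", "Jan 15 00:00:00 2025 GMT"))
```

If the window turns out to be fine, comparing the container's clock with the host's would be the next thing to look at, since the same check fails when the clock is wrong.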

Any help or advice would be gratefully received!

Thanks
