Hi all,
I’m running a simple Graylog setup (version 6.1.1) with Docker Compose. This morning the server hosting the Docker containers ran out of disk space. I have since freed up space, but I cannot get the data node to start. MongoDB seems to have started fine, and the Graylog server seems OK except that it won’t finish starting because it can’t connect to its only data node.
I found this error in the logs of the data node:
[TOO_MANY_REQUESTS/12/disk usage exceeded flood-stage watermark, index has read-only-allow-delete block];
The fix suggested for this seems to be to use curl to send a PUT request to the data node on port 9200. However, it was while trying to do this that I realised that when the data node attempts to start, it never starts listening on ports 9200 and 9300 - it is only listening on port 8299. The log on the data node is very noisy, but I’ve filtered out all the INFO-level events, and there seem to be broadly 4-5 issues on the failed startup attempts:
[2024-10-23T11:31:45,409][WARN ][o.o.t.OutboundHandler ] [datanode] send message failed [channel: Netty4TcpChannel{localAddress=/127.0.0.1:41588, remoteAddress=localhost/127.0.0.1:9300}]
javax.net.ssl.SSLHandshakeException: PKIX path validation failed: java.security.cert.CertPathValidatorException: validity check failed
[2024-10-23T11:31:45,430][ERROR][o.o.s.s.t.SecuritySSLNettyTransport] [datanode] Exception during establishing a SSL connection: javax.net.ssl.SSLHandshakeException: Received fatal alert: certificate_unknown
javax.net.ssl.SSLHandshakeException: Received fatal alert: certificate_unknown
[2024-10-23T19:20:55,830][ERROR][o.o.i.i.ManagedIndexCoordinator] [datanode] get managed-index failed: NoShardAvailableActionException[No shard available for [org.opensearch.action.get.MultiGetShardRequest@22480500]]
[2024-10-23T19:21:03,263][ERROR][o.o.s.a.s.SinkProvider ] [datanode] Default endpoint could not be created, auditlog will not work properly.
[2024-10-23T19:21:05,750][ERROR][o.o.i.i.ManagedIndexCoordinator] [datanode] Failed to get ISM policies with templates: Failed to execute phase [query], all shards failed
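For reference, here is a minimal sketch of the two things I was trying to do above: check which ports the data node is actually listening on, and build the PUT request that is supposed to clear the read-only block. It assumes the data node is reachable as `localhost`, and the function names are my own. The request is only built, not sent, because I assume a real call would also need whatever TLS certificates and credentials the data node expects, which I have left out.

```python
import json
import socket
import urllib.request


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an errno value otherwise
        return sock.connect_ex((host, port)) == 0


def build_unblock_request(base_url: str) -> urllib.request.Request:
    """Build (but do not send) the PUT that resets the read-only block on all indices."""
    # Setting the value to null removes the index.blocks.read_only_allow_delete block
    body = json.dumps({"index.blocks.read_only_allow_delete": None}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/_all/_settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


if __name__ == "__main__":
    host = "localhost"  # assumption: run on the Docker host with the ports published
    # 8299, 9200 and 9300 are the ports mentioned in this post
    for port in (8299, 9200, 9300):
        print(port, "open" if port_is_open(host, port) else "closed")

    request = build_unblock_request(f"https://{host}:9200")
    print(request.get_method(), request.full_url, request.data.decode("utf-8"))
```

In my case the port check is what shows the problem: with 9200 closed there is nothing to send the request to, so the read-only block cannot be cleared this way until the node itself comes up.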
I am very much lost on how to fix this and any pointers would be appreciated. There are a lot of SSL-related errors in the logs, some of which relate to attempts to connect to port 9300. I understand that 9200 is the HTTP(S) API port and 9300 is the transport port used for node-to-node traffic, so I wonder whether the SSL failures are the reason neither port is coming up. However, I can’t figure out what has gone wrong: no config has changed, and all passwords provided to the containers via the Docker Compose environment are the same.
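One thing I can at least reason about: as far as I understand it, "validity check failed" in the PKIX error means a certificate in the chain is outside its notBefore/notAfter window, i.e. expired or not yet valid. Below is a minimal sketch of that check, assuming the two dates are copied by hand from the certificate in the format that `openssl x509 -noout -dates` prints. The dates in the example are hypothetical, not taken from my data node, and I am treating the log timestamp as UTC, which is also an assumption.

```python
import ssl
from datetime import datetime, timezone


def cert_is_within_validity(not_before: str, not_after: str, at: datetime) -> bool:
    """Return True if `at` falls inside the certificate's validity window.

    Both date strings use the "Oct 23 11:31:45 2024 GMT" style format.
    """
    start = ssl.cert_time_to_seconds(not_before)
    end = ssl.cert_time_to_seconds(not_after)
    return start <= at.timestamp() <= end


if __name__ == "__main__":
    # Time of the first SSL error in the log above, assumed to be UTC
    log_time = datetime(2024, 10, 23, 11, 31, 45, tzinfo=timezone.utc)
    # Hypothetical validity dates for illustration only
    print(cert_is_within_validity(
        "Sep 22 00:00:00 2024 GMT",
        "Oct 22 00:00:00 2024 GMT",
        log_time,
    ))
```

If a check like this came back False for the data node's real certificate dates, that would at least be consistent with the handshake errors on port 9300, though I have not confirmed that this is what is happening here.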
Any help or advice would be gratefully received!
Thanks