Hi @Tdvorak, thanks for getting back to me. Sorry for taking so long to reply, I’ve been on holiday!
I’ll see if I can get a third node into the cluster. We have a dev front/backend setup on ‘free’ Red Hat developer licences, which I could potentially break up so that one of the boxes can move into the live cluster.
I’ve been through the logs, and all I can find are several flavours of “authentication failed” in both the OpenSearch and datanode logs:
2025-04-22T08:28:56.431Z INFO [OpensearchProcessImpl] [2025-04-22T08:28:56,431][WARN ][o.o.s.a.BackendRegistry ] [RNGDC1PGLS01.rng.reddenorthgate.com] Authentication finally failed for null from 10.181.144.15:43942
2025-04-22T08:28:57.074Z INFO [OpensearchProcessImpl] [2025-04-22T08:28:57,074][WARN ][o.o.s.a.BackendRegistry ] [RNGDC1PGLS01.rng.reddenorthgate.com] Authentication finally failed for null from 10.181.144.15:43942
2025-04-22T08:28:57.389Z INFO [OpensearchProcessImpl] [2025-04-22T08:28:57,389][WARN ][o.o.s.a.BackendRegistry ] [RNGDC1PGLS01.rng.reddenorthgate.com] Authentication finally failed for null from 10.181.144.15:43942
and:
{“type”: “server”, “timestamp”: “2025-04-22T08:25:52,590Z”, “level”: “WARN”, “component”: “o.o.s.a.BackendRegistry”, “cluster.name”: “datanode-cluster”, “node.name”: “RNGDC1PGLS01.rng.reddenorthgate.com”, “message”: “Authentication finally failed for null from 10.181.144.15:60236”, “cluster.uuid”: “zET-JFeuSkOn3YMjqGYrvw”, “node.id”: “KEYKdfuVTpytmfAAmlA1ZQ” }
{“type”: “server”, “timestamp”: “2025-04-22T08:25:52,770Z”, “level”: “WARN”, “component”: “o.o.s.a.BackendRegistry”, “cluster.name”: “datanode-cluster”, “node.name”: “RNGDC1PGLS01.rng.reddenorthgate.com”, “message”: “Authentication finally failed for null from 10.181.144.15:60236”, “cluster.uuid”: “zET-JFeuSkOn3YMjqGYrvw”, “node.id”: “KEYKdfuVTpytmfAAmlA1ZQ” }
The server.log is much more descriptive and has plenty of errors and stack traces in it. I’ve snipped a section out below, and will work on uploading the full log once I can move it off the LAN.
2025-04-22T09:19:04.968+01:00 WARN [Messages] Caught exception during bulk indexing: ElasticsearchException{message=OpenSearchException[An error occurred: ]; nested: OpenSearchStatusException[Unable to parse response body]; nested: ResponseException[method [POST], host [https://RNGDC1PGLS02.rng.reddenorthgate.com:9200], URI [/_bulk?timeout=1m], status line [HTTP/1.1 401 Unauthorized]
Authentication finally failed];, errorDetails=[]}, retrying (attempt #8).
2025-04-22T09:19:05.184+01:00 ERROR [ClusterAdapterOS2] Check for connectivity failed with exception 'An error occurred: ' - enable debug level for this class to see the stack trace.
2025-04-22T09:19:05.306+01:00 WARN [Messages] Caught exception during bulk indexing: ElasticsearchException{message=OpenSearchException[An error occurred: ]; nested: OpenSearchStatusException[Unable to parse response body]; nested: ResponseException[method [POST], host [https://RNGDC1PGLS02.rng.reddenorthgate.com:9200], URI [/_bulk?timeout=1m], status line [HTTP/1.1 401 Unauthorized]
Authentication finally failed];, errorDetails=[]}, retrying (attempt #9).
2025-04-22T09:19:05.903+01:00 WARN [Messages] Caught exception during bulk indexing: ElasticsearchException{message=OpenSearchException[An error occurred: ]; nested: OpenSearchStatusException[Unable to parse response body]; nested: ResponseException[method [POST], host [https://RNGDC1PGLS01.rng.reddenorthgate.com:9200], URI [/_bulk?timeout=1m], status line [HTTP/1.1 401 Unauthorized]
Authentication finally failed];, errorDetails=[]}, retrying (attempt #10).
2025-04-22T09:19:06.080+01:00 WARN [Messages] Caught exception during bulk indexing: ElasticsearchException{message=OpenSearchException[An error occurred: ]; nested: OpenSearchStatusException[Unable to parse response body]; nested: ResponseException[method [POST], host [https://RNGDC1PGLS02.rng.reddenorthgate.com:9200], URI [/_bulk?timeout=1m], status line [HTTP/1.1 401 Unauthorized]
Authentication finally failed];, errorDetails=[]}, retrying (attempt #12).
2025-04-22T09:19:06.184+01:00 ERROR [ClusterAdapterOS2] Check for connectivity failed with exception 'An error occurred: ' - enable debug level for this class to see the stack trace.
2025-04-22T09:19:07.042+01:00 WARN [Messages] Caught exception during bulk indexing: ElasticsearchException{message=OpenSearchException[An error occurred: ]; nested: OpenSearchStatusException[Unable to parse response body]; nested: ResponseException[method [POST], host [https://RNGDC1PGLS02.rng.reddenorthgate.com:9200], URI [/_bulk?timeout=1m], status line [HTTP/1.1 401 Unauthorized]
Authentication finally failed];, errorDetails=[]}, retrying (attempt #11).
2025-04-22T09:19:07.180+01:00 ERROR [ClusterAdapterOS2] Check for connectivity failed with exception 'An error occurred: ' - enable debug level for this class to see the stack trace.
2025-04-22T09:19:08.162+01:00 ERROR [ClusterAdapterOS2] Check for connectivity failed with exception 'An error occurred: ' - enable debug level for this class to see the stack trace.
2025-04-22T09:19:08.162+01:00 WARN [IndexRotationThread] Elasticsearch cluster isn't healthy. Skipping index rotation.
2025-04-22T09:19:08.180+01:00 ERROR [ClusterAdapterOS2] Check for connectivity failed with exception 'An error occurred: ' - enable debug level for this class to see the stack trace.
2025-04-22T09:19:08.184+01:00 ERROR [VersionProbe] Unable to retrieve version from indexer node RNGDC1PGLS02.rng.reddenorthgate.com:9200: unknown error - an exception occurred while deserializing error response: {}
com.fasterxml.jackson.core.JsonParseException: Unrecognized token 'Authentication': was expecting (JSON String, Number, Array, Object or token 'null', 'true' or 'false')
at [Source: (okio.Buffer$inputStream$1); line: 1, column: 16]
at com.fasterxml.jackson.core.JsonParser._constructError(JsonParser.java:2572) ~[graylog.jar:?]
at com.fasterxml.jackson.core.JsonParser._constructReadException(JsonParser.java:2598) ~[graylog.jar:?]
at com.fasterxml.jackson.core.JsonParser._constructReadException(JsonParser.java:2606) ~[graylog.jar:?]
at com.fasterxml.jackson.core.base.ParserMinimalBase._reportError(ParserMinimalBase.java:765) ~[graylog.jar:?]
at com.fasterxml.jackson.core.json.UTF8StreamJsonParser._reportInvalidToken(UTF8StreamJsonParser.java:3659) ~[graylog.jar:?]
at com.fasterxml.jackson.core.json.UTF8StreamJsonParser._handleUnexpectedValue(UTF8StreamJsonParser.java:2747) ~[graylog.jar:?]
at com.fasterxml.jackson.core.json.UTF8StreamJsonParser._nextTokenNotInObject(UTF8StreamJsonParser.java:867) ~[graylog.jar:?]
at com.fasterxml.jackson.core.json.UTF8StreamJsonParser.nextToken(UTF8StreamJsonParser.java:753) ~[graylog.jar:?]
at com.fasterxml.jackson.databind.ObjectReader._initForReading(ObjectReader.java:357) ~[graylog.jar:?]
at com.fasterxml.jackson.databind.ObjectReader._bindAndClose(ObjectReader.java:2115) ~[graylog.jar:?]
at com.fasterxml.jackson.databind.ObjectReader.readValue(ObjectReader.java:1501) ~[graylog.jar:?]
at retrofit2.converter.jackson.JacksonResponseBodyConverter.convert(JacksonResponseBodyConverter.java:33) ~[graylog.jar:?]
at retrofit2.converter.jackson.JacksonResponseBodyConverter.convert(JacksonResponseBodyConverter.java:23) ~[graylog.jar:?]
at org.graylog2.storage.versionprobe.VersionProbe.lambda$probeSingleHost$3(VersionProbe.java:149) ~[graylog.jar:?]
at org.graylog2.storage.versionprobe.VersionProbe.rootResponse(VersionProbe.java:208) ~[graylog.jar:?]
at org.graylog2.storage.versionprobe.VersionProbe.probeSingleHost(VersionProbe.java:159) ~[graylog.jar:?]
at org.graylog2.storage.versionprobe.VersionProbe.lambda$probeAllHosts$2(VersionProbe.java:125) ~[graylog.jar:?]
at java.base/java.util.stream.ReferencePipeline$3$1.accept(Unknown Source) ~[?:?]
at java.base/java.util.ArrayList$ArrayListSpliterator.tryAdvance(Unknown Source) ~[?:?]
at java.base/java.util.stream.ReferencePipeline.forEachWithCancel(Unknown Source) ~[?:?]
at java.base/java.util.stream.AbstractPipeline.copyIntoWithCancel(Unknown Source) ~[?:?]
at java.base/java.util.stream.AbstractPipeline.copyInto(Unknown Source) ~[?:?]
at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(Unknown Source) ~[?:?]
at java.base/java.util.stream.FindOps$FindOp.evaluateSequential(Unknown Source) ~[?:?]
at java.base/java.util.stream.AbstractPipeline.evaluate(Unknown Source) ~[?:?]
at java.base/java.util.stream.ReferencePipeline.findFirst(Unknown Source) ~[?:?]
at org.graylog2.storage.versionprobe.VersionProbe.probeAllHosts(VersionProbe.java:127) ~[graylog.jar:?]
at org.graylog2.storage.versionprobe.VersionProbe.lambda$probe$1(VersionProbe.java:107) ~[graylog.jar:?]
at com.github.rholder.retry.AttemptTimeLimiters$NoAttemptTimeLimit.call(AttemptTimeLimiters.java:78) [graylog.jar:?]
at com.github.rholder.retry.Retryer.call(Retryer.java:160) [graylog.jar:?]
at org.graylog2.storage.versionprobe.VersionProbe.probe(VersionProbe.java:107) [graylog.jar:?]
at org.graylog2.storage.versionprobe.VersionProbe.probe(VersionProbe.java:84) [graylog.jar:?]
at org.graylog2.periodical.ESVersionCheckPeriodical.doRun(ESVersionCheckPeriodical.java:104) [graylog.jar:?]
at org.graylog2.plugin.periodical.Periodical.run(Periodical.java:99) [graylog.jar:?]
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) [?:?]
at java.base/java.util.concurrent.FutureTask.runAndReset(Unknown Source) [?:?]
at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source) [?:?]
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [?:?]
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [?:?]
at java.base/java.lang.Thread.run(Unknown Source) [?:?]
2025-04-22T09:19:08.185+01:00 ERROR [VersionProbe] Unable to retrieve version from indexer node RNGDC1PGLS01.rng.reddenorthgate.com:9200: unknown error - an exception occurred while deserializing error response: {}
com.fasterxml.jackson.core.JsonParseException: Unrecognized token 'Authentication': was expecting (JSON String, Number, Array, Object or token 'null', 'true' or 'false')
at [Source: (okio.Buffer$inputStream$1); line: 1, column: 16]
at com.fasterxml.jackson.core.JsonParser._constructError(JsonParser.java:2572) ~[graylog.jar:?]
at com.fasterxml.jackson.core.JsonParser._constructReadException(JsonParser.java:2598) ~[graylog.jar:?]
at com.fasterxml.jackson.core.JsonParser._constructReadException(JsonParser.java:2606) ~[graylog.jar:?]
at com.fasterxml.jackson.core.base.ParserMinimalBase._reportError(ParserMinimalBase.java:765) ~[graylog.jar:?]
at com.fasterxml.jackson.core.json.UTF8StreamJsonParser._reportInvalidToken(UTF8StreamJsonParser.java:3659) ~[graylog.jar:?]
at com.fasterxml.jackson.core.json.UTF8StreamJsonParser._handleUnexpectedValue(UTF8StreamJsonParser.java:2747) ~[graylog.jar:?]
at com.fasterxml.jackson.core.json.UTF8StreamJsonParser._nextTokenNotInObject(UTF8StreamJsonParser.java:867) ~[graylog.jar:?]
at com.fasterxml.jackson.core.json.UTF8StreamJsonParser.nextToken(UTF8StreamJsonParser.java:753) ~[graylog.jar:?]
at com.fasterxml.jackson.databind.ObjectReader._initForReading(ObjectReader.java:357) ~[graylog.jar:?]
at com.fasterxml.jackson.databind.ObjectReader._bindAndClose(ObjectReader.java:2115) ~[graylog.jar:?]
at com.fasterxml.jackson.databind.ObjectReader.readValue(ObjectReader.java:1501) ~[graylog.jar:?]
at retrofit2.converter.jackson.JacksonResponseBodyConverter.convert(JacksonResponseBodyConverter.java:33) ~[graylog.jar:?]
at retrofit2.converter.jackson.JacksonResponseBodyConverter.convert(JacksonResponseBodyConverter.java:23) ~[graylog.jar:?]
at org.graylog2.storage.versionprobe.VersionProbe.lambda$probeSingleHost$3(VersionProbe.java:149) ~[graylog.jar:?]
at org.graylog2.storage.versionprobe.VersionProbe.rootResponse(VersionProbe.java:208) ~[graylog.jar:?]
at org.graylog2.storage.versionprobe.VersionProbe.probeSingleHost(VersionProbe.java:159) ~[graylog.jar:?]
at org.graylog2.storage.versionprobe.VersionProbe.lambda$probeAllHosts$2(VersionProbe.java:125) ~[graylog.jar:?]
at java.base/java.util.stream.ReferencePipeline$3$1.accept(Unknown Source) ~[?:?]
at java.base/java.util.ArrayList$ArrayListSpliterator.tryAdvance(Unknown Source) ~[?:?]
at java.base/java.util.stream.ReferencePipeline.forEachWithCancel(Unknown Source) ~[?:?]
at java.base/java.util.stream.AbstractPipeline.copyIntoWithCancel(Unknown Source) ~[?:?]
at java.base/java.util.stream.AbstractPipeline.copyInto(Unknown Source) ~[?:?]
at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(Unknown Source) ~[?:?]
at java.base/java.util.stream.FindOps$FindOp.evaluateSequential(Unknown Source) ~[?:?]
at java.base/java.util.stream.AbstractPipeline.evaluate(Unknown Source) ~[?:?]
at java.base/java.util.stream.ReferencePipeline.findFirst(Unknown Source) ~[?:?]
at org.graylog2.storage.versionprobe.VersionProbe.probeAllHosts(VersionProbe.java:127) ~[graylog.jar:?]
at org.graylog2.storage.versionprobe.VersionProbe.lambda$probe$1(VersionProbe.java:107) ~[graylog.jar:?]
at com.github.rholder.retry.AttemptTimeLimiters$NoAttemptTimeLimit.call(AttemptTimeLimiters.java:78) [graylog.jar:?]
at com.github.rholder.retry.Retryer.call(Retryer.java:160) [graylog.jar:?]
at org.graylog2.storage.versionprobe.VersionProbe.probe(VersionProbe.java:107) [graylog.jar:?]
at org.graylog2.storage.versionprobe.VersionProbe.probe(VersionProbe.java:84) [graylog.jar:?]
at org.graylog2.periodical.ESVersionCheckPeriodical.doRun(ESVersionCheckPeriodical.java:104) [graylog.jar:?]
at org.graylog2.plugin.periodical.Periodical.run(Periodical.java:99) [graylog.jar:?]
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) [?:?]
at java.base/java.util.concurrent.FutureTask.runAndReset(Unknown Source) [?:?]
at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source) [?:?]
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [?:?]
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [?:?]
at java.base/java.lang.Thread.run(Unknown Source) [?:?]
2025-04-22T09:19:08.186+01:00 INFO [VersionProbe] Indexer is not available. Retry #3
2025-04-22T09:19:09.183+01:00 WARN [Messages] Caught exception during bulk indexing: ElasticsearchException{message=OpenSearchException[An error occurred: ]; nested: OpenSearchStatusException[Unable to parse response body]; nested: ResponseException[method [POST], host [https://RNGDC1PGLS01.rng.reddenorthgate.com:9200], URI [/_bulk?timeout=1m], status line [HTTP/1.1 401 Unauthorized]
Authentication finally failed];, errorDetails=[]}, retrying (attempt #12).
2025-04-22T09:19:09.355+01:00 INFO [Messages] Bulk indexing finally successful (attempt #13).
2025-04-22T09:19:11.085+01:00 INFO [Messages] Bulk indexing finally successful (attempt #13).
2025-04-22T09:19:14.203+01:00 INFO [Messages] Bulk indexing finally successful (attempt #13).
It’s worth noting that the cluster claims to be healthy, even though the server.log above says otherwise!
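For anyone wanting to check whether one indexer node rejects requests more often than the other, here is a rough sketch of how the 401s in server.log could be tallied per host. The function name and the regex are my own assumptions, based only on the format of the bulk-indexing WARN lines above; none of this comes from Graylog itself.

```python
import re
from collections import Counter

# Assumed pattern: the host in "host [https://<name>:<port>]" followed later
# on the same line by the 401 status, as in the bulk-indexing WARN lines above.
HOST_401 = re.compile(
    r"host \[https://([^:\]]+):\d+\].*status line \[HTTP/1\.1 401 Unauthorized\]"
)


def count_401_by_host(lines):
    """Return a Counter mapping indexer host name -> number of logged 401 responses."""
    counts = Counter()
    for line in lines:
        match = HOST_401.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Possible usage (the path is an assumption, adjust to wherever server.log lives):
#   with open("server.log", encoding="utf-8", errors="replace") as handle:
#       for host, hits in count_401_by_host(handle).most_common():
#           print(host, hits)
```

If the counts are roughly even across both nodes, that would point at the credentials or certificates Graylog presents rather than at a single misbehaving datanode, but that is only a guess until the full log is available.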