Failed to add a new Data Node

1. Describe your incident:
I’ve got 3 VMs:

  1. graylog-server + mongod

  2. graylog-datanode : datanode-1

  3. graylog-datanode : datanode-2

When I connect the second node, I get this issue:

2. Describe your environment:

  • OS Information: Debian 12 + Graylog

  • graylog-datanode 6.2.1-1

  • graylog-server 6.1.10-1

  • mongodb-org 7.0.20

  • Service logs, configurations, and environment variables:
    graylog-server: server.conf

is_leader = true
node_id_file = /etc/graylog/server/node-id
password_secret = 3sCyBEmyLNNwR.........38tZ2dl
root_password_sha2 = 2cb4b1431b84ec15d35ed8........9cc4b25c8d879ae23e18
bin_dir = /usr/share/graylog-server/bin
data_dir = /var/lib/graylog-server
plugin_dir = /usr/share/graylog-server/plugin
stream_aware_field_types=false
disabled_retention_strategies = none,close
allow_leading_wildcard_searches = false
allow_highlighting = false
field_value_suggestion_mode = on
output_batch_size = 500
output_flush_interval = 1
output_fault_count_threshold = 5
output_fault_penalty_seconds = 30
processor_wait_strategy = blocking
ring_size = 65536
inputbuffer_ring_size = 65536
inputbuffer_wait_strategy = blocking
message_journal_enabled = true
message_journal_dir = /var/lib/graylog-server/journal
lb_recognition_period_seconds = 3
mongodb_uri = mongodb://localhost/graylog
mongodb_max_connections = 1000
integrations_scripts_dir = /usr/share/graylog-server/scripts
http_bind_address = 0.0.0.0:9000
message_journal_max_age = 12h
message_journal_max_size = 3gb

graylog-datanode-1 / 2: datanode.conf

node_id_file = /etc/graylog/datanode/node-id
config_location = /etc/graylog/datanode
password_secret = 3sCyBEmyLNNwRDp1WaDGU0rWKDF9uIgWRHA7Id6PmonEmC3SjkMqv1JZ8TlMHfLODLIgn7xkOfSvMsu3GJWI5y5A938tZ2dl
root_password_sha2 =
mongodb_uri = mongodb://172.28.128.150:27017/graylog
opensearch_location = /usr/share/graylog-datanode/dist
opensearch_config_location = /var/lib/graylog-datanode/opensearch/config
opensearch_data_location = /var/lib/graylog-datanode/opensearch/data
opensearch_logs_location = /var/log/graylog-datanode/opensearch

Logs from datanode-2:

14:48:05.718 [opensearch[datanode-2][transport_worker][T#1]] ERROR org.opensearch.transport.netty4.ssl.SecureNetty4Transport - Exception during establishing a SSL connection: javax.net.ssl.SSLHandshakeException: No subject alternative DNS name matching datanode-1 found.
javax.net.ssl.SSLHandshakeException: No subject alternative DNS name matching datanode-1 found.
	at java.base/sun.security.ssl.Alert.createSSLException(Alert.java:130) ~[?:?]
	at java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:378) ~[?:?]
	at java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:321) ~[?:?]
	at java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:316) ~[?:?]
	at java.base/sun.security.ssl.CertificateMessage$T13CertificateConsumer.checkServerCerts(CertificateMessage.java:1318) ~[?:?]
[2025-05-06T15:37:10,755][ERROR][o.o.t.n.s.SecureNetty4Transport] [datanode-2] Exception during establishing a SSL connection: javax.net.ssl.SSLHandshakeException: Insufficient buffer remaining for AEAD cipher fragment (2). Needs to be more than tag size (16)

3. What steps have you already taken to try and solve the problem?
I tried to renew the certificate by clicking “renew-certificate” in the web interface.

4. How can the community help?
I don’t understand what to do to solve this issue.

Thanks, folks!

Hi @Etny,

First of all, I’d unify the graylog-server and graylog-datanode packages to the same version, preferably 6.2. This will help with debugging and make sure all the APIs are in sync. Could you please upgrade your server?
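
If you are on the official apt packages, the upgrade is the usual apt routine. A minimal sketch, assuming the Graylog repository from the installation docs is already configured on each VM:

# on the graylog-server VM
sudo apt-get update
sudo apt-get install graylog-server
sudo systemctl restart graylog-server

# on each data node VM
sudo apt-get update
sudo apt-get install graylog-datanode
sudo systemctl restart graylog-datanode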

How is your networking configured? Can all nodes see each other and resolve one another by name? Can you ping datanode-1 from datanode-2?
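
For example, from datanode-2 (a sketch; 9200 is the OpenSearch HTTP port that appears in your stack traces, adjust the names to your setup):

# does the name resolve, and to the address you expect?
getent hosts datanode-1
ping -c 3 datanode-1

# which names does the certificate presented by datanode-1 actually contain?
echo | openssl s_client -connect datanode-1:9200 2>/dev/null \
  | openssl x509 -noout -subject -ext subjectAltName

If the hostname you connect with is missing from the subject alternative names, the handshake fails exactly like in your log.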

Hi Tdvorak,
I restarted the Graylog services (data node, server, and mongod) and tested connectivity with ping, but in the end a reboot solved the issue.
After that, I upgraded my setup as you suggested, and tada! Everything is green now!


However, all my messages seem to be failing:

  2025-05-07T16:42:03.042+02:00 ERROR [IndexFieldTypePollerPeriodical] Couldn't update field types for index set <File Beat index/681b5392484710fa3d48a873>
org.graylog.shaded.opensearch2.org.opensearch.OpenSearchException: Unable to retrieve field types of index filebeat_index_23
        at org.graylog.storage.opensearch2.OpenSearchClient.exceptionFrom(OpenSearchClient.java:211) ~[?:?]
        at org.graylog.storage.opensearch2.OpenSearchClient.execute(OpenSearchClient.java:153) ~[?:?]
        at org.graylog.storage.opensearch2.OpenSearchClient.executeRequest(OpenSearchClient.java:172) ~[?:?]
        at org.graylog.storage.opensearch2.mapping.FieldMappingApi.fieldTypes(FieldMappingApi.java:51) ~[?:?]
        at org.graylog.storage.opensearch2.IndexFieldTypePollerAdapterOS2.pollIndex(IndexFieldTypePollerAdapterOS2.java:60) ~[?:?]
        at org.graylog2.indexer.fieldtypes.IndexFieldTypePoller.pollIndex(IndexFieldTypePoller.java:94) ~[graylog.jar:?]
        at org.graylog2.indexer.fieldtypes.IndexFieldTypePollerPeriodical.lambda$poll$5(IndexFieldTypePollerPeriodical.java:205) ~[graylog.jar:?]
        at com.codahale.metrics.InstrumentedScheduledExecutorService$InstrumentedRunnable.run(InstrumentedScheduledExecutorService.java:241) [graylog.jar:?]
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) [?:?]
        at java.base/java.util.concurrent.FutureTask.run(Unknown Source) [?:?]
        at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source) [?:?]
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [?:?]
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [?:?]
        at java.base/java.lang.Thread.run(Unknown Source) [?:?]
Caused by: org.graylog.shaded.opensearch2.org.opensearch.client.ResponseException: method [GET], host [https://datanode-1:9200], URI [/filebeat_index_23/_mapping], status line [HTTP/1.1 404 Not Found]
{"error":{"root_cause":[{"type":"index_not_found_exception","reason":"no such index [filebeat_index_23]","index":"filebeat_index_23","resource.id":"filebeat_index_23","resource.type":"index_or_alias","index_uuid":"_na_"}],"type":"index_not_found_exception","reason":"no such index [filebeat_index_23]","index":"filebeat_index_23","resource.id":"filebeat_index_23","resource.type":"index_or_alias","index_uuid":"_na_"},"status":404}
        at org.graylog.shaded.opensearch2.org.opensearch.client.RestClient.convertResponse(RestClient.java:479) ~[?:?]
        at org.graylog.shaded.opensearch2.org.opensearch.client.RestClient.performRequest(RestClient.java:371) ~[?:?]
        at org.graylog.shaded.opensearch2.org.opensearch.client.RestClient.performRequest(RestClient.java:346) ~[?:?]
        at org.graylog.storage.opensearch2.OpenSearchClient.lambda$executeRequest$7(OpenSearchClient.java:173) ~[?:?]
        at org.graylog.storage.opensearch2.OpenSearchClient.execute(OpenSearchClient.java:151) ~[?:?]
        ... 12 more
2025-05-07T16:42:05.043+02:00 WARN  [IndexFieldTypePollerPeriodical] Active write index for index set "Graylog Events" (6819d107ca61280408d40272) doesn't exist yet
2025-05-07T16:42:08.043+02:00 WARN  [IndexFieldTypePollerPeriodical] Active write index for index set "Default index set" (6819d106ca61280408d401b3) doesn't exist yet
2025-05-07T16:42:09.042+02:00 WARN  [IndexFieldTypePollerPeriodical] Active write index for index set "File Beat index" (681b5392484710fa3d48a873) doesn't exist yet
2025-05-07T16:42:11.043+02:00 WARN  [IndexFieldTypePollerPeriodical] Active write index for index set "Graylog System Events" (6819d107ca61280408d40275) doesn't exist yet
2025-05-07T16:42:14.044+02:00 WARN  [IndexFieldTypePollerPeriodical] Active write index for index set "Default index set" (6819d106ca61280408d401b3) doesn't exist yet
user@graylog-server:~$ ping datanode-1
PING datanode-1 (172.28.128.151) 56(84) bytes of data.
64 bytes from datanode-1 (172.28.128.151): icmp_seq=1 ttl=64 time=0.153 ms
64 bytes from datanode-1 (172.28.128.151): icmp_seq=2 ttl=64 time=0.190 ms
64 bytes from datanode-1 (172.28.128.151): icmp_seq=3 ttl=64 time=0.177 ms
^C
--- datanode-1 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2029ms
rtt min/avg/max/mdev = 0.153/0.173/0.190/0.015 ms
user@graylog-server:~$ ping datanode-2
PING datanode-2 (172.28.128.152) 56(84) bytes of data.
64 bytes from datanode-2 (172.28.128.152): icmp_seq=1 ttl=64 time=0.117 ms
64 bytes from datanode-2 (172.28.128.152): icmp_seq=2 ttl=64 time=0.156 ms
64 bytes from datanode-2 (172.28.128.152): icmp_seq=3 ttl=64 time=0.224 ms
^C
--- datanode-2 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2053ms
rtt min/avg/max/mdev = 0.117/0.165/0.224/0.044 ms
user@graylog-server:~$
user@datanode-1:~$ sudo systemctl status graylog-datanode.service
● graylog-datanode.service - Graylog data node
     Loaded: loaded (/lib/systemd/system/graylog-datanode.service; enabled; preset: enabled)
     Active: active (running) since Tue 2025-05-06 14:34:04 UTC; 24h ago
       Docs: http://docs.graylog.org/
   Main PID: 883 (java)
      Tasks: 223 (limit: 19134)
     Memory: 2.9G
        CPU: 29min 3.005s
     CGroup: /system.slice/graylog-datanode.service
             ├─ 883 /usr/share/graylog-datanode/jvm/bin/java -Dlog4j.configurationFile=file:///etc/graylog/datanode/log>
             └─1323 /usr/share/graylog-datanode/dist/opensearch-2.15.0-linux-x64/jdk/bin/java -Xshare:auto -Dopensearch>

mai 06 14:34:04 datanode-1 systemd[1]: Started graylog-datanode.service - Graylog data node.

I only have the default stream and default index set. I don’t understand why it’s failing.

Hey, good to hear that your networking issues are solved now!

Regarding the indexing failures: I’d suggest rotating your indices (System > Indices > open one index set > top-right “Maintenance” button).

It seems that, due to the initial error, some of the indices and aliases were not correctly initialized. Rotating them could fix these problems.
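
If the UI button does not help, the rotation can also be triggered through the REST API. A sketch, assuming the deflector cycle endpoint (double-check the exact route in your API browser at /api/api-browser); the index set ID is the hex string from your log lines, and Graylog requires an X-Requested-By header on POSTs:

curl -u admin:yourpassword \
  -H 'X-Requested-By: cli' \
  -X POST \
  "http://your_graylog_server:9000/api/system/deflector/681b5392484710fa3d48a873/cycle"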

Hi,
I tried rotating, but I still have the same errors.

user@datanode-1:~$ tail -n 20 /var/log/graylog-datanode/datanode.log
2025-05-09T08:14:42.295+02:00 INFO  [OpensearchProcessImpl] [2025-05-09T08:14:42,295][INFO ][o.o.j.s.JobSweeper       ] [datanode-1] Running full sweep
2025-05-09T08:19:42.295+02:00 INFO  [OpensearchProcessImpl] [2025-05-09T08:19:42,295][INFO ][o.o.j.s.JobSweeper       ] [datanode-1] Running full sweep
2025-05-09T08:24:42.296+02:00 INFO  [OpensearchProcessImpl] [2025-05-09T08:24:42,295][INFO ][o.o.j.s.JobSweeper       ] [datanode-1] Running full sweep
2025-05-09T08:29:42.297+02:00 INFO  [OpensearchProcessImpl] [2025-05-09T08:29:42,296][INFO ][o.o.j.s.JobSweeper       ] [datanode-1] Running full sweep
2025-05-09T08:33:56.986+02:00 INFO  [OpensearchProcessImpl] [2025-05-09T08:33:56,986][WARN ][o.o.s.a.BackendRegistry  ] [datanode-1] Authentication finally failed for null from 172.28.128.150:36472
2025-05-09T08:33:56.997+02:00 INFO  [OpensearchProcessImpl] [2025-05-09T08:33:56,997][WARN ][o.o.s.a.BackendRegistry  ] [datanode-1] Authentication finally failed for null from 172.28.128.150:36472
2025-05-09T08:34:41.808+02:00 INFO  [OpensearchProcessImpl] [2025-05-09T08:34:41,808][INFO ][o.o.i.i.IndexStateManagementHistory] [datanode-1] Deleting old history indices viz []
2025-05-09T08:34:41.808+02:00 INFO  [OpensearchProcessImpl] [2025-05-09T08:34:41,808][INFO ][o.o.i.i.IndexStateManagementHistory] [datanode-1] .opendistro-ism-managed-index-history-write not rolled over. Conditions were: {[max_docs: 2500000]=false, [max_age: 1d]=false}
2025-05-09T08:34:41.811+02:00 INFO  [OpensearchProcessImpl] [2025-05-09T08:34:41,811][INFO ][o.o.t.t.CronTransportAction] [datanode-1] Start running hourly cron.
2025-05-09T08:34:41.811+02:00 INFO  [OpensearchProcessImpl] [2025-05-09T08:34:41,811][INFO ][o.o.a.t.ADTaskManager    ] [datanode-1] Start to maintain running historical tasks
2025-05-09T08:34:41.813+02:00 INFO  [OpensearchProcessImpl] [2025-05-09T08:34:41,811][INFO ][o.o.t.c.HourlyCron       ] [datanode-1] Hourly maintenance succeeds
2025-05-09T08:34:42.298+02:00 INFO  [OpensearchProcessImpl] [2025-05-09T08:34:42,298][INFO ][o.o.j.s.JobSweeper       ] [datanode-1] Running full sweep
2025-05-09T08:39:42.298+02:00 INFO  [OpensearchProcessImpl] [2025-05-09T08:39:42,298][INFO ][o.o.j.s.JobSweeper       ] [datanode-1] Running full sweep
2025-05-09T08:44:42.299+02:00 INFO  [OpensearchProcessImpl] [2025-05-09T08:44:42,299][INFO ][o.o.j.s.JobSweeper       ] [datanode-1] Running full sweep
2025-05-09T08:49:42.299+02:00 INFO  [OpensearchProcessImpl] [2025-05-09T08:49:42,299][INFO ][o.o.j.s.JobSweeper       ] [datanode-1] Running full sweep
2025-05-09T08:54:42.301+02:00 INFO  [OpensearchProcessImpl] [2025-05-09T08:54:42,301][INFO ][o.o.j.s.JobSweeper       ] [datanode-1] Running full sweep
2025-05-09T08:59:42.302+02:00 INFO  [OpensearchProcessImpl] [2025-05-09T08:59:42,302][INFO ][o.o.j.s.JobSweeper       ] [datanode-1] Running full sweep
2025-05-09T09:04:42.303+02:00 INFO  [OpensearchProcessImpl] [2025-05-09T09:04:42,303][INFO ][o.o.j.s.JobSweeper       ] [datanode-1] Running full sweep
2025-05-09T09:09:42.304+02:00 INFO  [OpensearchProcessImpl] [2025-05-09T09:09:42,303][INFO ][o.o.j.s.JobSweeper       ] [datanode-1] Running full sweep
2025-05-09T09:14:42.304+02:00 INFO  [OpensearchProcessImpl] [2025-05-09T09:14:42,304][INFO ][o.o.j.s.JobSweeper       ] [datanode-1] Running full sweep
user@datanode-2:~$ tail -n 20 /var/log/graylog-datanode/datanode.log
2025-05-09T08:09:42.030+02:00 INFO  [OpensearchProcessImpl] [2025-05-09T08:09:42,030][INFO ][o.o.j.s.JobSweeper       ] [datanode-2] Running full sweep
2025-05-09T08:14:42.031+02:00 INFO  [OpensearchProcessImpl] [2025-05-09T08:14:42,031][INFO ][o.o.j.s.JobSweeper       ] [datanode-2] Running full sweep
2025-05-09T08:19:42.032+02:00 INFO  [OpensearchProcessImpl] [2025-05-09T08:19:42,031][INFO ][o.o.j.s.JobSweeper       ] [datanode-2] Running full sweep
2025-05-09T08:24:42.032+02:00 INFO  [OpensearchProcessImpl] [2025-05-09T08:24:42,032][INFO ][o.o.j.s.JobSweeper       ] [datanode-2] Running full sweep
2025-05-09T08:29:42.033+02:00 INFO  [OpensearchProcessImpl] [2025-05-09T08:29:42,032][INFO ][o.o.j.s.JobSweeper       ] [datanode-2] Running full sweep
2025-05-09T08:33:56.993+02:00 INFO  [OpensearchProcessImpl] [2025-05-09T08:33:56,993][WARN ][o.o.s.a.BackendRegistry  ] [datanode-2] Authentication finally failed for null from 172.28.128.150:52780
2025-05-09T08:34:41.614+02:00 INFO  [OpensearchProcessImpl] [2025-05-09T08:34:41,614][INFO ][o.o.i.i.IndexStateManagementHistory] [datanode-2] Deleting old history indices viz []
2025-05-09T08:34:41.615+02:00 INFO  [OpensearchProcessImpl] [2025-05-09T08:34:41,614][INFO ][o.o.i.i.IndexStateManagementHistory] [datanode-2] .opendistro-ism-managed-index-history-write not rolled over. Conditions were: {[max_docs: 2500000]=false, [max_age: 1d]=false}
2025-05-09T08:34:41.621+02:00 INFO  [OpensearchProcessImpl] [2025-05-09T08:34:41,621][INFO ][o.o.t.t.CronTransportAction] [datanode-2] Start running hourly cron.
2025-05-09T08:34:41.622+02:00 INFO  [OpensearchProcessImpl] [2025-05-09T08:34:41,621][INFO ][o.o.a.t.ADTaskManager    ] [datanode-2] Start to maintain running historical tasks
2025-05-09T08:34:41.622+02:00 INFO  [OpensearchProcessImpl] [2025-05-09T08:34:41,622][INFO ][o.o.t.c.HourlyCron       ] [datanode-2] Hourly maintenance succeeds
2025-05-09T08:34:42.033+02:00 INFO  [OpensearchProcessImpl] [2025-05-09T08:34:42,033][INFO ][o.o.j.s.JobSweeper       ] [datanode-2] Running full sweep
2025-05-09T08:39:42.033+02:00 INFO  [OpensearchProcessImpl] [2025-05-09T08:39:42,033][INFO ][o.o.j.s.JobSweeper       ] [datanode-2] Running full sweep
2025-05-09T08:44:42.034+02:00 INFO  [OpensearchProcessImpl] [2025-05-09T08:44:42,034][INFO ][o.o.j.s.JobSweeper       ] [datanode-2] Running full sweep
2025-05-09T08:49:42.035+02:00 INFO  [OpensearchProcessImpl] [2025-05-09T08:49:42,034][INFO ][o.o.j.s.JobSweeper       ] [datanode-2] Running full sweep
2025-05-09T08:54:42.035+02:00 INFO  [OpensearchProcessImpl] [2025-05-09T08:54:42,035][INFO ][o.o.j.s.JobSweeper       ] [datanode-2] Running full sweep
2025-05-09T08:59:42.036+02:00 INFO  [OpensearchProcessImpl] [2025-05-09T08:59:42,036][INFO ][o.o.j.s.JobSweeper       ] [datanode-2] Running full sweep
2025-05-09T09:04:42.037+02:00 INFO  [OpensearchProcessImpl] [2025-05-09T09:04:42,037][INFO ][o.o.j.s.JobSweeper       ] [datanode-2] Running full sweep
2025-05-09T09:09:42.038+02:00 INFO  [OpensearchProcessImpl] [2025-05-09T09:09:42,037][INFO ][o.o.j.s.JobSweeper       ] [datanode-2] Running full sweep
2025-05-09T09:14:42.038+02:00 INFO  [OpensearchProcessImpl] [2025-05-09T09:14:42,038][INFO ][o.o.j.s.JobSweeper       ] [datanode-2] Running full sweep
user@graylog-server:~$ tail -n 20 /var/log/graylog-server/server.log
        at org.graylog.storage.opensearch2.mapping.FieldMappingApi.fieldTypes(FieldMappingApi.java:51) ~[?:?]
        at org.graylog.storage.opensearch2.IndexFieldTypePollerAdapterOS2.pollIndex(IndexFieldTypePollerAdapterOS2.java:60) ~[?:?]
        at org.graylog2.indexer.fieldtypes.IndexFieldTypePoller.pollIndex(IndexFieldTypePoller.java:94) ~[graylog.jar:?]
        at org.graylog2.indexer.fieldtypes.IndexFieldTypePollerPeriodical.lambda$poll$5(IndexFieldTypePollerPeriodical.java:205) ~[graylog.jar:?]
        at com.codahale.metrics.InstrumentedScheduledExecutorService$InstrumentedRunnable.run(InstrumentedScheduledExecutorService.java:241) [graylog.jar:?]
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) [?:?]
        at java.base/java.util.concurrent.FutureTask.run(Unknown Source) [?:?]
        at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source) [?:?]
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [?:?]
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [?:?]
        at java.base/java.lang.Thread.run(Unknown Source) [?:?]
Caused by: org.graylog.shaded.opensearch2.org.opensearch.client.ResponseException: method [GET], host [https://datanode-1:9200], URI [/filebeat_index_23/_mapping], status line [HTTP/1.1 404 Not Found]
{"error":{"root_cause":[{"type":"index_not_found_exception","reason":"no such index [filebeat_index_23]","index":"filebeat_index_23","resource.id":"filebeat_index_23","resource.type":"index_or_alias","index_uuid":"_na_"}],"type":"index_not_found_exception","reason":"no such index [filebeat_index_23]","index":"filebeat_index_23","resource.id":"filebeat_index_23","resource.type":"index_or_alias","index_uuid":"_na_"},"status":404}
        at org.graylog.shaded.opensearch2.org.opensearch.client.RestClient.convertResponse(RestClient.java:479) ~[?:?]
        at org.graylog.shaded.opensearch2.org.opensearch.client.RestClient.performRequest(RestClient.java:371) ~[?:?]
        at org.graylog.shaded.opensearch2.org.opensearch.client.RestClient.performRequest(RestClient.java:346) ~[?:?]
        at org.graylog.storage.opensearch2.OpenSearchClient.lambda$executeRequest$7(OpenSearchClient.java:173) ~[?:?]
        at org.graylog.storage.opensearch2.OpenSearchClient.execute(OpenSearchClient.java:151) ~[?:?]
        ... 12 more
2025-05-09T09:17:31.043+02:00 WARN  [IndexFieldTypePollerPeriodical] Active write index for index set "Default index set" (6819d106ca61280408d401b3) doesn't exist yet

Hi,
Does anybody have an idea to help me?
I don’t know where to look to solve this issue:

Timestamp	Index	Letter ID	Error message
5 minutes ago	graylog_deflector	01JV1QTWVE000019T7MNVVMJ49	[graylog_deflector] OpenSearchException[OpenSearch exception [type=index_not_found_exception, reason=no such index [graylog_deflector]]]

Here are my indices:

Thank you.

Could you please open this URL in your browser and post the results here? http://your_graylog_server:9000/api/datanodes/any/opensearch/_cat/indices

Then we can see whether some indices are missing in your OpenSearch. I assume that during the initial problems you ended up with some inconsistencies between the MongoDB state and the OpenSearch indices.
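
If the terminal is easier, the same proxy endpoint works with curl (use your Graylog web credentials; ?v should add the header row):

curl -u admin:yourpassword \
  "http://your_graylog_server:9000/api/datanodes/any/opensearch/_cat/indices?v"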

Here it is:

green  open investigation_event_index_0    fiQj5v4WQC2seVQs7sR73g 1 0     0 0    208b    208b
green  open .opendistro_security           e0kng6uHRfSMuHRReuWchQ 1 0    10 0  68.5kb  68.5kb
yellow open .ds-gl-datanode-metrics-000001 CzFLABSvR4GDEQVjJnBQfQ 1 1  2814 0 719.5kb 719.5kb
green  open graylog_6                      B_BtivgpTrOKmO8vxtk1yg 1 0     0 0    208b    208b
yellow open .ds-gl-datanode-metrics-000002 T-rMzn5IRk63v2Y6L1kfhQ 1 1 14478 0     3mb     3mb
green  open graylog_1                      aGZwYlZlQAKc094NT2CCKA 1 0     0 0    208b    208b
green  open graylog_0                      Py0IH56HTh2vPBB1mVwhuw 1 0  2063 0   954kb   954kb
green  open gl-system-events_0             4pL3x0g8Ts2s5eYPPhOkhw 1 0     5 0  54.2kb  54.2kb
green  open graylog_5                      8YCmyU0zQ06TZkHIuZj1vQ 1 0  2515 0   1.4mb   1.4mb
green  open graylog_4                      OhBbti6BTIu8XdzRN8uX1A 1 0     0 0    208b    208b
green  open graylog_3                      SdOK0ZJyQZC6CgAB1pkpGg 1 0     0 0    208b    208b
green  open graylog_2                      _xuNaxvuT7KSSZYBr0DAzQ 1 0     0 0    208b    208b
green  open gl-events_0                    TOET9On6TsCJ4W_k3Q5q7w 1 0     0 0    208b    208b
green  open .plugins-ml-config             v_YSiBpKSb-ZbjkyIDCjaw 1 0     1 0   3.9kb   3.9kb
yellow open .opendistro-job-scheduler-lock TXuuBH6PRligRuFbihGezA 1 1     2 0  16.9kb  16.9kb

Ok, thank you!

I’d suggest deleting the File Beat index set if you don’t need it. If you do, you can always re-create it later. This should stop at least a portion of the errors you see in your logs. Then we can continue with whatever’s left.
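
If you prefer scripting it, index sets can also be deleted via the REST API. A sketch, assuming the index-set resource route (verify it in /api/api-browser first); the ID is the one from your earlier log lines:

curl -u admin:yourpassword \
  -H 'X-Requested-By: cli' \
  -X DELETE \
  "http://your_graylog_server:9000/api/system/indices/index_sets/681b5392484710fa3d48a873"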

Thank you for your help! I deleted the index as you suggested.

However, the more I work on this setup, the more I realize there’s a lot I don’t fully understand.
Here are my nodes:


I’m not sure it’s normal for both of my nodes to be cluster_manager and remote_cluster_client at the same time.
What I don’t understand is how logs are stored between datanode-1 and datanode-2.
I noticed that the indices are different on each node:


Could you explain how this works, or point me to some documentation that could help?

I guess that during the initial setup, when you had only one node available, some indices were created on it and not properly replicated to the other node afterwards. The OpenSearch cluster is probably in a yellow or red state, right?

You mention that both nodes think they are cluster_manager, so it seems they actually formed two independent clusters?
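
A quick way to confirm is to ask OpenSearch who its cluster members are. A sketch, using the same proxy endpoint as before (note that “any” routes to one node, so run it a few times if in doubt):

# list the nodes of the cluster that answers
curl -u admin:yourpassword \
  "http://your_graylog_server:9000/api/datanodes/any/opensearch/_cat/nodes?v"

# overall health and node count
curl -u admin:yourpassword \
  "http://your_graylog_server:9000/api/datanodes/any/opensearch/_cluster/health?pretty"

If _cat/nodes lists only one node, or number_of_nodes is 1 in the health output, the data nodes did not form a single cluster.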

My suggestion would be to delete the contents of the data and config directories on both data nodes, delete the contents of MongoDB, and restart everything (see the sketch below). This will take you back to the preflight and give you a fresh start, without all the problems that would be quite complex to fix now.

Additionally, I’d suggest adding a third data node. Two nodes are not enough for a stable cluster. They are prone to split-brain problems, which could be what we see in your situation.
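
Roughly, the reset would look like this. A sketch; the paths match the defaults from your datanode.conf, and this wipes all Graylog state, so only do it while the setup is still disposable:

# stop everything first
sudo systemctl stop graylog-server mongod    # on the server VM
sudo systemctl stop graylog-datanode         # on each data node VM

# on each data node: wipe the OpenSearch state
sudo rm -rf /var/lib/graylog-datanode/opensearch/data/*
sudo rm -rf /var/lib/graylog-datanode/opensearch/config/*

# on the server VM: wipe MongoDB
sudo rm -rf /var/lib/mongodb/*

# start everything again; Graylog comes back up in preflight mode
sudo systemctl start mongod graylog-server   # on the server VM
sudo systemctl start graylog-datanode        # on each data node VM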

You’re right, my cluster was yellow.

So I did what you suggested:
I deleted the contents of the data and config directories.

I did the same for datanode-2 and datanode-3.

root@datanode-1:/var/lib/graylog-datanode/opensearch# ls -la
total 16
drwxr-xr-x 4 root             root             4096 14 mai   13:31 .
drwxr-xr-x 4 graylog-datanode graylog-datanode 4096  6 mai   09:05 ..
drwxr-xr-x 2 graylog-datanode graylog-datanode 4096 14 mai   13:31 config
drwxr-xr-x 2 graylog-datanode graylog-datanode 4096 14 mai   13:37 data
root@graylog-server:/var/lib/mongodb# rm -r *
root@graylog-server:/var/lib/mongodb# ls -la
total 20
drwxr-xr-x  2 mongodb mongodb 16384 14 mai   13:42 .
drwxr-xr-x 26 root    root     4096  6 mai   08:41

I also added a third data node, as you recommended.
Then I rebooted everything.
Now, graylog-server, mongodb, and all graylog-datanode services are up and running.

And then…

user@graylog-server:~$ sudo tail -n 7 /var/log/graylog-server/server.log

========================================================================================================

2025-05-14T15:49:04.101+02:00 INFO  [CustomCAX509TrustManager] CA changed, refreshing trust manager
2025-05-14T15:49:13.657+02:00 INFO  [CaKeystore] Signing certificate for  node cfbabaf3-e873-4977-901c-14f13b302b11, subject: CN=datanode-3
2025-05-14T15:49:13.689+02:00 INFO  [CaKeystore] Signing certificate for  node 4f15e49d-84d3-4dc7-9c13-49153bac64b9, subject: CN=datanode-1
2025-05-14T15:49:13.707+02:00 INFO  [CaKeystore] Signing certificate for  node e899cf63-a07f-45e4-99af-18f8716188e9, subject: CN=datanode-2
user@graylog-server:~$ sudo tail -n 7 /var/log/mongodb/mongod.log
{"t":{"$date":"2025-05-14T13:49:04.757+00:00"},"s":"I",  "c":"NETWORK",  "id":6788700, "ctx":"conn38","msg":"Received first command on ingress connection since session start or auth handshake","attr":{"elapsedMillis":0}}
{"t":{"$date":"2025-05-14T13:49:05.016+00:00"},"s":"I",  "c":"NETWORK",  "id":22943,   "ctx":"listener","msg":"Connection accepted","attr":{"remote":"172.28.128.151:60852","uuid":{"uuid":{"$uuid":"9e3a69c2-0639-4d79-8c47-dc35eedb477e"}},"connectionId":39,"connectionCount":20}}
{"t":{"$date":"2025-05-14T13:49:05.016+00:00"},"s":"I",  "c":"NETWORK",  "id":51800,   "ctx":"conn39","msg":"client metadata","attr":{"remote":"172.28.128.151:60852","client":"conn39","negotiatedCompressors":[],"doc":{"driver":{"name":"mongo-java-driver|legacy","version":"5.4.0"},"os":{"type":"Linux","name":"Linux","architecture":"amd64","version":"6.1.0-34-amd64"},"platform":"Java/Eclipse Adoptium/17.0.14+7"}}}
{"t":{"$date":"2025-05-14T13:49:05.017+00:00"},"s":"I",  "c":"NETWORK",  "id":6788700, "ctx":"conn39","msg":"Received first command on ingress connection since session start or auth handshake","attr":{"elapsedMillis":0}}
{"t":{"$date":"2025-05-14T13:49:36.408+00:00"},"s":"I",  "c":"WTCHKPT",  "id":22430,   "ctx":"Checkpointer","msg":"WiredTiger message","attr":{"message":{"ts_sec":1747230576,"ts_usec":408638,"thread":"741:0x7f46cd6666c0","session_name":"WT_SESSION.checkpoint","category":"WT_VERB_CHECKPOINT_PROGRESS","category_id":6,"verbose_level":"DEBUG_1","verbose_level_id":1,"msg":"saving checkpoint snapshot min: 1036, snapshot max: 1036 snapshot count: 0, oldest timestamp: (0, 0) , meta checkpoint timestamp: (0, 0) base write gen: 1"}}}
{"t":{"$date":"2025-05-14T13:50:36.444+00:00"},"s":"I",  "c":"WTCHKPT",  "id":22430,   "ctx":"Checkpointer","msg":"WiredTiger message","attr":{"message":{"ts_sec":1747230636,"ts_usec":444697,"thread":"741:0x7f46cd6666c0","session_name":"WT_SESSION.checkpoint","category":"WT_VERB_CHECKPOINT_PROGRESS","category_id":6,"verbose_level":"DEBUG_1","verbose_level_id":1,"msg":"saving checkpoint snapshot min: 1221, snapshot max: 1221 snapshot count: 0, oldest timestamp: (0, 0) , meta checkpoint timestamp: (0, 0) base write gen: 1"}}}
{"t":{"$date":"2025-05-14T13:51:36.465+00:00"},"s":"I",  "c":"WTCHKPT",  "id":22430,   "ctx":"Checkpointer","msg":"WiredTiger message","attr":{"message":{"ts_sec":1747230696,"ts_usec":465771,"thread":"741:0x7f46cd6666c0","session_name":"WT_SESSION.checkpoint","category":"WT_VERB_CHECKPOINT_PROGRESS","category_id":6,"verbose_level":"DEBUG_1","verbose_level_id":1,"msg":"saving checkpoint snapshot min: 1402, snapshot max: 1402 snapshot count: 0, oldest timestamp: (0, 0) , meta checkpoint timestamp: (0, 0) base write gen: 1"}}}

Even though everything seems to be working fine, I’m still getting these error lines:


/var/log/graylog-datanode/datanode.log:2025-05-14T15:54:46.130+02:00 INFO  [OpensearchProcessImpl] [2025-05-14T15:54:46,114][ERROR][o.o.t.n.s.SecureNetty4Transport] [datanode-1] Exception during establishing a SSL connection: javax.net.ssl.SSLHandshakeException: Received fatal alert: certificate_unknown
/var/log/graylog-datanode/datanode.log:2025-05-14T15:54:46.131+02:00 INFO  [OpensearchProcessImpl] [2025-05-14T15:54:46,114][ERROR][o.o.t.n.s.SecureNetty4Transport] [datanode-1] Exception during establishing a SSL connection: javax.net.ssl.SSLHandshakeException: Insufficient buffer remaining for AEAD cipher fragment (2). Needs to be more than tag size (16)

I understand this is probably a certificate issue, but I literally deleted everything.
So… what am I supposed to do now, besides smashing my keyboard?

Ok, let’s double check everything.

  • Are you now running the same Graylog server and data node versions?
  • Did you delete the contents of MongoDB before starting all the services?
  • Is your CA self-signed, created during the preflight, or are you using a custom CA? (See the certificate check below.)
  • Are you using a custom JVM, or the ones bundled with Graylog server and data node? Any JAVA_HOME or OPENSEARCH_JAVA_HOME environment variables set?
  • Can you upload the full data node logs?
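
For the CA question, one check that might help: compare the issuer and validity dates of the certificate each data node presents. After a clean preflight they should all be signed by the same, freshly created CA. A sketch:

for node in datanode-1 datanode-2 datanode-3; do
  echo "== $node =="
  echo | openssl s_client -connect "$node:9200" 2>/dev/null \
    | openssl x509 -noout -issuer -subject -dates
done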

Hello!

  • Are you now running the same Graylog server and data node versions?
    Yes, absolutely… Not!
    Due to all my tests and snapshot restorations, the nodes and server were not on the same version.
    I’ve now upgraded everything to version 6.2.2-1.

Did you delete the contents of MongoDB before starting all the services?
Yes, here’s what I did:

  • Shut down the data node, Graylog server, and MongoDB

  • On the data node: deleted data and config files

  • On the server: deleted the contents of /var/lib/mongodb

  • Rebooted all VMs

  • Launched the preflight web interface

  • Created the CA

  • Configured the renewal policy

  • Provisioned the certificates

  • Prayed

  • No changes

Are you using a custom JVM, or the ones bundled with Graylog server and data node? Any JAVA_HOME or OPENSEARCH_JAVA_HOME environment variables set?
No custom JVM.
I just installed Debian 12 and followed the Graylog installation instructions from the official documentation.

A few moments later… I’ve got a clue.

  1. When I run only one data node, it seems to work fine:

  2. I add one node, click “restart the configuration”, and it still looks good:

  3. I add the third one…

  4. I delete the first two…

  5. I turn the first two back on…

  6. Click “Restart Configuration”

  7. And… still the same issue as at the beginning of the week!


Maintenance operations like recalculate or rotate are painless.

If I clear everything and restart the preflight…

The only setup that always works without any problem is a single node!

Thanks for all the information!

At this point, my assumption is that, for some reason, your nodes form two different clusters. You create a 2-node cluster first, and when the third node starts, it’s unable to join. You then end up in a strange situation with two partially working clusters. I believe we can confirm that in the logs, but I would need the full data node logs of all three nodes.

I think you can also try to start all three nodes at once, and only then go to the preflight and start provisioning.
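
Something like this, so all three nodes can discover each other before any provisioning happens (a sketch):

# on each of the three data node VMs, within a few seconds of each other
sudo systemctl restart graylog-datanode

# then watch the logs while the nodes form the cluster
sudo tail -f /var/log/graylog-datanode/datanode.log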

The cluster_manager role you see in the cluster overview is just a list of roles, meaning every node can become a manager, not that every node actually is the manager.

This call can also give us some hints: http://your_graylog_server:9000/api/datanodes/any/opensearch/_cluster/state/
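
For example, filtered down to the parts that matter here (a sketch; _cluster/state accepts a metric filter):

curl -u admin:yourpassword \
  "http://your_graylog_server:9000/api/datanodes/any/opensearch/_cluster/state/version,master_node,nodes?pretty"

If all three nodes show up under "nodes" and agree on one master_node, the cluster formed correctly.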