Elasticsearch cluster datanode-cluster is red. Shards: 1 unassigned

1. Describe your incident:
I’m using Graylog Data Node, and the server stopped saving new logs because of this error:
Elasticsearch cluster datanode-cluster is red. Shards: 40 active, 0 initializing, 0 relocating, 1 unassigned.
I didn’t make any changes to the cluster, and there’s plenty of disk space:

Filesystem      Size  Used Avail Use% Mounted on
/dev/sda3        94G   29G   61G  33% /

and there’s plenty of RAM:

              total        used        free      shared  buff/cache   available
Mem:           5.8G        3.8G      488.0M      588.0K        1.5G        1.8G
Swap:          4.0G           0        4.0G

2. Describe your environment:

  • OS Information: Alpine Linux 3.20 running as virtual machine on Proxmox VE (with daily backups using PBS)

  • Package Version: MongoDB 5.0, Graylog Enterprise 6.0, Graylog Datanode 6.0

  • Service logs, configurations, and environment variables:
    running on Docker Compose

version: "3.8"

    hostname: "mongodb"
    image: "mongo:5.0"
      - "mongodb_data:/data/db"
    restart: "no"
      - graylog_network

    image: "${DATANODE_IMAGE:-graylog/graylog-datanode:6.0}"
        condition: "service_started"
    hostname: "datanode"
        - stack.env
      GRAYLOG_DATANODE_NODE_ID_FILE: "/var/lib/graylog-datanode/node-id"
      GRAYLOG_DATANODE_MONGODB_URI: "mongodb://mongodb:27017/graylog"
        hard: -1
        soft: -1
        soft: 65536
        hard: 65536
      - "8999:8999/tcp"   # DataNode API
      - "9200:9200/tcp"
      - "9300:9300/tcp"
      - "graylog-datanode:/var/lib/graylog-datanode"
      - "/opt/geodb:/opt/geodb"
    restart: "no"
      - graylog_network

    hostname: "server"
    image: "${GRAYLOG_IMAGE:-graylog/graylog-enterprise:6.0}"
        condition: "service_started"
    entrypoint: "/usr/bin/tini --  /docker-entrypoint.sh"
        - stack.env
      GRAYLOG_NODE_ID_FILE: "/usr/share/graylog/data/data/node-id"
      GRAYLOG_HTTP_EXTERNAL_URI: "http://localhost:9000/"
      GRAYLOG_MONGODB_URI: "mongodb://mongodb:27017/graylog"
      # To make reporting (headless_shell) work inside a Docker container
    - "5140:5140/udp"   # OPNsense logs
    - "5142:5142/udp"   # Linux logs
    - "9000:9000/tcp"   # Server API
      - "graylog_data:/usr/share/graylog/data/data"
      - "graylog_journal:/usr/share/graylog/data/journal"
      - "/opt/geodb:/opt/geodb"
    restart: "no"
      - graylog_network

    driver: bridge

3. What steps have you already taken to try and solve the problem?
I tried to generate client certificates to connect to OpenSearch/Elasticsearch with curl (following this document: Graylog Data Node - Getting Started), but I get “Authentication finally failed”.
I also tried restarting the virtual machine multiple times.

4. How can the community help?
Please provide instructions on how to resolve this issue.

Hi @mlazzarotto, can you share the curl command you used that returned “Authentication finally failed”?

I believe this means the curl request didn’t connect using the issued client certificate.

Hi @drewmiranda-gl,
this is the command that I used:
curl -v "https://localhost:9200/_cluster/health?pretty" -k --cert datanode_certificate --key datanode_private
The datanode_certificate and datanode_private are valid files that I’ve obtained from the Graylog web interface.
When I run that command, the datanode docker container logs show this message:
2024-08-08T18:32:00.145Z INFO [OpensearchProcessImpl] [2024-08-08T18:32:00,143][WARN ][o.o.s.a.BackendRegistry ] [datanode] Authentication finally failed for null from

I just ran through a test to see if I can reproduce this:

  • Used the docker compose file you provided (with minor edits such as adding back the missing envvars and using a different env file)
  • Ran through the first run screens for datanode
  • Generated a client certificate
  • Used the copy buttons to save the 3 certs to files (though I don’t need the CA cert)
  • Executed the curl command exactly as you listed, but with different file names:
    • curl -v "https://localhost:9200/_cluster/health?pretty" -k --cert cert.crt --key cert.key

Doing so successfully returns:

{
  "cluster_name" : "datanode-cluster",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "discovered_master" : true,
  "discovered_cluster_manager" : true,
  "active_primary_shards" : 11,
  "active_shards" : 11,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
}
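
Once a request with the client certificate is accepted, the same pattern can be pointed at the shard-level endpoints to see which shard is unassigned and why. What follows is only a minimal sketch, not a confirmed fix for this thread. The helper names (os_get, unassigned_shards, explain_allocation) and the OS_URL, CERT, KEY, and CURL variables are made up for illustration, and the default file names are simply the ones used in the test above. The _cat/shards and _cluster/allocation/explain endpoints are standard OpenSearch APIs.

```shell
#!/bin/sh
# Assumed defaults; adjust them to your own setup.
OS_URL="${OS_URL:-https://localhost:9200}"
CERT="${CERT:-cert.crt}"
KEY="${KEY:-cert.key}"
CURL="${CURL:-curl}"

# os_get PATH: GET a path from the cluster using the client certificate.
# -k skips server certificate verification, as in the curl commands above.
os_get() {
    "$CURL" -sS -k --cert "$CERT" --key "$KEY" "$OS_URL/$1"
}

# Print only the shards reported as UNASSIGNED, with the recorded reason.
unassigned_shards() {
    os_get '_cat/shards?h=index,shard,prirep,state,unassigned.reason' | grep UNASSIGNED
}

# Ask the cluster to explain why an unassigned shard cannot be allocated.
explain_allocation() {
    os_get '_cluster/allocation/explain?pretty'
}
```

After sourcing the snippet, running unassigned_shards and then explain_allocation should show the affected index and the allocation decision, assuming the certificate is accepted.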

What response are you getting back from the curl command? Does it return an HTTP code?

Are you able to verify the cert files using (borrowed from here):

openssl x509 -noout -modulus -in cert.crt | openssl md5 > /tmp/crt.pub
openssl rsa -noout -modulus -in cert.key | openssl md5 > /tmp/key.pub
diff /tmp/crt.pub /tmp/key.pub
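
A variant of the check above, as a sketch: instead of hashing the RSA modulus into temp files, it compares the public key embedded in the certificate with the one derived from the private key. This should also work for non-RSA keys. The function name same_keypair is hypothetical, and the file names are the ones from my test.

```shell
#!/bin/sh
# same_keypair CERT KEY
# Returns 0 if the certificate and the private key belong together,
# 1 if they do not, and 2 if either file could not be read by openssl.
same_keypair() {
    cert_pub=$(openssl x509 -in "$1" -noout -pubkey) || return 2
    key_pub=$(openssl pkey -in "$2" -pubout) || return 2
    [ "$cert_pub" = "$key_pub" ]
}
```

Usage would look like: same_keypair cert.crt cert.key && echo "match" || echo "no match (or unreadable file)". If the private key is password protected, openssl will prompt for the passphrase.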


Hi Drew,

What response are you getting back from the curl command? Does it return an HTTP code?

“Authentication finally failed”
“401 Unauthorized”
This is the full verbose output:

* Host logservernew.lab.mydomain.it:9200 was resolved.
* IPv6: ::1
* IPv4:
*   Trying [::1]:9200...
* Connected to logservernew.lab.mydomain.it (::1) port 9200
* ALPN: curl offers h2,http/1.1
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* TLSv1.3 (IN), TLS handshake, Request CERT (13):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.3 (IN), TLS handshake, Finished (20):
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.3 (OUT), TLS handshake, Certificate (11):
* TLSv1.3 (OUT), TLS handshake, CERT verify (15):
* TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384 / x25519 / RSASSA-PSS
* ALPN: server did not agree on a protocol. Uses default.
* Server certificate:
*  subject: CN=datanode
*  start date: Jul 17 21:22:26 2024 GMT
*  expire date: Jul 17 21:22:26 2025 GMT
*  issuer: CN=Graylog CA
*  SSL certificate verify result: unable to get local issuer certificate (20), continuing anyway.
*   Certificate level 0: Public key type RSA (2048/112 Bits/secBits), signed using sha256WithRSAEncryption
* using HTTP/1.x
> GET /_cluster/health?pretty HTTP/1.1
> Host: logservernew.lab.mydomain.it:9200
> User-Agent: curl/8.7.1
> Accept: */*
* Request completely sent off
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
< HTTP/1.1 401 Unauthorized
< content-type: text/plain; charset=UTF-8
< content-length: 29
* Connection #0 to host logservernew.lab.mydomain.it left intact
Authentication finally failed

Are you able to verify the cert files using (borrowed from here):

No, the command openssl rsa -noout -modulus -in cert.key | openssl md5 > /tmp/key.pub returns this error: “Could not find private key from certificate.txt”. That makes sense, because the certificate should not contain the private key.
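
Since the error names a certificate file, one quick sanity check is to look at which PEM blocks each saved file actually holds before passing it to openssl or curl. This is a rough sketch that only inspects the PEM header lines; the function name pem_kind is made up for illustration.

```shell
#!/bin/sh
# pem_kind FILE: print which PEM block types FILE contains:
# "certificate", "private-key", both, or "unknown".
pem_kind() {
    file=$1
    found=""
    if grep -q -e '-----BEGIN CERTIFICATE-----' "$file"; then
        found="certificate"
    fi
    # Matches PRIVATE KEY, RSA PRIVATE KEY, EC PRIVATE KEY, ENCRYPTED PRIVATE KEY.
    if grep -Eq -e '-----BEGIN ([A-Z]+ )?PRIVATE KEY-----' "$file"; then
        found="${found:+$found }private-key"
    fi
    echo "${found:-unknown}"
}
```

Running pem_kind on the file given to --cert should print certificate, and on the file given to --key it should print private-key. If the roles are swapped or a file prints unknown, the saved files are worth re-checking.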

