1. Describe your incident:
One of my Graylog indices randomly breaks because one of its shards becomes unassigned. I get two messages in the Graylog frontend. The first appears when I look at the broken index:
OpenSearch cluster datanode-cluster is red. Shards: 49 active, 0 initializing, 0 relocating, 1 unassigned
The second appears when I look at the stream that uses the index:
OpenSearch exception [type=search_phase_execution_exception, reason=all shards failed]
I found a workaround by calling the OpenSearch API directly with curl. First, I find out which index has the problem:
```
curl "https://localhost:9200/_cat/shards?h=index,shard,prirep,state,unassigned.reason" -k --cert cert.crt --key cert.key
.opendistro-ism-managed-index-history-2025.08.18-000128 0 p STARTED
.opendistro-ism-managed-index-history-2025.08.19-000129 0 p STARTED
.opendistro-ism-managed-index-history-2025.08.16-000126 0 p STARTED
.opendistro-ism-managed-index-history-2025.08.17-000127 0 p STARTED
.opendistro_security 0 p STARTED
.opendistro-ism-managed-index-history-2025.08.15-000125 0 p STARTED
matrix_log_index_18 0 p STARTED
matrix_log_index_17 0 p STARTED
.opendistro-ism-managed-index-history-2025.08.13-000123 0 p STARTED
.opendistro-ism-managed-index-history-2025.08.14-000124 0 p STARTED
matrix_log_index_19 0 p STARTED
.opendistro-ism-managed-index-history-2025.08.11-000121 0 p STARTED
.opendistro-ism-managed-index-history-2025.08.12-000122 0 p STARTED
.opendistro-ism-managed-index-history-2025.08.10-000120 0 p STARTED
.opendistro-ism-managed-index-history-2025.08.28-000138 0 p STARTED
.opendistro-ism-managed-index-history-2025.08.26-000136 0 p STARTED
.opendistro-ism-managed-index-history-2025.08.24-000134 0 p STARTED
.opendistro-ism-managed-index-history-2025.09.07-000148 0 p STARTED
.opendistro-ism-config 0 p STARTED
matrix_log_index_20 0 p UNASSIGNED ALLOCATION_FAILED
.opendistro-ism-managed-index-history-2025.08.22-000132 0 p STARTED
.opendistro-ism-managed-index-history-2025.08.09-000119 0 p STARTED
.opendistro-ism-managed-index-history-2025.09.03-000144 0 p STARTED
.opendistro-ism-managed-index-history-2025.09.05-000146 0 p STARTED
.opendistro-ism-managed-index-history-2025.08.20-000130 0 p STARTED
.opendistro-ism-managed-index-history-2025.09.01-000142 0 p STARTED
.opendistro-ism-managed-index-history-2025.08.31-000141 0 p STARTED
.opendistro-ism-managed-index-history-2025.08.30-000140 0 p STARTED
graylog_9 0 p STARTED
.ds-gl-datanode-metrics-000003 0 p STARTED
.ds-gl-datanode-metrics-000004 0 p STARTED
graylog_8 0 p STARTED
.ds-gl-datanode-metrics-000001 0 p STARTED
.ds-gl-datanode-metrics-000002 0 p STARTED
gl-system-events_6 0 p STARTED
graylog_10 0 p STARTED
gl-system-events_7 0 p STARTED
.ds-gl-datanode-metrics-000005 0 p STARTED
graylog_11 0 p STARTED
.opendistro-ism-managed-index-history-2025.08.29-000139 0 p STARTED
.opendistro-ism-managed-index-history-2025.08.27-000137 0 p STARTED
.opendistro-ism-managed-index-history-2025.08.25-000135 0 p STARTED
.opendistro-ism-managed-index-history-2025.09.06-000147 0 p STARTED
.opendistro-ism-managed-index-history-2025.08.23-000133 0 p STARTED
gl-events_0 0 p STARTED
.plugins-ml-config 0 p STARTED
.opendistro-ism-managed-index-history-2025.08.21-000131 0 p STARTED
.opendistro-ism-managed-index-history-2025.09.04-000145 0 p STARTED
.opendistro-job-scheduler-lock 0 p STARTED
.opendistro-ism-managed-index-history-2025.09.02-000143 0 p STARTED
```
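Scanning 50 lines by eye for the one bad shard is error-prone. Below is a minimal Python sketch that parses `_cat/shards` output requested with `h=index,shard,prirep,state,unassigned.reason` and keeps only the shards that are not `STARTED`. The names `ShardRow`, `parse_cat_shards` and `problem_shards` are hypothetical, and the sketch assumes whitespace-separated columns exactly as in the output above.

```python
from typing import List, NamedTuple, Optional


class ShardRow(NamedTuple):
    index: str
    shard: int
    prirep: str              # "p" = primary, "r" = replica
    state: str               # e.g. STARTED, INITIALIZING, RELOCATING, UNASSIGNED
    reason: Optional[str]    # unassigned.reason; absent for assigned shards


def parse_cat_shards(text: str) -> List[ShardRow]:
    """Parse `_cat/shards?h=index,shard,prirep,state,unassigned.reason` output."""
    rows = []
    for line in text.splitlines():
        cols = line.split()
        if len(cols) < 4:
            continue  # skip blank or truncated lines
        reason = cols[4] if len(cols) > 4 else None
        rows.append(ShardRow(cols[0], int(cols[1]), cols[2], cols[3], reason))
    return rows


def problem_shards(text: str) -> List[ShardRow]:
    """Return only the shards whose state is not STARTED."""
    return [row for row in parse_cat_shards(text) if row.state != "STARTED"]


if __name__ == "__main__":
    sample = (
        "graylog_11 0 p STARTED\n"
        "matrix_log_index_20 0 p UNASSIGNED ALLOCATION_FAILED\n"
        "gl-events_0 0 p STARTED\n"
    )
    for row in problem_shards(sample):
        print(row.index, row.shard, row.state, row.reason)
    # prints: matrix_log_index_20 0 UNASSIGNED ALLOCATION_FAILED
```

For a quick one-off check, piping the same curl command through `grep -v STARTED` gives the same answer.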
Then I delete the whole index that contains the unassigned shard (which also discards the messages stored in it):
```
curl -X DELETE "https://localhost:9200/matrix_log_index_20" -k --cert cert.crt --key cert.key
```
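Deleting the index loses its data. Before doing that, OpenSearch can be asked why the shard is unassigned through the cluster allocation explain API (`_cluster/allocation/explain`). Shards whose allocation keeps failing (unassigned reason `ALLOCATION_FAILED`) stop being retried after `index.allocation.max_retries` attempts (5 by default), and `POST _cluster/reroute?retry_failed=true` asks the cluster to try again. Below is a minimal standard-library Python sketch of both calls, assuming the same `https://localhost:9200` endpoint and `cert.crt`/`cert.key` client certificate as the curl commands above. The helper names are hypothetical, and certificate verification is disabled only to mirror curl's `-k`.

```python
import json
import ssl
import urllib.request

BASE_URL = "https://localhost:9200"  # assumption: same endpoint as the curl calls


def make_ssl_context(cert: str, key: str) -> ssl.SSLContext:
    """TLS context with a client certificate; no server verification (like curl -k)."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False          # must be cleared before setting CERT_NONE
    ctx.verify_mode = ssl.CERT_NONE
    ctx.load_cert_chain(certfile=cert, keyfile=key)
    return ctx


def explain_request(index: str, shard: int = 0, primary: bool = True,
                    base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a cluster allocation explain request for one specific shard."""
    body = json.dumps({"index": index, "shard": shard, "primary": primary})
    return urllib.request.Request(
        f"{base_url}/_cluster/allocation/explain?pretty",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def retry_failed_request(base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a reroute request that retries allocations which hit the retry limit."""
    return urllib.request.Request(
        f"{base_url}/_cluster/reroute?retry_failed=true",
        data=b"",
        method="POST",
    )


def send(req: urllib.request.Request, ctx: ssl.SSLContext) -> str:
    """Send a request and return the response body as text."""
    with urllib.request.urlopen(req, context=ctx) as resp:
        return resp.read().decode("utf-8")
```

Usage: `send(explain_request("matrix_log_index_20"), make_ssl_context("cert.crt", "cert.key"))` returns a JSON explanation; fields such as `unassigned_info` (including the last failure details) and `allocate_explanation` usually point at the cause. Only if that cause looks transient is `send(retry_failed_request(), ctx)` worth trying instead of deleting the index; the curl equivalent is `curl -X POST "https://localhost:9200/_cluster/reroute?retry_failed=true" -k --cert cert.crt --key cert.key`.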
2. Describe your environment:
I run everything with Docker Compose on a single machine, using this docker-compose.yaml:
```yaml
services:
  mongodb:
    image: "mongo:6.0"
    restart: "always"
    networks:
      - graylog
    volumes:
      - "./access_guard/mongodb_data:/data/db"
      - "./access_guard/mongodb_config:/data/configdb"
  datanode:
    image: "graylog/graylog-datanode:6.3.2"
    hostname: "datanode"
    environment:
      GRAYLOG_DATANODE_NODE_ID_FILE: "/var/lib/graylog-datanode/node-id"
      # GRAYLOG_DATANODE_PASSWORD_SECRET and GRAYLOG_PASSWORD_SECRET MUST be the same value
      GRAYLOG_DATANODE_PASSWORD_SECRET: "{{ docker_graylog_password }}"
      GRAYLOG_DATANODE_MONGODB_URI: "mongodb://mongodb:27017/graylog"
    ulimits:
      memlock:
        hard: -1
        soft: -1
      nofile:
        soft: 65536
        hard: 65536
    networks:
      - graylog
    volumes:
      - "./access_guard/graylog-datanode:/var/lib/graylog-datanode"
    restart: "always"
  # Graylog: https://hub.docker.com/r/graylog/graylog-enterprise
  graylog:
    image: "graylog/graylog:6.3.2"
    depends_on:
      mongodb:
        condition: "service_started"
      datanode:
        condition: "service_started"
    entrypoint: "/usr/bin/tini -- /docker-entrypoint.sh"
    environment:
      GRAYLOG_NODE_ID_FILE: "/usr/share/graylog/data/data/node-id"
      GRAYLOG_PASSWORD_SECRET: "{{ docker_graylog_password }}"
      GRAYLOG_ROOT_PASSWORD_SHA2: "{{ docker_graylog_root_pw | hash('sha256') }}"
      GRAYLOG_HTTP_BIND_ADDRESS: "0.0.0.0:9000"
      GRAYLOG_HTTP_EXTERNAL_URI: "https://url.example.com/"
      GRAYLOG_MONGODB_URI: "mongodb://mongodb:27017/graylog"
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.graylog-router.rule=Host(`url.example.com`)"
      - "traefik.http.routers.graylog-router.entrypoints=https"
      - "traefik.http.routers.graylog-router.tls.certresolver=letsencrypt"
      - "traefik.http.services.graylog-service.loadbalancer.server.port=9000"
    ports:
      - "[::]:12201-12202:12201-12202/udp" # GELF UDP - matrix
    networks:
      graylog:
        ipv6_address: {{ pub_ip6_graylog }}
      proxy:
    volumes:
      - "./access_guard/graylog_data:/usr/share/graylog/data/data"
    restart: "always"
networks:
  graylog:
    driver: "bridge"
    driver_opts:
      com.docker.network.bridge.gateway_mode_ipv6: routed
    enable_ipv6: true
    ipam:
      config:
        - subnet: {{ pub_ip6_subnet }}
          gateway: {{ pub_ip6_gateway }}
  proxy:
    external: true
    name: {{ docker_reverse_proxy_network }}
```
3. What steps have you already taken to try and solve the problem?
I have searched for reasons why the Graylog Data Node would randomly break, but I have not found anything with Google. This already happened with version 6.1; I have since upgraded to 6.3.2, but my setup broke again.
4. How can the community help?
What can cause the OpenSearch instance in the datanode container to fail to allocate a shard?