1. Describe your incident:
After running for 10 minutes, my leader Graylog node starts throwing authentication errors against my two Graylog-Datanode backend servers. There is a string of Java errors beginning with:
graylog-server Caused by: org.graylog.shaded.opensearch2.org.opensearch.client.ResponseException: method [GET], host [https://datanode-2:9200], URI [/p_f_237/_stats/store], status line [HTTP/1.1 401 Unauthorized]
and
graylog-server 2025-04-08T10:26:49.436+01:00 WARN [Messages] Caught exception during bulk indexing: ElasticsearchException{message=OpenSearchException[An error occurred: ]; nested: OpenSearchStatusException[Unable to parse response body]; nested: ResponseException[method [POST], host [https://Datanode-1:9200], URI [/_bulk?timeout=1m], status line [HTTP/1.1 401 Unauthorized]
2025-04-08 10:25:57.481
followed by a graylog-server "Authentication finally failed" entry.
This happens every minute. The system is mostly working, though widgets and searches will occasionally return "401 Unauthorized" / "Authentication finally failed" until refreshed.
The problem doesn’t seem to be present on the secondary Graylog web node.
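To check whether the 401s hit both Data Nodes equally, the `ResponseException` fragments in the log can be tallied per host. This is only a minimal sketch: the regular expression is written against the two log lines quoted above, and the function names (`parse_failures`, `count_401_by_host`) are hypothetical helpers, not part of Graylog.

```python
import re
from collections import Counter

# Matches the ResponseException fragment as it appears in the quoted log lines.
# Assumption: other Graylog versions may format this fragment differently.
RESPONSE_RE = re.compile(
    r"method \[(?P<method>[A-Z]+)\], host \[(?P<host>[^\]]+)\], "
    r"URI \[(?P<uri>[^\]]+)\], status line \[HTTP/\d\.\d (?P<status>\d{3})"
)


def parse_failures(lines):
    """Yield (method, host, uri, status) for each ResponseException fragment."""
    for line in lines:
        for m in RESPONSE_RE.finditer(line):
            # Host names are lower-cased so Datanode-1 and datanode-1 count together.
            yield m["method"], m["host"].lower(), m["uri"], int(m["status"])


def count_401_by_host(lines):
    """Count HTTP 401 responses per Data Node host."""
    return Counter(
        host for _, host, _, status in parse_failures(lines) if status == 401
    )


# The two fragments quoted in this post, used as sample input.
SAMPLE = [
    "graylog-server Caused by: org.graylog.shaded.opensearch2.org.opensearch."
    "client.ResponseException: method [GET], host [https://datanode-2:9200], "
    "URI [/p_f_237/_stats/store], status line [HTTP/1.1 401 Unauthorized]",
    "nested: ResponseException[method [POST], host [https://Datanode-1:9200], "
    "URI [/_bulk?timeout=1m], status line [HTTP/1.1 401 Unauthorized]",
]

if __name__ == "__main__":
    for host, count in sorted(count_401_by_host(SAMPLE).items()):
        print(host, count)
```

Feeding the real server log through `count_401_by_host` (for example, `open(path)` on whatever log file the leader writes to) would show whether one Data Node rejects requests more often than the other.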
2. Describe your environment:
- OS Information: RHEL 9.5
- Package Version: graylog-server-6.1.10 & graylog-datanode-6.1.10
- Service logs, configurations, and environment variables:
I can zip up some logs and post them if needs be, but extended snippets are in the first reply:
server.conf
is_leader = true
node_id_file = /etc/graylog/server/node-id
password_secret = REMOVED
root_password_sha2 = REMOVED
bin_dir = /usr/share/graylog-server/bin
data_dir = /opt/graylog-server
plugin_dir = /usr/share/graylog-server/plugin
http_bind_address = 10.181.144.15:9000
http_publish_uri = https://graylog.domain.com:9000/
http_enable_tls = true
http_tls_cert_file = /etc/graylog/graylog.pem
http_tls_key_file = /etc/graylog/graylog.key
stream_aware_field_types=false
disabled_retention_strategies = none,close
allow_leading_wildcard_searches = false
allow_highlighting = false
field_value_suggestion_mode = on
output_batch_size = 5000
output_flush_interval = 1
output_fault_count_threshold = 5
output_fault_penalty_seconds = 30
processor_wait_strategy = blocking
ring_size = 65536
inputbuffer_ring_size = 65536
inputbuffer_wait_strategy = blocking
message_journal_enabled = true
message_journal_dir = /opt/graylog-server/journal
message_journal_max_age = 24h
message_journal_max_size = 60gb
lb_recognition_period_seconds = 3
mongodb_uri = mongodb://graylog_user:password@graylog1/graylog
mongodb_max_connections = 1000
transport_email_enabled = true
transport_email_hostname = mailrelay.mail.com
transport_email_port = 25
transport_email_use_auth = false
datanode.conf
node_id_file = /etc/graylog/datanode/node-id
config_location = /etc/graylog/datanode
password_secret = REMOVED
root_password_sha2 = REMOVED
mongodb_uri = mongodb://graylog_user:password@graylog1/graylog
opensearch_location = /usr/share/graylog-datanode/dist
opensearch_config_location = /opt/graylog-datanode/opensearch/config
opensearch_data_location = /opt/graylog-datanode/opensearch/data
opensearch_logs_location = /var/log/graylog-datanode/opensearch
opensearch_heap = 24g
3. What steps have you already taken to try and solve the problem?
Checked logs, services, and configs; all of these seem to be using the correct strings, SHAs, and passwords. Nothing seems to differ between the erroring and non-erroring Graylog nodes except for “is_leader” on the leader node.
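A manual comparison of config files is easy to get wrong, so a small script can confirm which settings really differ between two nodes. The sketch below assumes the simple `key = value` layout shown above; `parse_conf` and `diff_conf` are hypothetical helper names, and the sample values are placeholders rather than the real secondary node's settings.

```python
def parse_conf(text):
    """Parse 'key = value' lines, skipping blank lines and '#' comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def diff_conf(left, right):
    """Return {key: (left_value, right_value)} for every setting that differs.

    A value of None means the key is missing on that side.
    """
    keys = sorted(set(left) | set(right))
    return {
        k: (left.get(k), right.get(k))
        for k in keys
        if left.get(k) != right.get(k)
    }


# Placeholder inputs; in practice, read the two server.conf files instead.
LEADER = """
is_leader = true
password_secret = EXAMPLE
stream_aware_field_types=false
"""
SECONDARY = """
is_leader = false
password_secret = EXAMPLE
stream_aware_field_types=false
"""

if __name__ == "__main__":
    for key, (a, b) in diff_conf(parse_conf(LEADER), parse_conf(SECONDARY)).items():
        print(f"{key}: leader={a!r} secondary={b!r}")
```

The same comparison can be run between server.conf and datanode.conf for the keys they share, such as `password_secret` and `mongodb_uri`, since a mismatch there is one plausible (but unconfirmed) cause of authentication failures.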
4. How can the community help?
I’m not really sure why these errors are occurring, what they mean, or how I can fix them, which is why I’m here!