mlazzarotto
(Marco Lazzarotto)
August 3, 2024, 10:23am
1
1. Describe your incident:
I’m using Graylog Datanode and the server stopped saving new logs because of this error:
Elasticsearch cluster datanode-cluster is red. Shards: 40 active, 0 initializing, 0 relocating, 1 unassigned
I didn’t make any changes to the cluster. There is plenty of disk space:
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda3        94G   29G   61G  33% /
and plenty of RAM:
              total        used        free      shared  buff/cache   available
Mem:           5.8G        3.8G      488.0M      588.0K        1.5G        1.8G
Swap:          4.0G           0        4.0G
2. Describe your environment:
OS Information: Alpine Linux 3.20 running as virtual machine on Proxmox VE (with daily backups using PBS)
Package Version: MongoDB 5.0, Graylog Enterprise 6.0, Graylog Datanode 6.0
Service logs, configurations, and environment variables:
running on Docker Compose
version: "3.8"
services:
  mongodb:
    hostname: "mongodb"
    image: "mongo:5.0"
    volumes:
      - "mongodb_data:/data/db"
    restart: "no"
    networks:
      - graylog_network
  datanode:
    image: "${DATANODE_IMAGE:-graylog/graylog-datanode:6.0}"
    depends_on:
      mongodb:
        condition: "service_started"
    hostname: "datanode"
    env_file:
      - stack.env
    environment:
      GRAYLOG_DATANODE_NODE_ID_FILE: "/var/lib/graylog-datanode/node-id"
      GRAYLOG_DATANODE_MONGODB_URI: "mongodb://mongodb:27017/graylog"
    ulimits:
      memlock:
        hard: -1
        soft: -1
      nofile:
        soft: 65536
        hard: 65536
    ports:
      - "8999:8999/tcp" # DataNode API
      - "9200:9200/tcp"
      - "9300:9300/tcp"
    volumes:
      - "graylog-datanode:/var/lib/graylog-datanode"
      - "/opt/geodb:/opt/geodb"
    restart: "no"
    networks:
      - graylog_network
  graylog:
    hostname: "server"
    image: "${GRAYLOG_IMAGE:-graylog/graylog-enterprise:6.0}"
    depends_on:
      mongodb:
        condition: "service_started"
    entrypoint: "/usr/bin/tini -- /docker-entrypoint.sh"
    env_file:
      - stack.env
    environment:
      GRAYLOG_NODE_ID_FILE: "/usr/share/graylog/data/data/node-id"
      GRAYLOG_HTTP_BIND_ADDRESS: "0.0.0.0:9000"
      GRAYLOG_HTTP_EXTERNAL_URI: "http://localhost:9000/"
      GRAYLOG_MONGODB_URI: "mongodb://mongodb:27017/graylog"
      # To make reporting (headless_shell) work inside a Docker container
      GRAYLOG_REPORT_DISABLE_SANDBOX: "true"
    ports:
      - "5140:5140/udp" # OPNsense logs
      - "5142:5142/udp" # Linux logs
      - "9000:9000/tcp" # Server API
    volumes:
      - "graylog_data:/usr/share/graylog/data/data"
      - "graylog_journal:/usr/share/graylog/data/journal"
      - "/opt/geodb:/opt/geodb"
    restart: "no"
    networks:
      - graylog_network
volumes:
  mongodb_data:
  graylog-datanode:
  graylog_data:
  graylog_journal:
networks:
  graylog_network:
    driver: bridge
3. What steps have you already taken to try and solve the problem?
I tried to generate client certificates to connect to OpenSearch/Elasticsearch with curl (following this document: Graylog Data Node - Getting Started), but I get: Authentication finally failed
I also restarted the virtual machine multiple times.
4. How can the community help?
Please provide instructions on how to resolve this issue.
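For readers who land here with the same "1 unassigned" shard: once authenticated access to the node works, the shard state can be inspected directly. The following is only a sketch. The `os_get` helper and the `OS_URL`, `OS_CERT`, and `OS_KEY` variables are hypothetical names, and `cert.crt` / `cert.key` are assumed file names for a client certificate and key generated in the Graylog web interface.

```shell
# Assumed defaults; override them in the environment if your setup differs.
OS_URL=${OS_URL:-https://localhost:9200}
OS_CERT=${OS_CERT:-cert.crt}
OS_KEY=${OS_KEY:-cert.key}

# Send a GET request to the given API path using the client certificate.
os_get() {
    # -k skips server certificate verification, as in the commands in this thread.
    curl -sS -k --cert "$OS_CERT" --key "$OS_KEY" "$OS_URL/$1"
}

# Usage (needs a reachable node and a certificate the node accepts):
#   os_get '_cat/shards?v&h=index,shard,prirep,state,unassigned.reason'
#   os_get '_cluster/allocation/explain?pretty'
```

The first call lists every shard with its state and, for unassigned shards, the recorded reason. The second asks the cluster to explain why an unassigned shard cannot be allocated.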
Hi @mlazzarotto, can you share the curl command you used that returned “Authentication finally failed”?
I believe this means the curl request didn’t connect using the issued client certificate.
mlazzarotto
(Marco Lazzarotto)
August 8, 2024, 6:34pm
3
Hi @drewmiranda-gl ,
this is the command that I used:
curl -v "https://localhost:9200/_cluster/health?pretty" -k --cert datanode_certificate --key datanode_private
The datanode_certificate and datanode_private are valid files that I’ve obtained from the Graylog web interface.
When I run that command, the datanode docker container logs show this message:
2024-08-08T18:32:00.145Z INFO [OpensearchProcessImpl] [2024-08-08T18:32:00,143][WARN ][o.o.s.a.BackendRegistry ] [datanode] Authentication finally failed for null from 172.19.0.1:50992
I just ran a test to see if I could reproduce this:
1. Used the Docker Compose file you provided (with minor edits, such as adding back the missing environment variables and using a different env file).
2. Ran through the first-run screens for the data node.
3. Generated a client certificate.
4. Used the copy buttons to save the 3 certs to files (though I don’t need the CA cert).
5. Executed the curl command exactly as you listed, but with different file names:
curl -v "https://localhost:9200/_cluster/health?pretty" -k --cert cert.crt --key cert.key
Doing so does successfully return:
{
"cluster_name" : "datanode-cluster",
"status" : "green",
"timed_out" : false,
"number_of_nodes" : 1,
"number_of_data_nodes" : 1,
"discovered_master" : true,
"discovered_cluster_manager" : true,
"active_primary_shards" : 11,
"active_shards" : 11,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 0,
"delayed_unassigned_shards" : 0,
"number_of_pending_tasks" : 0,
"number_of_in_flight_fetch" : 0,
"task_max_waiting_in_queue_millis" : 0,
"active_shards_percent_as_number" : 100.0
}
What response are you getting back from the curl command? Does it return an HTTP code?
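If only the HTTP status code is of interest, curl can print it on its own instead of the full verbose output. A minimal sketch, assuming the same kind of client certificate and key files as above; `http_status` is a hypothetical helper name, not something Graylog provides.

```shell
# Print only the HTTP status code of a request made with a client certificate.
# Arguments: URL, certificate file, key file (all supplied by the caller).
http_status() {
    # -o /dev/null discards the response body; -w prints the status code afterwards.
    # -k skips server certificate verification, as in the commands in this thread.
    curl -sS -k -o /dev/null -w '%{http_code}\n' --cert "$2" --key "$3" "$1"
}

# Usage (needs a reachable node):
#   http_status "https://localhost:9200/_cluster/health" cert.crt cert.key
```

A 200 indicates the certificate was accepted, while a 401 corresponds to the "Authentication finally failed" response discussed in this thread.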
Are you able to verify the cert files using (borrowed from here):
openssl x509 -noout -modulus -in cert.crt | openssl md5 > /tmp/crt.pub
openssl rsa -noout -modulus -in cert.key | openssl md5 > /tmp/key.pub
diff /tmp/crt.pub /tmp/key.pub
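The same check can be done without temporary files by comparing public keys directly. This is a sketch rather than an official procedure: `cert_matches_key` is a hypothetical helper, and it assumes an unencrypted private key (for an encrypted key, openssl prompts for the passphrase). It uses `openssl pkey` instead of `openssl rsa`, so it is not limited to RSA keys.

```shell
# Succeeds (exit status 0) when the certificate and the private key belong together.
# Arguments: certificate file, private key file.
cert_matches_key() {
    cert_file=$1
    key_file=$2
    # Public key embedded in the certificate, in PEM form.
    cert_pub=$(openssl x509 -noout -pubkey -in "$cert_file" 2>/dev/null) || return 2
    # Public key derived from the private key, in the same PEM form.
    key_pub=$(openssl pkey -pubout -in "$key_file" 2>/dev/null) || return 2
    [ "$cert_pub" = "$key_pub" ]
}

# Usage:
#   cert_matches_key cert.crt cert.key && echo "match" || echo "no match or unreadable file"
```

An exit status of 2 means one of the files could not be parsed at all, which usually points to the wrong file being passed in.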
Thanks!
mlazzarotto
(Marco Lazzarotto)
August 19, 2024, 5:23am
5
Hi Drew,
What response are you getting back from the curl command? Does it return an HTTP code?
“Authentication finally failed”
“401 Unauthorized”
This is the full verbose output:
* Host logservernew.lab.mydomain.it:9200 was resolved.
* IPv6: ::1
* IPv4: 127.0.0.1
* Trying [::1]:9200...
* Connected to logservernew.lab.mydomain.it (::1) port 9200
* ALPN: curl offers h2,http/1.1
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* TLSv1.3 (IN), TLS handshake, Request CERT (13):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.3 (IN), TLS handshake, Finished (20):
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.3 (OUT), TLS handshake, Certificate (11):
* TLSv1.3 (OUT), TLS handshake, CERT verify (15):
* TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384 / x25519 / RSASSA-PSS
* ALPN: server did not agree on a protocol. Uses default.
* Server certificate:
* subject: CN=datanode
* start date: Jul 17 21:22:26 2024 GMT
* expire date: Jul 17 21:22:26 2025 GMT
* issuer: CN=Graylog CA
* SSL certificate verify result: unable to get local issuer certificate (20), continuing anyway.
* Certificate level 0: Public key type RSA (2048/112 Bits/secBits), signed using sha256WithRSAEncryption
* using HTTP/1.x
> GET /_cluster/health?pretty HTTP/1.1
> Host: logservernew.lab.mydomain.it:9200
> User-Agent: curl/8.7.1
> Accept: */*
>
* Request completely sent off
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
< HTTP/1.1 401 Unauthorized
< content-type: text/plain; charset=UTF-8
< content-length: 29
<
* Connection #0 to host logservernew.lab.mydomain.it left intact
Authentication finally failed
Are you able to verify the cert files using (borrowed from here ):
No. The command openssl rsa -noout -modulus -in cert.key | openssl md5 > /tmp/key.pub
returns this error: Could not find private key from certificate.txt
That makes sense, because the certificate should not contain the private key.
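The error message names certificate.txt, which suggests the `openssl rsa` step was given the certificate file rather than the private key file. One quick way to see what a PEM file actually contains is to list its BEGIN labels. A small sketch, where `pem_block_types` is a hypothetical helper name:

```shell
# Print the label of every "-----BEGIN ...-----" line in a PEM file,
# for example "CERTIFICATE" or "PRIVATE KEY".
pem_block_types() {
    # The trailing [[:space:]]* tolerates Windows-style line endings.
    sed -n 's/^-----BEGIN \(.*\)-----[[:space:]]*$/\1/p' "$1"
}

# Usage:
#   pem_block_types certificate.txt
```

The file passed to `openssl x509` should report CERTIFICATE, and the file passed to `openssl rsa` should report some kind of PRIVATE KEY.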
system
(system)
Closed
September 2, 2024, 5:23am
6
This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.