1. Describe your incident:
The Graylog container goes into an ‘unhealthy’ state and sometimes crashes after anywhere from a day to a week. The logs show an interrupted shutdown, after which Graylog fails to connect to OpenSearch and MongoDB.
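Docker only reports a container as ‘unhealthy’ when a healthcheck is defined for it, and it keeps the most recent probe results alongside the status. As a minimal sketch for capturing that detail the next time it happens, the Python below reads `.State.Health` through `docker inspect` and condenses it. The container name is an assumption: Compose v2 usually names containers `<project>-<service>-<index>`, so `graylog-graylog-1` is hypothetical; check `docker ps` for the real name. `fetch_health` and `summarize_health` are illustrative helpers, not part of Graylog or Docker.

```python
import json
import subprocess


def fetch_health(container: str) -> str:
    """Return the raw JSON of .State.Health for a container via `docker inspect`."""
    result = subprocess.run(
        ["docker", "inspect", "--format", "{{json .State.Health}}", container],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def summarize_health(health_json: str, last: int = 3) -> list[str]:
    """Condense Docker's health JSON into short, readable lines."""
    health = json.loads(health_json)
    if health is None:
        # docker inspect prints `null` when no healthcheck is configured.
        return ["no healthcheck configured"]
    lines = [f"status={health['Status']} failing_streak={health['FailingStreak']}"]
    # Show only the newest probe results, flattening multi-line output.
    for probe in health.get("Log", [])[-last:]:
        output = probe.get("Output", "").strip().replace("\n", " ")
        lines.append(f"{probe['End']} exit={probe['ExitCode']} {output}")
    return lines
```

For example, `summarize_health(fetch_health("graylog-graylog-1"))` run on the VM while the container is flagged unhealthy would show which probe is failing and what it printed.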
2. Describe your environment:
- OS Information:
  Linux (Ubuntu 22.04), Standard D2s v3 Azure VM
- Package Version:
  MongoDB 6.0
  OpenSearch 2.4.0
  Graylog 6.0.1
- Service logs, configurations, and environment variables:
  The docker-compose YAML file and the Graylog container log covering the time of the crash are below.
version: "3.8"
services:
  mongodb:
    image: "mongo:6.0"
    volumes:
      - "mongodb_data:/data/db"
    restart: "on-failure"
  opensearch:
    image: "opensearchproject/opensearch:2.4.0"
    environment:
      - "OPENSEARCH_JAVA_OPTS=-Xms1g -Xmx1g"
      - "bootstrap.memory_lock=true"
      - "discovery.type=single-node"
      - "action.auto_create_index=false"
      - "plugins.security.ssl.http.enabled=false"
      - "plugins.security.disabled=true"
    ulimits:
      memlock:
        hard: -1
        soft: -1
      nofile:
        soft: 65536
        hard: 65536
    volumes:
      - "os_data:/usr/share/opensearch/data"
    restart: "on-failure"
  graylog:
    hostname: "server"
    image: "${GRAYLOG_IMAGE:-graylog/graylog:6.0.1}"
    volumes:
      - "graylog_data:/usr/share/graylog/data/data"
      - "graylog_journal:/usr/share/graylog/data/journal"
      - "/home/azureuser/graylog/certs:/home/azureuser/graylog/certs"
    depends_on:
      opensearch:
        condition: "service_started"
      mongodb:
        condition: "service_started"
    entrypoint: "/usr/bin/tini -- wait-for-it opensearch:9200 -- /docker-entrypoint.sh"
    environment:
      GRAYLOG_NODE_ID_FILE: "/usr/share/graylog/data/config/node-id"
      GRAYLOG_PASSWORD_SECRET: "${GRAYLOG_PASSWORD_SECRET:?Please configure GRAYLOG_PASSWORD_SECRET in the .env file}"
      GRAYLOG_ROOT_PASSWORD_SHA2: "${GRAYLOG_ROOT_PASSWORD_SHA2:?Please configure GRAYLOG_ROOT_PASSWORD_SHA2 in the .env file}"
      GRAYLOG_HTTP_BIND_ADDRESS: "0.0.0.0:9000"
      # GRAYLOG_HTTP_EXTERNAL_URI: "http://0.0.0.0:9000/"
      GRAYLOG_ELASTICSEARCH_HOSTS: "http://opensearch:9200"
      GRAYLOG_MONGODB_URI: "mongodb://mongodb:27017/graylog"
      GRAYLOG_HTTP_ENABLE_TLS: "false"
      GRAYLOG_HTTP_TLS_CERT_FILE: "/home/azureuser/graylog/certs/cert.pem"
      GRAYLOG_HTTP_TLS_KEY_FILE: "/home/azureuser/graylog/certs/privkey.pem"
      # GRAYLOG_HTTP_TLS_KEY_PASSWORD: "secret"
      GRAYLOG_TRANSPORT_EMAIL_ENABLED: "true"
      GRAYLOG_TRANSPORT_EMAIL_HOSTNAME: "smtp.sendgrid.net"
      GRAYLOG_TRANSPORT_EMAIL_PORT: "587"
      GRAYLOG_TRANSPORT_EMAIL_USE_AUTH: "true"
      GRAYLOG_TRANSPORT_EMAIL_AUTH_USERNAME: "apikey"
      GRAYLOG_TRANSPORT_EMAIL_AUTH_PASSWORD: "XXXXXXXXXXXXXXXXXXXXXX"
      GRAYLOG_TRANSPORT_EMAIL_FROM_EMAIL: "graylogfpalerts@formpipe.com"
    ports:
      - "5044:5044/tcp" # Beats
      - "5140:5140/udp" # Syslog
      - "5140:5140/tcp" # Syslog
      - "5555:5555/tcp" # RAW TCP
      - "5555:5555/udp" # RAW UDP
      - "9000:9000/tcp" # Server API
      - "12201:12201/tcp" # GELF TCP
      - "12201:12201/udp" # GELF UDP
      #- "10000:10000/tcp" # Custom TCP port
      #- "10000:10000/udp" # Custom UDP port
      - "13301:13301/tcp" # Forwarder data
      - "13302:13302/tcp" # Forwarder config
      - "514:514/tcp" # Syslog
    restart: "on-failure"
  oauth2-proxy:
    image: quay.io/oauth2-proxy/oauth2-proxy
    container_name: oauth2-proxy-new
    volumes:
      - "/home/azureuser/graylog/certs:/home/azureuser/graylog/certs"
    restart: always
    # networks:
    #   - proxy
    command:
      # - --upstream
      # - http://127.0.0.1:9000/
      - --reverse-proxy
      - "true"
      - --skip-provider-button
      - "false"
      - --skip-auth-route
      - "/api"
    environment:
      - OAUTH2_PROXY_COOKIE_SECRET=XXXXXXXXXXXXXXXXXXXXXX
      - OAUTH2_PROXY_CLIENT_ID=XXXXXXXXXXXXXXXXXXXXXX
      - OAUTH2_PROXY_CLIENT_SECRET=XXXXXXXXXXXXXXXXXXXXXX
      - OAUTH2_PROXY_PROVIDER=oidc
      - OAUTH2_PROXY_AZURE_TENANT=XXXXXXXXXXXXXXXXXXXXXX
      - OAUTH2_PROXY_OIDC_ISSUER_URL=https://login.microsoftonline.com/XXXXXXXXXXXXXXXXXXXXXX/v2.0
      - OAUTH2_PROXY_EMAIL_DOMAINS=*
      - OAUTH2_PROXY_REDIRECT_URL=https://XXXXXXXXXXXXXXXXXXXXXX:8080/oauth2/callback
      # - OAUTH2_PROXY_HTTP_ADDRESS=http://0.0.0.0:4180
      - OAUTH2_PROXY_HTTPS_ADDRESS=https://0.0.0.0:8080
      - OAUTH2_PROXY_SESSION_STORE_TYPE=cookie
      - OAUTH2_PROXY_COOKIE_SAMESITE=lax
      - OAUTH2_PROXY_REVERSE_PROXY=true
      # - OAUTH2_PROXY_COOKIE_CSRF_PER_REQUEST=true
      # - OAUTH2_PROXY_COOKIE_CSRF_EXPIRE=5m
      - OAUTH2_PROXY_SKIP_PROVIDER_BUTTON=false
      - OAUTH2_PROXY_PASS_USER_HEADERS=false
      # - OAUTH2_PROXY_SET_XAUTHREQUEST=true
      - OAUTH2_PROXY_TLS_CERT_FILE=/home/azureuser/graylog/certs/cert.pem
      - OAUTH2_PROXY_TLS_KEY_FILE=/home/azureuser/graylog/certs/privkey.key
      - OAUTH2_PROXY_UPSTREAMS=http://server:9000/
    ports:
      - 4180:4180
      - 8080:8080
volumes:
  mongodb_data:
  os_data:
  graylog_data:
  graylog_journal:
  rosscerts:
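The graylog service's entrypoint runs `wait-for-it opensearch:9200` before handing over to `/docker-entrypoint.sh`, and both `depends_on` entries use `condition: "service_started"`. As a rough illustration of what a wait-for-it style gate checks, the Python sketch below retries a plain TCP connection until it succeeds or a timeout expires. It assumes wait-for-it only tests TCP reachability; `wait_for_port` is a hypothetical helper, not part of the Graylog image. An open port shows that something is listening, not that OpenSearch or MongoDB is ready to serve requests, and the entrypoint check only runs when the container starts.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 15.0, interval: float = 0.5) -> bool:
    """Poll until host:port accepts a TCP connection or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A completed TCP handshake is the only readiness signal checked here.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: retry until the deadline passes.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

Inside the Compose network, `wait_for_port("opensearch", 9200)` would return True as soon as OpenSearch's HTTP port is bound, even if the node is still initializing.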
Graylog container log covering the time of the crash:
graylog-1 | 2024-09-16 05:27:44,695 ERROR: org.graylog2.plugin.inputs.transports.AbstractTcpTransport - Error in Input [Beats/Input WinLog/664b60ef0819a622895a4e90] (channel [id: 0xf65c028d, L:/172.18.0.5:5044 ! R:/91.238.181.21:65146]) (cause io.netty.handler.codec.DecoderException: java.lang.IllegalStateException: Unknown beats protocol version: 3)
graylog-1 | 2024-09-16 05:27:44,699 ERROR: org.graylog2.plugin.inputs.transports.AbstractTcpTransport - Error in Input [Beats/Input WinLog/664b60ef0819a622895a4e90] (channel [id: 0xf65c028d, L:/172.18.0.5:5044 ! R:/91.238.181.21:65146]) (cause io.netty.handler.codec.DecoderException: java.lang.IllegalStateException: Unknown beats protocol version: 0)
graylog-1 | 2024-09-16 20:45:10,830 WARN : org.graylog2.cluster.nodes.AbstractNodeService - Did not find meta info of this node. Re-registering.
graylog-1 | 2024-09-16 20:45:11,920 INFO : org.graylog2.commands.Server - SIGNAL received. Shutting down.
graylog-1 | 2024-09-16 20:45:11,958 INFO : org.graylog2.system.shutdown.GracefulShutdown - Graceful shutdown initiated.
graylog-1 | 2024-09-16 20:45:11,971 INFO : org.graylog2.system.shutdown.GracefulShutdown - Node status: [Override lb:DEAD [LB:DEAD]]. Waiting <3sec> for possible load balancers to recognize state change.
graylog-1 | 2024-09-16 20:45:12,918 ERROR: org.graylog2.storage.versionprobe.VersionProbe - Unable to retrieve version from Elasticsearch node: Failed to connect to opensearch/172.18.0.4:9200. - Connection refused.
graylog-1 | 2024-09-16 20:45:12,984 INFO : org.graylog2.storage.versionprobe.VersionProbe - OpenSearch/Elasticsearch is not available. Retry #1
graylog-1 | 2024-09-16 20:45:12,967 INFO : org.mongodb.driver.cluster - Exception in monitor thread while connecting to server mongodb:27017
graylog-1 | com.mongodb.MongoNodeIsRecoveringException: Command failed with error 11600 (InterruptedAtShutdown): 'interrupted at shutdown' on server mongodb:27017. The full response is {"ok": 0.0, "errmsg": "interrupted at shutdown", "code": 11600, "codeName": "InterruptedAtShutdown"}
graylog-1 | at com.mongodb.internal.connection.ProtocolHelper.createSpecialException(ProtocolHelper.java:251) ~[graylog.jar:?]
graylog-1 | at com.mongodb.internal.connection.ProtocolHelper.getCommandFailureException(ProtocolHelper.java:201) ~[graylog.jar:?]
graylog-1 | at com.mongodb.internal.connection.InternalStreamConnection.receiveCommandMessageResponse(InternalStreamConnection.java:431) ~[graylog.jar:?]
graylog-1 | at com.mongodb.internal.connection.InternalStreamConnection.receive(InternalStreamConnection.java:381) ~[graylog.jar:?]
graylog-1 | at com.mongodb.internal.connection.DefaultServerMonitor$ServerMonitorRunnable.lookupServerDescription(DefaultServerMonitor.java:221) [graylog.jar:?]
graylog-1 | at com.mongodb.internal.connection.DefaultServerMonitor$ServerMonitorRunnable.run(DefaultServerMonitor.java:153) [graylog.jar:?]
graylog-1 | at java.base/java.lang.Thread.run(Unknown Source) [?:?]
graylog-1 | 2024-09-16 20:45:13,162 INFO : org.mongodb.driver.cluster - Waiting for server to become available for operation with ID 6253292. Remaining time: 30000 ms. Selector: ReadPreferenceServerSelector{readPreference=primary}, topology description: {type=UNKNOWN, servers=[{address=mongodb:27017, type=UNKNOWN, state=CONNECTING, exception={com.mongodb.MongoNodeIsRecoveringException: Command failed with error 11600 (InterruptedAtShutdown): 'interrupted at shutdown' on server mongodb:27017. The full response is {"ok": 0.0, "errmsg": "interrupted at shutdown", "code": 11600, "codeName": "InterruptedAtShutdown"}}}].
graylog-1 | 2024-09-16 20:45:13,164 INFO : org.mongodb.driver.cluster - Waiting for server to become available for operation with ID 6253290. Remaining time: 30000 ms. Selector: ReadPreferenceServerSelector{readPreference=primary}, topology description: {type=UNKNOWN, servers=[{address=mongodb:27017, type=UNKNOWN, state=CONNECTING, exception={com.mongodb.MongoNodeIsRecoveringException: Command failed with error 11600 (InterruptedAtShutdown): 'interrupted at shutdown' on server mongodb:27017. The full response is {"ok": 0.0, "errmsg": "interrupted at shutdown", "code": 11600, "codeName": "InterruptedAtShutdown"}}}].
graylog-1 | 2024-09-16 20:45:13,178 INFO : org.mongodb.driver.cluster - Waiting for server to become available for operation with ID 6253291. Remaining time: 30000 ms. Selector: WritableServerSelector, topology description: {type=UNKNOWN, servers=[{address=mongodb:27017, type=UNKNOWN, state=CONNECTING, exception={com.mongodb.MongoNodeIsRecoveringException: Command failed with error 11600 (InterruptedAtShutdown): 'interrupted at shutdown' on server mongodb:27017. The full response is {"ok": 0.0, "errmsg": "interrupted at shutdown", "code": 11600, "codeName": "InterruptedAtShutdown"}}}].
graylog-1 | 2024-09-16 20:45:13,165 INFO : org.mongodb.driver.cluster - Waiting for server to become available for operation with ID 6253289. Remaining time: 30000 ms. Selector: ReadPreferenceServerSelector{readPreference=primary}, topology description: {type=UNKNOWN, servers=[{address=mongodb:27017, type=UNKNOWN, state=CONNECTING, exception={com.mongodb.MongoNodeIsRecoveringException: Command failed with error 11600 (InterruptedAtShutdown): 'interrupted at shutdown' on server mongodb:27017. The full response is {"ok": 0.0, "errmsg": "interrupted at shutdown", "code": 11600, "codeName": "InterruptedAtShutdown"}}}].
graylog-1 | 2024-09-16 20:45:13,179 INFO : org.mongodb.driver.cluster - Waiting for server to become available for operation with ID 6253284. Remaining time: 30000 ms. Selector: ReadPreferenceServerSelector{readPreference=primary}, topology description: {type=UNKNOWN, servers=[{address=mongodb:27017, type=UNKNOWN, state=CONNECTING, exception={com.mongodb.MongoNodeIsRecoveringException: Command failed with error 11600 (InterruptedAtShutdown): 'interrupted at shutdown' on server mongodb:27017. The full response is {"ok": 0.0, "errmsg": "interrupted at shutdown", "code": 11600, "codeName": "InterruptedAtShutdown"}}}].
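In this log, Graylog receives a shutdown signal at 20:45:11,920, and within about a second MongoDB answers with `InterruptedAtShutdown` while OpenSearch refuses connections. Some lines are printed out of timestamp order (for example 12,984 before 12,967), so a sorted timeline makes the sequence easier to read. The Python sketch below is one way to build it; it assumes the `graylog-1 | <timestamp> <LEVEL>: <logger> - <message>` layout shown above, and `timeline` is a hypothetical helper.

```python
import re
from datetime import datetime

# Matches docker compose output such as:
# graylog-1 | 2024-09-16 20:45:11,920 INFO : org.graylog2.commands.Server - SIGNAL received. Shutting down.
LINE_RE = re.compile(
    r"^(?P<service>\S+)\s+\|\s+"
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+"
    r"(?P<level>[A-Z]+)\s*:\s+"
    r"(?P<logger>\S+)\s+-\s+(?P<message>.*)$"
)


def timeline(lines, width=60):
    """Return (timestamp, level, logger, message) tuples sorted by time.

    Stack-trace continuation lines do not match LINE_RE and are skipped.
    """
    events = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue
        ts = datetime.strptime(match["ts"], "%Y-%m-%d %H:%M:%S,%f")
        logger = match["logger"].rsplit(".", 1)[-1]  # keep only the class name
        events.append((ts, match["level"], logger, match["message"][:width]))
    # Lines from different threads can be printed slightly out of order.
    events.sort(key=lambda event: event[0])
    return events
```

Feeding it the output of `docker compose logs --no-color graylog` for the crash window would list the signal, the MongoDB errors, and the OpenSearch retries in the order they actually happened.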
3. What steps have you already taken to try and solve the problem?
We increased the machine size in Azure to get more cores and memory. This seemed to make the problem take longer to appear, but the container still eventually crashed. Our messages should be sent encrypted, so I enabled TLS on the Beats input, but this did not make a difference.
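To confirm that the Beats input on port 5044 actually negotiates TLS after the change, a handshake probe is a quick check. The Python sketch below is a minimal example under stated assumptions: the host name you pass is up to you, and certificate verification is deliberately switched off, so it only shows that TLS is spoken, not that the certificate is trusted. `beats_input_tls_version` is a hypothetical helper.

```python
import socket
import ssl
from typing import Optional


def beats_input_tls_version(host: str, port: int = 5044, timeout: float = 5.0) -> Optional[str]:
    """Return the negotiated TLS version, or None if no TLS handshake completes."""
    context = ssl.create_default_context()
    # Probe only: skip hostname and certificate checks (must disable hostname first).
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    try:
        with socket.create_connection((host, port), timeout=timeout) as raw:
            # wrap_socket performs the handshake immediately by default.
            with context.wrap_socket(raw, server_hostname=host) as tls:
                return tls.version()
    except OSError:
        # Covers refused connections, timeouts, and ssl.SSLError (an OSError subclass),
        # e.g. when the input is still plaintext and never answers the ClientHello.
        return None
```

A result such as `"TLSv1.3"` means the input is serving TLS; `None` means the port is closed, unreachable, or not speaking TLS.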
4. How can the community help?
Is there any obvious configuration error here?
Many thanks!