Graylog crash and mongodb exception


1. Describe your incident:
The Graylog container goes into an ‘unhealthy’ state and sometimes crashes after a day or a week. The logs point to an interrupted shutdown, after which Graylog fails to connect to OpenSearch and MongoDB.

2. Describe your environment:

  • OS Information:
    Linux (Ubuntu 22.04)
    Standard D2s v3 Azure VM

  • Package Version:
    MongoDB 6.0
    OpenSearch 2.4.0
    Graylog 6.0.1

  • Service logs, configurations, and environment variables:
    The Docker Compose YAML file and the Graylog container log covering the time of the crash are below.

version: "3.8"

services:
  mongodb:
    image: "mongo:6.0"
    volumes:
      - "mongodb_data:/data/db"
    restart: "on-failure"

  opensearch:
    image: "opensearchproject/opensearch:2.4.0"
    environment:
      - "OPENSEARCH_JAVA_OPTS=-Xms1g -Xmx1g"
      - "bootstrap.memory_lock=true"
      - "discovery.type=single-node"
      - "action.auto_create_index=false"
      - "plugins.security.ssl.http.enabled=false"
      - "plugins.security.disabled=true"
    ulimits:
      memlock:
        hard: -1
        soft: -1
      nofile:
        soft: 65536
        hard: 65536
    volumes:
      - "os_data:/usr/share/opensearch/data"
    restart: "on-failure"

  graylog:
    hostname: "server"
    image: "${GRAYLOG_IMAGE:-graylog/graylog:6.0.1}"
    volumes:
      - "graylog_data:/usr/share/graylog/data/data"
      - "graylog_journal:/usr/share/graylog/data/journal"
      - "/home/azureuser/graylog/certs:/home/azureuser/graylog/certs"
    depends_on:
      opensearch:
        condition: "service_started"
      mongodb:
        condition: "service_started"
    entrypoint: "/usr/bin/tini -- wait-for-it opensearch:9200 --  /docker-entrypoint.sh"
    environment:
      GRAYLOG_NODE_ID_FILE: "/usr/share/graylog/data/config/node-id"
      GRAYLOG_PASSWORD_SECRET: "${GRAYLOG_PASSWORD_SECRET:?Please configure GRAYLOG_PASSWORD_SECRET in the .env file}"
      GRAYLOG_ROOT_PASSWORD_SHA2: "${GRAYLOG_ROOT_PASSWORD_SHA2:?Please configure GRAYLOG_ROOT_PASSWORD_SHA2 in the .env file}"
      GRAYLOG_HTTP_BIND_ADDRESS: "0.0.0.0:9000"
      # GRAYLOG_HTTP_EXTERNAL_URI: "http://0.0.0.0:9000/"
      GRAYLOG_ELASTICSEARCH_HOSTS: "http://opensearch:9200"
      GRAYLOG_MONGODB_URI: "mongodb://mongodb:27017/graylog"
      GRAYLOG_HTTP_ENABLE_TLS: "false"
      GRAYLOG_HTTP_TLS_CERT_FILE: "/home/azureuser/graylog/certs/cert.pem"
      GRAYLOG_HTTP_TLS_KEY_FILE: "/home/azureuser/graylog/certs/privkey.pem"
#      GRAYLOG_HTTP_TLS_KEY_PASSWORD: "secret"
      GRAYLOG_TRANSPORT_EMAIL_ENABLED: true
      GRAYLOG_TRANSPORT_EMAIL_HOSTNAME: "smtp.sendgrid.net"
      GRAYLOG_TRANSPORT_EMAIL_PORT: "587"
      GRAYLOG_TRANSPORT_EMAIL_USE_AUTH: "true"
      GRAYLOG_TRANSPORT_EMAIL_AUTH_USERNAME: "apikey"
      GRAYLOG_TRANSPORT_EMAIL_AUTH_PASSWORD: "XXXXXXXXXXXXXXXXXXXXXX"
      GRAYLOG_TRANSPORT_EMAIL_FROM_EMAIL: "graylogfpalerts@formpipe.com"
    ports:
      - "5044:5044/tcp"   # Beats
      - "5140:5140/udp"   # Syslog
      - "5140:5140/tcp"   # Syslog
      - "5555:5555/tcp"   # RAW TCP
      - "5555:5555/udp"   # RAW TCP
      - "9000:9000/tcp"   # Server API
      - "12201:12201/tcp" # GELF TCP
      - "12201:12201/udp" # GELF UDP
      #- "10000:10000/tcp" # Custom TCP port
      #- "10000:10000/udp" # Custom UDP port
      - "13301:13301/tcp" # Forwarder data
      - "13302:13302/tcp" # Forwarder config
      - "514:514/tcp"   # Syslog
    restart: "on-failure"

  oauth2-proxy:
    image: quay.io/oauth2-proxy/oauth2-proxy
    container_name: oauth2-proxy-new
    volumes:
      - "/home/azureuser/graylog/certs:/home/azureuser/graylog/certs"
    restart: always
#    networks:
#      - proxy
    command:
#      - --upstream
#      - http://127.0.0.1:9000/
      - --reverse-proxy
      - "true"
      - --skip-provider-button
      - "false"
      - --skip-auth-route
      - "/api"
    environment:
      - OAUTH2_PROXY_COOKIE_SECRET=XXXXXXXXXXXXXXXXXXXXXX
      - OAUTH2_PROXY_CLIENT_ID=XXXXXXXXXXXXXXXXXXXXXX
      - OAUTH2_PROXY_CLIENT_SECRET=XXXXXXXXXXXXXXXXXXXXXX
      - OAUTH2_PROXY_PROVIDER=oidc
      - OAUTH2_PROXY_AZURE_TENANT=XXXXXXXXXXXXXXXXXXXXXX
      - OAUTH2_PROXY_OIDC_ISSUER_URL=https://login.microsoftonline.com/XXXXXXXXXXXXXXXXXXXXXX/v2.0
      - OAUTH2_PROXY_EMAIL_DOMAINS=*
      - OAUTH2_PROXY_REDIRECT_URL=https://XXXXXXXXXXXXXXXXXXXXXX:8080/oauth2/callback
#      - OAUTH2_PROXY_HTTP_ADDRESS=http://0.0.0.0:4180
      - OAUTH2_PROXY_HTTPS_ADDRESS=https://0.0.0.0:8080
      - OAUTH2_PROXY_SESSION_STORE_TYPE=cookie
      - OAUTH2_PROXY_COOKIE_SAMESITE=lax
      - OAUTH2_PROXY_REVERSE_PROXY=true
 #     - OAUTH2_PROXY_COOKIE_CSRF_PER_REQUEST=true
 #     - OAUTH2_PROXY_COOKIE_CSRF_EXPIRE=5m
      - OAUTH2_PROXY_SKIP_PROVIDER_BUTTON=false
      - OAUTH2_PROXY_PASS_USER_HEADERS=false
#      - OAUTH2_PROXY_SET_XAUTHREQUEST=true
      - OAUTH2_PROXY_TLS_CERT_FILE=/home/azureuser/graylog/certs/cert.pem
      - OAUTH2_PROXY_TLS_KEY_FILE=/home/azureuser/graylog/certs/privkey.key
      - OAUTH2_PROXY_UPSTREAMS=http://server:9000/
    ports:
      - 4180:4180
      - 8080:8080

volumes:
  mongodb_data:
  os_data:
  graylog_data:
  graylog_journal:
  rosscerts:
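For context on the Graylog `entrypoint` line: `depends_on` with `condition: "service_started"` only waits for the OpenSearch container to start, not for OpenSearch to accept connections, which is presumably why the entrypoint is wrapped in `wait-for-it opensearch:9200`. Note that nothing comparable waits for MongoDB. The sketch below is a rough Python analogue of that TCP wait, for illustration only; `wait_for_port` is a hypothetical helper, not the real `wait-for-it` script, and like that script it only proves the port is open, not that the cluster is healthy.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 15.0, interval: float = 0.5) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout.

    Hypothetical stand-in for `wait-for-it host:port`: it checks that the port
    accepts connections, not that OpenSearch (or MongoDB) is actually ready.
    """
    deadline = time.monotonic() + timeout
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        try:
            # Raises OSError (ConnectionRefusedError, DNS failure, ...) if nothing answers.
            with socket.create_connection((host, port), timeout=min(interval, remaining)):
                return True
        except OSError:
            # Back off briefly, but never sleep past the deadline.
            time.sleep(min(interval, max(0.0, deadline - time.monotonic())))
```

The compose file does not publish OpenSearch's port 9200, so a check like `wait_for_port("opensearch", 9200)` only works from a container on the same Compose network.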
Graylog container log

graylog-1  | 2024-09-16 05:27:44,695 ERROR: org.graylog2.plugin.inputs.transports.AbstractTcpTransport - Error in Input [Beats/Input WinLog/664b60ef0819a622895a4e90] (channel [id: 0xf65c028d, L:/172.18.0.5:5044 ! R:/91.238.181.21:65146]) (cause io.netty.handler.codec.DecoderException: java.lang.IllegalStateException: Unknown beats protocol version: 3)
graylog-1  | 2024-09-16 05:27:44,699 ERROR: org.graylog2.plugin.inputs.transports.AbstractTcpTransport - Error in Input [Beats/Input WinLog/664b60ef0819a622895a4e90] (channel [id: 0xf65c028d, L:/172.18.0.5:5044 ! R:/91.238.181.21:65146]) (cause io.netty.handler.codec.DecoderException: java.lang.IllegalStateException: Unknown beats protocol version: 0)
graylog-1  | 2024-09-16 20:45:10,830 WARN : org.graylog2.cluster.nodes.AbstractNodeService - Did not find meta info of this node. Re-registering.
graylog-1  | 2024-09-16 20:45:11,920 INFO : org.graylog2.commands.Server - SIGNAL received. Shutting down.
graylog-1  | 2024-09-16 20:45:11,958 INFO : org.graylog2.system.shutdown.GracefulShutdown - Graceful shutdown initiated.
graylog-1  | 2024-09-16 20:45:11,971 INFO : org.graylog2.system.shutdown.GracefulShutdown - Node status: [Override lb:DEAD [LB:DEAD]]. Waiting <3sec> for possible load balancers to recognize state change.
graylog-1  | 2024-09-16 20:45:12,918 ERROR: org.graylog2.storage.versionprobe.VersionProbe - Unable to retrieve version from Elasticsearch node: Failed to connect to opensearch/172.18.0.4:9200. - Connection refused.
graylog-1  | 2024-09-16 20:45:12,984 INFO : org.graylog2.storage.versionprobe.VersionProbe - OpenSearch/Elasticsearch is not available. Retry #1
graylog-1  | 2024-09-16 20:45:12,967 INFO : org.mongodb.driver.cluster - Exception in monitor thread while connecting to server mongodb:27017
graylog-1  | com.mongodb.MongoNodeIsRecoveringException: Command failed with error 11600 (InterruptedAtShutdown): 'interrupted at shutdown' on server mongodb:27017. The full response is {"ok": 0.0, "errmsg": "interrupted at shutdown", "code": 11600, "codeName": "InterruptedAtShutdown"}
graylog-1  |    at com.mongodb.internal.connection.ProtocolHelper.createSpecialException(ProtocolHelper.java:251) ~[graylog.jar:?]
graylog-1  |    at com.mongodb.internal.connection.ProtocolHelper.getCommandFailureException(ProtocolHelper.java:201) ~[graylog.jar:?]
graylog-1  |    at com.mongodb.internal.connection.InternalStreamConnection.receiveCommandMessageResponse(InternalStreamConnection.java:431) ~[graylog.jar:?]
graylog-1  |    at com.mongodb.internal.connection.InternalStreamConnection.receive(InternalStreamConnection.java:381) ~[graylog.jar:?]
graylog-1  |    at com.mongodb.internal.connection.DefaultServerMonitor$ServerMonitorRunnable.lookupServerDescription(DefaultServerMonitor.java:221) [graylog.jar:?]
graylog-1  |    at com.mongodb.internal.connection.DefaultServerMonitor$ServerMonitorRunnable.run(DefaultServerMonitor.java:153) [graylog.jar:?]
graylog-1  |    at java.base/java.lang.Thread.run(Unknown Source) [?:?]
graylog-1  | 2024-09-16 20:45:13,162 INFO : org.mongodb.driver.cluster - Waiting for server to become available for operation with ID 6253292. Remaining time: 30000 ms. Selector: ReadPreferenceServerSelector{readPreference=primary}, topology description: {type=UNKNOWN, servers=[{address=mongodb:27017, type=UNKNOWN, state=CONNECTING, exception={com.mongodb.MongoNodeIsRecoveringException: Command failed with error 11600 (InterruptedAtShutdown): 'interrupted at shutdown' on server mongodb:27017. The full response is {"ok": 0.0, "errmsg": "interrupted at shutdown", "code": 11600, "codeName": "InterruptedAtShutdown"}}}].
graylog-1  | 2024-09-16 20:45:13,164 INFO : org.mongodb.driver.cluster - Waiting for server to become available for operation with ID 6253290. Remaining time: 30000 ms. Selector: ReadPreferenceServerSelector{readPreference=primary}, topology description: {type=UNKNOWN, servers=[{address=mongodb:27017, type=UNKNOWN, state=CONNECTING, exception={com.mongodb.MongoNodeIsRecoveringException: Command failed with error 11600 (InterruptedAtShutdown): 'interrupted at shutdown' on server mongodb:27017. The full response is {"ok": 0.0, "errmsg": "interrupted at shutdown", "code": 11600, "codeName": "InterruptedAtShutdown"}}}].
graylog-1  | 2024-09-16 20:45:13,178 INFO : org.mongodb.driver.cluster - Waiting for server to become available for operation with ID 6253291. Remaining time: 30000 ms. Selector: WritableServerSelector, topology description: {type=UNKNOWN, servers=[{address=mongodb:27017, type=UNKNOWN, state=CONNECTING, exception={com.mongodb.MongoNodeIsRecoveringException: Command failed with error 11600 (InterruptedAtShutdown): 'interrupted at shutdown' on server mongodb:27017. The full response is {"ok": 0.0, "errmsg": "interrupted at shutdown", "code": 11600, "codeName": "InterruptedAtShutdown"}}}].
graylog-1  | 2024-09-16 20:45:13,165 INFO : org.mongodb.driver.cluster - Waiting for server to become available for operation with ID 6253289. Remaining time: 30000 ms. Selector: ReadPreferenceServerSelector{readPreference=primary}, topology description: {type=UNKNOWN, servers=[{address=mongodb:27017, type=UNKNOWN, state=CONNECTING, exception={com.mongodb.MongoNodeIsRecoveringException: Command failed with error 11600 (InterruptedAtShutdown): 'interrupted at shutdown' on server mongodb:27017. The full response is {"ok": 0.0, "errmsg": "interrupted at shutdown", "code": 11600, "codeName": "InterruptedAtShutdown"}}}].
graylog-1  | 2024-09-16 20:45:13,179 INFO : org.mongodb.driver.cluster - Waiting for server to become available for operation with ID 6253284. Remaining time: 30000 ms. Selector: ReadPreferenceServerSelector{readPreference=primary}, topology description: {type=UNKNOWN, servers=[{address=mongodb:27017, type=UNKNOWN, state=CONNECTING, exception={com.mongodb.MongoNodeIsRecoveringException: Command failed with error 11600 (InterruptedAtShutdown): 'interrupted at shutdown' on server mongodb:27017. The full response is {"ok": 0.0, "errmsg": "interrupted at shutdown", "code": 11600, "codeName": "InterruptedAtShutdown"}}}].
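Read in timestamp order, this excerpt shows Graylog logging `SIGNAL received. Shutting down.` at 20:45:11,920, before the OpenSearch "Connection refused" error and the MongoDB `InterruptedAtShutdown` exceptions, so those connection failures may be a consequence of the stop rather than its cause. A minimal sketch for pulling that ordering out of a longer `docker compose logs` capture follows; the regular expression is an assumption fitted to the line format shown above, and stack-trace continuation lines are simply skipped.

```python
import re
from datetime import datetime
from typing import Iterable, List, NamedTuple, Optional, Tuple

# Fitted (as an assumption) to compose log lines such as:
# graylog-1  | 2024-09-16 20:45:11,920 INFO : org.graylog2.commands.Server - SIGNAL received. Shutting down.
LINE_RE = re.compile(
    r"^(?P<service>\S+)\s+\|\s+"
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+"
    r"(?P<level>[A-Z]+)\s*:\s+"
    r"(?P<logger>\S+) - (?P<msg>.*)$"
)


class Event(NamedTuple):
    ts: datetime
    level: str
    logger: str
    msg: str


def parse_line(line: str) -> Optional[Event]:
    """Parse one Graylog log line; return None for stack traces and other lines."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    ts = datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S,%f")
    return Event(ts, m["level"], m["logger"], m["msg"])


def events_after_signal(lines: Iterable[str]) -> Tuple[Optional[Event], List[Event]]:
    """Return the first 'SIGNAL received' event and all parsed events logged after it."""
    events = [e for e in map(parse_line, lines) if e is not None]
    sig_event = next((e for e in events if "SIGNAL received" in e.msg), None)
    if sig_event is None:
        return None, []
    return sig_event, [e for e in events if e.ts > sig_event.ts]
```

Sorting by timestamp rather than trusting line order matters here, since the excerpt itself contains an out-of-order line (20:45:12,967 logged after 20:45:12,984).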

3. What steps have you already taken to try and solve the problem?
We increased the Azure VM size to get more cores and memory. That seemed to keep Graylog running longer, but it still eventually crashed. We should be sending messages over an encrypted connection, so I enabled TLS on the Beats input, but this made no difference.

4. How can the community help?
Is there any obvious configuration error here?
Many thanks!


Hello @glover2409,

How much data were you pushing through the cluster per day?

It could be a resource contention issue: I can see that you are starting OpenSearch with only 1 GB of heap memory assigned, which would be a problem.
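As a rough illustration of the heap sizing point: OpenSearch's usual guidance is to set `-Xms` and `-Xmx` to the same value, give the heap no more than about half of the memory available to OpenSearch (the rest is used by the filesystem cache), and stay below roughly 32 GB so the JVM keeps compressed object pointers. The sketch below applies that rule after first reserving memory for the OS, Graylog's own JVM and MongoDB, which all share this VM. The function name and the `reserved_gb` figure are hypothetical, not values from this thread.

```python
def opensearch_java_opts(total_ram_gb: float, reserved_gb: float) -> str:
    """Suggest an OPENSEARCH_JAVA_OPTS value (rule-of-thumb sketch, not official tooling).

    Assumptions: `reserved_gb` covers the OS, Graylog's JVM and MongoDB; OpenSearch
    gets at most half of what is left; the heap is capped at 31 GB to stay under
    the compressed-oops threshold.
    """
    available = max(total_ram_gb - reserved_gb, 0.0)
    heap_gb = int(min(available / 2, 31))
    if heap_gb < 1:
        raise ValueError("not enough memory left for a 1 GB OpenSearch heap")
    # Equal -Xms/-Xmx avoids heap resizing pauses at runtime.
    return f"-Xms{heap_gb}g -Xmx{heap_gb}g"
```

For example, a 16 GB host with 4 GB reserved for everything else would give `-Xms6g -Xmx6g`.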

The MongoDB and OpenSearch logs would help paint a picture of what is occurring.

Hi @Wine_Merchant, many thanks for the response!
I increased the OpenSearch memory to 4 GB, but it still died after a while. The MongoDB and OpenSearch logs from the time of the failure don’t give much information.

MongoDB

mongodb-1  | {"t":{"$date":"2024-09-19T20:37:31.156+00:00"},"s":"I",  "c":"WTRECOV",  "id":22430,   "ctx":"SignalHandler","msg":"WiredTiger message","attr":{"message":{"ts_sec":1726778251,"ts_usec":156645,"thread":"1:0x7903d4e00640","session_name":"txn-recover","category":"WT_VERB_RECOVERY_PROGRESS","category_id":30,"verbose_level":"DEBUG","verbose_level_id":1,"msg":"Set global recovery timestamp: (0, 0)"}}}
mongodb-1  | {"t":{"$date":"2024-09-19T20:37:31.156+00:00"},"s":"I",  "c":"WTRECOV",  "id":22430,   "ctx":"SignalHandler","msg":"WiredTiger message","attr":{"message":{"ts_sec":1726778251,"ts_usec":156684,"thread":"1:0x7903d4e00640","session_name":"txn-recover","category":"WT_VERB_RECOVERY_PROGRESS","category_id":30,"verbose_level":"DEBUG","verbose_level_id":1,"msg":"Set global oldest timestamp: (0, 0)"}}}
mongodb-1  | {"t":{"$date":"2024-09-19T20:37:31.191+00:00"},"s":"I",  "c":"WTRECOV",  "id":22430,   "ctx":"SignalHandler","msg":"WiredTiger message","attr":{"message":{"ts_sec":1726778251,"ts_usec":191564,"thread":"1:0x7903d4e00640","session_name":"txn-recover","category":"WT_VERB_RECOVERY_PROGRESS","category_id":30,"verbose_level":"DEBUG","verbose_level_id":1,"msg":"recovery rollback to stable has successfully finished and ran for 34 milliseconds"}}}
mongodb-1  | {"t":{"$date":"2024-09-19T20:37:31.483+00:00"},"s":"I",  "c":"NETWORK",  "id":22944,   "ctx":"conn5","msg":"Connection ended","attr":{"remote":"172.18.0.5:40378","uuid":"1be226a9-fdf5-4d9f-a8b8-3fd3c31d7659","connectionId":5,"connectionCount":0}}
mongodb-1  | {"t":{"$date":"2024-09-19T20:37:31.661+00:00"},"s":"I",  "c":"WTCHKPT",  "id":22430,   "ctx":"SignalHandler","msg":"WiredTiger message","attr":{"message":{"ts_sec":1726778251,"ts_usec":661802,"thread":"1:0x7903d4e00640","session_name":"WT_SESSION.checkpoint","category":"WT_VERB_CHECKPOINT_PROGRESS","category_id":6,"verbose_level":"DEBUG","verbose_level_id":1,"msg":"saving checkpoint snapshot min: 1, snapshot max: 1 snapshot count: 0, oldest timestamp: (0, 0) , meta checkpoint timestamp: (0, 0) base write gen: 979465"}}}
mongodb-1  | {"t":{"$date":"2024-09-19T20:37:32.490+00:00"},"s":"I",  "c":"WTRECOV",  "id":22430,   "ctx":"SignalHandler","msg":"WiredTiger message","attr":{"message":{"ts_sec":1726778252,"ts_usec":490850,"thread":"1:0x7903d4e00640","session_name":"txn-recover","category":"WT_VERB_RECOVERY_PROGRESS","category_id":30,"verbose_level":"DEBUG","verbose_level_id":1,"msg":"recovery checkpoint has successfully finished and ran for 1298 milliseconds"}}}
mongodb-1  | {"t":{"$date":"2024-09-19T20:37:32.491+00:00"},"s":"I",  "c":"WTRECOV",  "id":22430,   "ctx":"SignalHandler","msg":"WiredTiger message","attr":{"message":{"ts_sec":1726778252,"ts_usec":491313,"thread":"1:0x7903d4e00640","session_name":"txn-recover","category":"WT_VERB_RECOVERY_PROGRESS","category_id":30,"verbose_level":"DEBUG","verbose_level_id":1,"msg":"recovery was completed successfully and took 2683ms, including 1349ms for the log replay, 34ms for the rollback to stable, and 1298ms for the checkpoint."}}}
mongodb-1  | {"t":{"$date":"2024-09-19T20:37:32.494+00:00"},"s":"I",  "c":"STORAGE",  "id":4795904, "ctx":"SignalHandler","msg":"WiredTiger re-opened","attr":{"durationMillis":2758}}
mongodb-1  | {"t":{"$date":"2024-09-19T20:37:32.497+00:00"},"s":"I",  "c":"STORAGE",  "id":22325,   "ctx":"SignalHandler","msg":"Reconfiguring","attr":{"newConfig":"compatibility=(release=10.0)"}}
mongodb-1  | {"t":{"$date":"2024-09-19T20:37:32.498+00:00"},"s":"I",  "c":"STORAGE",  "id":4795903, "ctx":"SignalHandler","msg":"Reconfigure complete","attr":{"durationMillis":1}}
mongodb-1  | {"t":{"$date":"2024-09-19T20:37:32.499+00:00"},"s":"I",  "c":"STORAGE",  "id":4795902, "ctx":"SignalHandler","msg":"Closing WiredTiger","attr":{"closeConfig":"leak_memory=true,"}}
mongodb-1  | {"t":{"$date":"2024-09-19T20:37:32.500+00:00"},"s":"I",  "c":"WTCHKPT",  "id":22430,   "ctx":"SignalHandler","msg":"WiredTiger message","attr":{"message":{"ts_sec":1726778252,"ts_usec":500348,"thread":"1:0x7903d4e00640","session_name":"close_ckpt","category":"WT_VERB_CHECKPOINT_PROGRESS","category_id":6,"verbose_level":"DEBUG","verbose_level_id":1,"msg":"saving checkpoint snapshot min: 2, snapshot max: 2 snapshot count: 0, oldest timestamp: (0, 0) , meta checkpoint timestamp: (0, 0) base write gen: 979465"}}}
mongodb-1  | {"t":{"$date":"2024-09-19T20:37:32.576+00:00"},"s":"I",  "c":"WTRECOV",  "id":22430,   "ctx":"SignalHandler","msg":"WiredTiger message","attr":{"message":{"ts_sec":1726778252,"ts_usec":576312,"thread":"1:0x7903d4e00640","session_name":"WT_CONNECTION.close","category":"WT_VERB_RECOVERY_PROGRESS","category_id":30,"verbose_level":"DEBUG","verbose_level_id":1,"msg":"shutdown checkpoint has successfully finished and ran for 76 milliseconds"}}}
mongodb-1  | {"t":{"$date":"2024-09-19T20:37:32.576+00:00"},"s":"I",  "c":"WTRECOV",  "id":22430,   "ctx":"SignalHandler","msg":"WiredTiger message","attr":{"message":{"ts_sec":1726778252,"ts_usec":576482,"thread":"1:0x7903d4e00640","session_name":"WT_CONNECTION.close","category":"WT_VERB_RECOVERY_PROGRESS","category_id":30,"verbose_level":"DEBUG","verbose_level_id":1,"msg":"shutdown was completed successfully and took 76ms, including 0ms for the rollback to stable, and 76ms for the checkpoint."}}}
mongodb-1  | {"t":{"$date":"2024-09-19T20:37:32.606+00:00"},"s":"I",  "c":"STORAGE",  "id":4795901, "ctx":"SignalHandler","msg":"WiredTiger closed","attr":{"durationMillis":107}}
mongodb-1  | {"t":{"$date":"2024-09-19T20:37:32.606+00:00"},"s":"I",  "c":"STORAGE",  "id":22279,   "ctx":"SignalHandler","msg":"shutdown: removing fs lock..."}
mongodb-1  | {"t":{"$date":"2024-09-19T20:37:32.606+00:00"},"s":"I",  "c":"-",        "id":4784931, "ctx":"SignalHandler","msg":"Dropping the scope cache for shutdown"}
mongodb-1  | {"t":{"$date":"2024-09-19T20:37:32.630+00:00"},"s":"I",  "c":"FTDC",     "id":20626,   "ctx":"SignalHandler","msg":"Shutting down full-time diagnostic data capture"}
mongodb-1  | {"t":{"$date":"2024-09-19T20:37:32.637+00:00"},"s":"I",  "c":"CONTROL",  "id":20565,   "ctx":"SignalHandler","msg":"Now exiting"}
mongodb-1  | {"t":{"$date":"2024-09-19T20:37:32.638+00:00"},"s":"I",  "c":"CONTROL",  "id":8423404, "ctx":"SignalHandler","msg":"mongod shutdown complete","attr":{"Summary of time elapsed":{"Statistics":{"Enter terminal shutdown":"0 ms","Step down the replication coordinator for shutdown":"46 ms","Time spent in quiesce mode":"0 ms","Shut down FLE Crud subsystem":"1 ms","Shut down MirrorMaestro":"0 ms","Shut down WaitForMajorityService":"10 ms","Shut down the logical session cache":"9 ms","Shut down the transport layer":"19 ms","Shut down the global connection pool":"3 ms","Shut down the flow control ticket holder":"0 ms","Kill all operations for shutdown":"2 ms","Shut down all tenant migration access blockers on global shutdown":"2 ms","Shut down all open transactions":"0 ms","Acquire the RSTL for shutdown":"0 ms","Shut down the IndexBuildsCoordinator and wait for index builds to finish":"0 ms","Shut down the replica set monitor":"2 ms","Shut down the migration util executor":"6 ms","Shut down the health log":"1 ms","Shut down the TTL monitor":"5 ms","Shut down expired pre-images remover":"0 ms","Shut down the storage engine":"3224 ms","Shut down full-time data capture":"10 ms","shutdownTask total elapsed time":"3376 ms"}}}}
mongodb-1  | {"t":{"$date":"2024-09-19T20:37:32.638+00:00"},"s":"I",  "c":"CONTROL",  "id":23138,   "ctx":"SignalHandler","msg":"Shutting down","attr":{"exitCode":0}}
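mongod writes one JSON document per log line, so the end of its shutdown can be extracted programmatically. In this excerpt the final entry is `Shutting down` with `exitCode` 0 on the `SignalHandler` thread, which suggests a clean shutdown in response to a signal rather than a crash. A minimal sketch, assuming the `mongodb-1  | ` prefix added by `docker compose logs`; the helper names are hypothetical.

```python
import json
from datetime import datetime
from typing import Iterable, Optional


def parse_mongo_line(line: str) -> Optional[dict]:
    """Drop the 'mongodb-1  | ' compose prefix and decode mongod's JSON log entry."""
    _, sep, payload = line.partition("| ")
    if not sep:
        return None
    try:
        entry = json.loads(payload)
    except json.JSONDecodeError:
        return None
    return entry if isinstance(entry, dict) else None


def shutdown_summary(lines: Iterable[str]) -> Optional[dict]:
    """Return time, thread and exit code of the last 'Shutting down' entry, if any."""
    summary = None
    for entry in filter(None, map(parse_mongo_line, lines)):
        if entry.get("msg") == "Shutting down":
            summary = {
                # mongod timestamps carry an explicit UTC offset, e.g. +00:00
                "time": datetime.fromisoformat(entry["t"]["$date"]),
                "ctx": entry.get("ctx"),
                "exit_code": entry.get("attr", {}).get("exitCode"),
            }
    return summary
```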

OpenSearch

opensearch-1  | [2024-09-19T19:17:51,141][INFO ][o.o.j.s.JobSweeper       ] [8868c363e0f4] Running full sweep
opensearch-1  | [2024-09-19T19:22:51,141][INFO ][o.o.j.s.JobSweeper       ] [8868c363e0f4] Running full sweep
opensearch-1  | [2024-09-19T19:27:51,142][INFO ][o.o.j.s.JobSweeper       ] [8868c363e0f4] Running full sweep
opensearch-1  | [2024-09-19T19:32:51,142][INFO ][o.o.j.s.JobSweeper       ] [8868c363e0f4] Running full sweep
opensearch-1  | [2024-09-19T19:32:51,392][INFO ][o.o.a.t.CronTransportAction] [8868c363e0f4] Start running AD hourly cron.
opensearch-1  | [2024-09-19T19:32:51,392][INFO ][o.o.a.t.ADTaskManager    ] [8868c363e0f4] Start to maintain running historical tasks
opensearch-1  | [2024-09-19T19:32:51,393][INFO ][o.o.a.c.HourlyCron       ] [8868c363e0f4] Hourly maintenance succeeds
opensearch-1  | [2024-09-19T19:37:51,142][INFO ][o.o.j.s.JobSweeper       ] [8868c363e0f4] Running full sweep
opensearch-1  | [2024-09-19T19:42:51,143][INFO ][o.o.j.s.JobSweeper       ] [8868c363e0f4] Running full sweep
opensearch-1  | [2024-09-19T19:47:51,143][INFO ][o.o.j.s.JobSweeper       ] [8868c363e0f4] Running full sweep
opensearch-1  | [2024-09-19T19:52:51,144][INFO ][o.o.j.s.JobSweeper       ] [8868c363e0f4] Running full sweep
opensearch-1  | [2024-09-19T19:57:51,144][INFO ][o.o.j.s.JobSweeper       ] [8868c363e0f4] Running full sweep
opensearch-1  | [2024-09-19T20:02:51,144][INFO ][o.o.j.s.JobSweeper       ] [8868c363e0f4] Running full sweep
opensearch-1  | [2024-09-19T20:07:51,145][INFO ][o.o.j.s.JobSweeper       ] [8868c363e0f4] Running full sweep
opensearch-1  | [2024-09-19T20:12:51,145][INFO ][o.o.j.s.JobSweeper       ] [8868c363e0f4] Running full sweep
opensearch-1  | [2024-09-19T20:17:51,145][INFO ][o.o.j.s.JobSweeper       ] [8868c363e0f4] Running full sweep
opensearch-1  | [2024-09-19T20:22:51,146][INFO ][o.o.j.s.JobSweeper       ] [8868c363e0f4] Running full sweep
opensearch-1  | [2024-09-19T20:27:51,146][INFO ][o.o.j.s.JobSweeper       ] [8868c363e0f4] Running full sweep
opensearch-1  | [2024-09-19T20:32:51,147][INFO ][o.o.j.s.JobSweeper       ] [8868c363e0f4] Running full sweep
opensearch-1  | [2024-09-19T20:32:51,392][INFO ][o.o.a.t.CronTransportAction] [8868c363e0f4] Start running AD hourly cron.
opensearch-1  | [2024-09-19T20:32:51,393][INFO ][o.o.a.t.ADTaskManager    ] [8868c363e0f4] Start to maintain running historical tasks
opensearch-1  | [2024-09-19T20:32:51,393][INFO ][o.o.a.c.HourlyCron       ] [8868c363e0f4] Hourly maintenance succeeds
opensearch-1  | Killing opensearch process 101
opensearch-1  | [2024-09-19T20:37:29,283][INFO ][o.o.n.Node               ] [8868c363e0f4] stopping ...
opensearch-1  | [2024-09-19T20:37:29,839][INFO ][o.o.n.Node               ] [8868c363e0f4] stopped
opensearch-1  | [2024-09-19T20:37:29,841][INFO ][o.o.n.Node               ] [8868c363e0f4] closing ...
opensearch-1  | [2024-09-19T20:37:29,954][INFO ][o.o.n.Node               ] [8868c363e0f4] closed
opensearch-1  | Killing performance analyzer process 102
opensearch-1  | OpenSearch exited with code 143
opensearch-1  | Performance analyzer exited with code 143
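The exit code here is informative. By shell convention a process killed by signal N is reported with status 128 + N, so 143 = 128 + 15 means OpenSearch and the performance analyzer were terminated by SIGTERM, the signal Docker sends on a normal stop. An out-of-memory kill would instead show up as SIGKILL, i.e. 137 = 128 + 9. A small sketch of that decoding (the function name is hypothetical):

```python
import signal


def describe_exit_code(code: int) -> str:
    """Translate a process/container exit status into a short explanation.

    A process terminated by signal N is conventionally reported as 128 + N,
    e.g. 143 = 128 + 15 (SIGTERM) and 137 = 128 + 9 (SIGKILL).
    """
    if code == 0:
        return "exited normally"
    if code > 128:
        signum = code - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            return f"terminated by unknown signal {signum}"
        return f"terminated by {name} (signal {signum})"
    return f"exited with error status {code}"
```

If an out-of-memory kill is still suspected, `docker inspect` reports it in the container's `State.OOMKilled` field.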

@glover2409

Sorry for the late reply, I’ve been away. Did you manage to fix this issue?

Hi @Wine_Merchant, thanks for the response. We still have the issue; we are going to try some changes to our proxy to troubleshoot.
Thanks

@glover2409 The fact that the OpenSearch and MongoDB services both shut down at 20:37 is odd. What did the host resource utilisation look like at that time?
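To put a number on how closely the two shutdowns line up, the sketch below parses OpenSearch's bracketed timestamp and compares it with mongod's ISO-8601 one. It assumes the OpenSearch container logs in UTC (mongod's timestamps carry an explicit +00:00 offset), and `parse_opensearch_ts` and `shutdown_window` are hypothetical helpers.

```python
from datetime import datetime, timezone
from typing import Dict


def parse_opensearch_ts(line: str) -> datetime:
    """Parse the leading '[2024-09-19T20:37:29,283]' stamp of an OpenSearch log line.

    Assumption: the container logs in UTC, so the result is tagged as UTC.
    """
    _, sep, rest = line.partition("| ")
    payload = rest if sep else line  # tolerate lines without the compose prefix
    stamp = payload.split("]", 1)[0].lstrip("[")
    return datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S,%f").replace(tzinfo=timezone.utc)


def shutdown_window(events: Dict[str, datetime]) -> float:
    """Seconds between the earliest and latest of the given (timezone-aware) timestamps."""
    times = list(events.values())
    return (max(times) - min(times)).total_seconds()
```

With OpenSearch's `stopping ...` at 20:37:29,283 and mongod's `Shutting down` at 20:37:32.638, the window comes out at roughly 3.4 seconds.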

@Wine_Merchant It looks like we had about 900 MB of spare memory around that time, and CPU utilization was below 10%. We temporarily removed the proxy, since we suspected it might be doing something to the messages it proxied, but Graylog still crashed.