Graylog 6.2 Docker Container Health Check Failing

Hello, friends!

I’m having some trouble with the Graylog Docker environment.

The container keeps exiting:

graylog_graylog-datanode.1   graylog/graylog-datanode:6.2.2@sha256:175fb39e4b03eba708c189c01d34a1c7c20a33c5f525df962ca521276289191e   azubuntu   Running         Running 6 minutes ago                       
graylog_graylog-mongodb.1    mongo:7.0.15@sha256:8958f43a940142d2a4572bb4d4f147ab26994a3da1cead6d39876b96451a4f1a                     azubuntu   Running         Running 6 minutes ago                       
graylog_graylog.1            graylog/graylog:6.2.2@sha256:62560590c1c74630bb2ff84867d2bf603db8aeb8c91fbcc9da3355b2f24ab3f3            azubuntu   Running         Starting 16 seconds ago                     
    \_ graylog_graylog.1        graylog/graylog:6.2.2@sha256:62560590c1c74630bb2ff84867d2bf603db8aeb8c91fbcc9da3355b2f24ab3f3            azubuntu   Shutdown        Failed 21 seconds ago     "task: non-zero exit (143): dockerexec: unhealthy container"
    \_ graylog_graylog.1        graylog/graylog:6.2.2@sha256:62560590c1c74630bb2ff84867d2bf603db8aeb8c91fbcc9da3355b2f24ab3f3            azubuntu   Shutdown        Failed 2 minutes ago      "task: non-zero exit (143): dockerexec: unhealthy container"
    \_ graylog_graylog.1        graylog/graylog:6.2.2@sha256:62560590c1c74630bb2ff84867d2bf603db8aeb8c91fbcc9da3355b2f24ab3f3            azubuntu   Shutdown        Failed 4 minutes ago      "task: non-zero exit (143): dockerexec: unhealthy container

I’ve edited “health_check.sh” to see at which check it fails, resulting in:

=== Graylog Healthcheck Script Debug ===
[info] Default proto: http
[info] Default bind address: 127.0.0.1:9000
[info] Found config file at /usr/share/graylog/data/config/graylog.conf
[config] http_publish_uri from config: ''
[config] http_bind_address from config: '0.0.0.0:9000'
[config] http_enable_tls from config: ''
[env] GRAYLOG_HTTP_PUBLISH_URI: http://graylog:9002
[debug] Overridden http_publish_uri: graylog:9002
[debug] Overridden http_bind_address: 0.0.0.0:9002
[check] Using check_url: http://graylog:9002
[curl] Checking http://graylog:9002/api
*   Trying 10.0.2.3:9002...
* Connected to graylog (10.0.2.3) port 9002 (#0)
> GET / HTTP/1.1
> Host: graylog:9002
> User-Agent: curl/7.81.0
> Accept: */*
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 401 Unauthorized
< WWW-Authenticate: Basic realm=preflight-config
< Content-Type: text/plain
< Content-Length: 62
* The requested URL returned error: 401
* Closing connection 0
[curl] Primary check failed. Trying fallback: http://127.0.0.1/api
*   Trying 127.0.0.1:80...
* connect to 127.0.0.1 port 80 failed: Connection refused
* Failed to connect to 127.0.0.1 port 80 after 0 ms: Connection refused
* Closing connection 0
[error] All health checks failed. Exiting with code 1

By the script output, combined with other curl tests, I believe that there is nothing wrong with setup, as it is possible to see that it is even requesting authorization and that the socket is opened and listening:

# 9002 opened, requesting auth:
#
graylog@graylog:~$ curl --verbose a:a@graylog:9002/
*   Trying 10.0.1.107:9002...
* Connected to graylog (10.0.1.107) port 9002 (#0)
* Server auth using Basic with user 'a'
> GET / HTTP/1.1
> Host: graylog:9002
> Authorization: Basic YTph
> User-Agent: curl/7.81.0
> Accept: */*
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 401 Unauthorized
* Authentication problem. Ignoring this.
< WWW-Authenticate: Basic realm=preflight-config
< Content-Type: text/plain
< Content-Length: 71
<
* Connection #0 to host graylog left intact
You cannot access this resource, invalid username/password combination!

#
# 9000 Closed, as it should per configuration:
#

graylog@graylog:~$ curl --verbose a:a@graylog:9000/
*   Trying 10.0.1.107:9000...
* connect to 10.0.1.107 port 9000 failed: Connection refused
*   Trying 10.0.2.14:9000...
* connect to 10.0.2.14 port 9000 failed: Connection refused
* Failed to connect to graylog port 9000 after 1 ms: Connection refused
* Closing connection 0
curl: (7) Failed to connect to graylog port 9000 after 1 ms: Connection refused

In resume, the health check keeps failing and i don’t know why. I’d be grateful for any inputs that may help me. Thanks in advance!


Relevant config files:

graylog-compose.yml:

networks:
  graylog_net:
    driver: overlay
    internal: true
    attachable: true

  traefik3_net:
    external: true

volumes:
  graylog_data:
    driver: local
  graylog_journal:
    driver: local

services:
  graylog:
    image: graylog/graylog:6.2.2
    hostname: "graylog"
    env_file: ".env_graylog"

    volumes:
      - "graylog_data:/usr/share/graylog/data/data"
      - "graylog_journal:/usr/share/graylog/data/journal"

    networks:
      - graylog_net
      - traefik3_net

    depends_on:
      - graylog-mongodb
      - graylog-datanode

    deploy:
      resources:
        limits:
          cpus: '0.50'
          memory: 1024M
      mode: replicated
      replicas: 1

    labels:
      #redacted#

env file:

GRAYLOG_PASSWORD_SECRET=#redacted#
GRAYLOG_ROOT_PASSWORD_SHA2=#redacted#
GRAYLOG_NODE_ID_FILE=/usr/share/graylog/data/data/node-id
GRAYLOG_HTTP_BIND_ADDRESS=0.0.0.0:9002
GRAYLOG_HTTP_PUBLISH_URI=http://graylog:9002
GRAYLOG_HTTP_EXTERNAL_URI=http://graylog:9002/
GRAYLOG_MONGODB_URI=mongodb://#redacted#:#redacted#@graylog-mongodb:27017/graylog?authSource=users

If anyone comes across this post:

This is a problem that occurs in Docker Swarm, probably only when combined with Traefik.

The resource $http_publish_uri/api will only be available after the initial setup. Before that, it will always return a 404.

If something goes wrong with curl (e.g., HTTP code 404), it returns a non-zero exit code. Using the -f flag causes the healthcheck to exit with 1, making the container unhealthy. As a result, Traefik will not add a route for the service, and the Graylog container will keep exiting.

I guess the palliative solution would be to override the healthcheck with something like:

healthcheck:
  test: ["CMD-SHELL", "status=$$(curl -s -o /dev/null -w '%{http_code}' http://graylog:9002/api);
         if [ \"$$status\" = \"200\" ] ||
            { [ \"$$status\" = \"404\" ] &&
              [ \"$$(curl -s -o /dev/null -w '%{http_code}' http://graylog:9002)\" = \"401\" ]; }; then
              exit 0;
         else
              echo \"Healthcheck failed: Graylog not responding\";
              exit 1; fi"]
  interval: 10s
  timeout: 2s
  retries: 12
  start_period: 10s
  start_interval: 5s

or bind mount a custom health_check.sh script.

There are some solutions available online for deploying Graylog with Swarm, but I haven’t had the time to test them yet.

Also, when using Traefik and the Graylog service is attached to two interfaces, make sure to specify:


traefik.docker.network: traefik3_net
1 Like

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.