Graylog hangs a few hours after start (Docker containers)

Graylog works fine (indexes incoming messages, search works, etc.) just after start, but after ~9-16 hours it stops receiving new messages and starts logging the error KafkaJournal - Cannot write /usr/share/graylog/data/journal/graylog2-committed-read-offset to disk. every second.

I do not think this is a permissions issue. Graylog receives and parses about 100 GB of data per day, at 1k-2k messages/second. There are no errors on the System/Overview page (the Elasticsearch cluster is green, there are no failed indexing attempts, and there are no errors in System messages).

2018-12-19 04:16:42,906 INFO : org.apache.directory.api.ldap.codec.standalone.CodecFactoryUtil - Registered pre-bundled extended operation factory: 1.3.6.1.4.1.1466.20037
2018-12-19 04:17:59,513 ERROR: org.graylog2.shared.journal.KafkaJournal - Cannot write /usr/share/graylog/data/journal/graylog2-committed-read-offset to disk.
java.io.FileNotFoundException: /usr/share/graylog/data/journal/graylog2-committed-read-offset (Permission denied)
        at java.io.FileOutputStream.open0(Native Method) ~[?:1.8.0_181]
        at java.io.FileOutputStream.open(FileOutputStream.java:270) ~[?:1.8.0_181]
        at java.io.FileOutputStream.<init>(FileOutputStream.java:213) ~[?:1.8.0_181]
        at java.io.FileOutputStream.<init>(FileOutputStream.java:162) ~[?:1.8.0_181]
        at org.graylog2.shared.journal.KafkaJournal$OffsetFileFlusher.run(KafkaJournal.java:763) [graylog.jar:?]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_181]
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [?:1.8.0_181]
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) [?:1.8.0_181]
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) [?:1.8.0_181]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_181]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_181]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_181]
2018-12-19 04:18:00,501 ERROR: org.graylog2.shared.journal.KafkaJournal - Cannot write /usr/share/graylog/data/journal/graylog2-committed-read-offset to disk.
java.io.FileNotFoundException: /usr/share/graylog/data/journal/graylog2-committed-read-offset (Permission denied)
        [same stack trace as above; the error repeats every second]
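
A first diagnostic is to compare the owner of the journal files with the user the Graylog process runs as. A minimal sketch, assuming the container name and paths from the compose file below:

```bash
# Which UID/GID does the Graylog container run as?
docker exec graylog id

# Who owns the journal files, seen from the host (numeric IDs)?
ls -ln /mnt/sas/data/graylog/graylog_journal

# The same files, seen from inside the container
docker exec graylog ls -ln /usr/share/graylog/data/journal
```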

docker-compose.yml

version: '3.7'
networks:
  graylog:

services:
  graylog-mongodb:
    image: mongo:3
    container_name: graylog-mongodb
    networks:
      - graylog
    volumes:
      - mongo_data:/data/db
  graylog-elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:6.5.3
    container_name: graylog-elasticsearch
    networks:
      - graylog
    volumes:
      - es_data_graylog:/usr/share/elasticsearch/data
    environment:
      - http.host=0.0.0.0
      - transport.host=localhost
      - network.host=0.0.0.0
      - xpack.security.enabled=false
      - xpack.watcher.enabled=false
      - xpack.monitoring.enabled=false
      - xpack.security.audit.enabled=false
      - xpack.ml.enabled=false
      - xpack.graph.enabled=false
      - "ES_JAVA_OPTS=-Xms26g -Xmx26g"
    ulimits:
      memlock:
        soft: -1
        hard: -1
  graylog:
    image: graylog/graylog:2.5
    container_name: graylog
    volumes:
      - graylog_journal:/usr/share/graylog/data/journal
      - ./graylog/config:/usr/share/graylog/data/config
    networks:
      - graylog
    depends_on:
      - graylog-mongodb
      - graylog-elasticsearch
    ports:
      - 514:514
      - 514:514/udp
      - 20101:20101/udp
      - 20103:20103/udp
      - 20151:20151/udp
volumes:
  mongo_data:
    driver: local
    driver_opts:
      type: none
      o: bind
      device: /mnt/sas/data/graylog/mongo
  es_data_graylog:
    driver: local
    driver_opts:
      type: none
      o: bind
      device: /mnt/sas/data/graylog/elastic
  graylog_journal:
    driver: local
    driver_opts:
      type: none
      o: bind
      device: /mnt/sas/data/graylog/graylog_journal
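
Because these volumes are plain bind mounts (local driver, o: bind), the backing directories must already exist and be writable by the user each container runs as. A setup sketch; UID/GID 1100 for the Graylog user is an assumption about the graylog/graylog 2.x image, so verify it first:

```bash
# Verify the UID/GID the image actually uses before chowning anything
docker run --rm graylog/graylog:2.5 id

# Pre-create the bind-mount directories; give the journal directory to the
# (assumed) graylog user so the container can write its offset file
sudo mkdir -p /mnt/sas/data/graylog/{mongo,elastic,graylog_journal}
sudo chown -R 1100:1100 /mnt/sas/data/graylog/graylog_journal
```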

I guess you have two issues

2018-12-19 04:18:00,501 ERROR: org.graylog2.shared.journal.KafkaJournal - Cannot write /usr/share/graylog/data/journal/graylog2-committed-read-offset to disk.
java.io.FileNotFoundException: /usr/share/graylog/data/journal/graylog2-committed-read-offset (Permission denied)

Graylog can’t write to the journal folder (that is clearly indicated by the error above).

And for some reason your Graylog starts writing to the journal. Maybe this happens when the index gets rotated? That is what you need to find out. Correlate and watch the system when this happens: what does the system do around that time?
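
One way to correlate is to poll the journal status from the Graylog REST API while the problem builds up. A sketch; the host, port, and credentials below are assumptions for a default 2.x setup:

```bash
# Sample the journal state (append rate, uncommitted entries) once a minute
while true; do
  date
  curl -s -u admin:yourpassword http://localhost:9000/api/system/journal
  echo
  sleep 60
done
```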

The previous message in the Graylog log is more than a minute before the ERROR, so I do not know what Graylog was doing at the time. This is the System messages log around that time. Index rotation was performed at 03:23 UTC, one hour before the journal errors began.

| Timestamp | Node | Message |
| --- | --- | --- |
| 2018-12-19T08:15:19+03:00 | d940f415 / 7ff5941132c9 | Started up. |
| 2018-12-19T08:14:46+03:00 | d940f415 / 7ff5941132c9 | Graceful shutdown initiated. |
| 2018-12-19T08:14:43+03:00 | d940f415 / 7ff5941132c9 | SIGNAL received. Shutting down. |
| 2018-12-19T06:31:20+03:00 | d940f415 / 7ff5941132c9 | SystemJob <73a56ba0-033d-11e9-827f-0242ac1d0003> [org.graylog2.indexer.indices.jobs.OptimizeIndexJob] finished in 472065ms. |
| 2018-12-19T06:23:29+03:00 | d940f415 / 7ff5941132c9 | SystemJob <61a67ca5-033d-11e9-827f-0242ac1d0003> [org.graylog2.indexer.indices.jobs.SetIndexReadOnlyAndCalculateRangeJob] finished in 1333ms. |
| 2018-12-19T06:23:27+03:00 | d940f415 / 7ff5941132c9 | Optimizing index <graylog_1>. |
| 2018-12-19T06:23:27+03:00 | d940f415 / 7ff5941132c9 | Flushed and set <graylog_1> to read-only. |
| 2018-12-19T06:22:57+03:00 | d940f415 / 7ff5941132c9 | Cycled index alias <graylog_deflector> from <graylog_1> to <graylog_2>. |
| 2018-12-19T05:05:22+03:00 | d940f415 / 7ff5941132c9 | SystemJob <88048833-0331-11e9-827f-0242ac1d0003> [org.graylog2.indexer.indices.jobs.OptimizeIndexJob] finished in 434560ms. |

How can I check why Graylog cannot write to the journal folder?
And why does Graylog start writing to the journal in the first place?
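
Since the file ownership itself seems to change, one option is to let auditd record attribute changes on the journal directory together with the process that makes them. A sketch, assuming auditd is installed and running on the host:

```bash
# Watch the journal directory for attribute changes (chown/chmod/setxattr)
auditctl -w /mnt/sas/data/graylog/graylog_journal -p a -k graylog-journal

# Later, list the matching events, including the offending command
ausearch -k graylog-journal -i
```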

Update:
The issue was a custom CentOS script in /etc/cron.daily/unowned_files. That script changes the ownership of every file whose owner or group does not exist in the host OS to root:root. The journal files are owned by a UID that exists only inside the container, so the nightly cron run chowned them to root, after which Graylog could no longer write its offset file.

```bash
#!/bin/bash

# Reassign any file whose owner no longer exists on the host to root
find / -ignore_readdir_race -nouser -print -exec chown root {} \;

# Same for files whose group no longer exists
find / -ignore_readdir_race -nogroup -print -exec chgrp root {} \;
```
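
Two possible fixes, sketched below; UID/GID 1100 for the Graylog container user is an assumption to verify with docker exec graylog id:

```bash
# Option 1: create a matching host account so the journal files are no
# longer "unowned" and the cron script leaves them alone
sudo groupadd -g 1100 graylog
sudo useradd -u 1100 -g 1100 -M -s /sbin/nologin graylog

# Option 2: prune the Docker data path from the daily scan
find / -ignore_readdir_race -path /mnt/sas/data/graylog -prune -o -nouser -print -exec chown root {} \;
```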
