Uncommitted messages deleted & high journal utilization in Graylog

Hello everyone,

I’m seeing two alerts on my Graylog node:

  1. Uncommitted messages deleted from journal

    Some messages were deleted from the Graylog journal before they could be written to Elasticsearch. Please verify that your Elasticsearch cluster is healthy and fast enough. You may also want to review your Graylog journal settings and set a higher limit.

  2. Journal utilization is too high

    Journal utilization is too high and may go over the limit soon. Please verify that your Elasticsearch cluster is healthy and fast enough. You may also want to review your Graylog journal settings and set a higher limit.

Both were triggered about 4 minutes ago.


1. Rsyslog configuration

I want all system logs forwarded to Graylog via TCP on port 5514. Here is my /etc/rsyslog.d/90-graylog-tcp.conf:

# Load modules
module(load="imuxsock")     # local Unix sockets
module(load="imklog")       # kernel logs
module(load="imjournal")    # systemd journal
module(load="imfile")       # file monitoring
module(load="omtcp")        # TCP output

# Forward to Graylog
action(
  type="omfwd"
  Target="graylog.domain.de"
  Port="5514"
  Protocol="tcp"
  Template="RSYSLOG_SyslogProtocol23Format"
  TCP_Framing="octet-counted"
  KeepAlive="on"
  Queue.Type="LinkedList"
  Queue.Size="500000"
  Queue.DequeueBatchSize="1000"
  Queue.SaveOnShutdown="on"
  Action.ResumeRetryCount="-1"
  Action.ResumeInterval="10"
)

# Journald input
input(
  type="imjournal"
  Tag="systemd"
  Facility="local6"
  Severity="info"
  StateFile="imjournal-state"
)

# Monitor /var/log/auth.log
input(
  type="imfile"
  File="/var/log/auth.log"
  Tag="authlog"
  Severity="info"
  Facility="local5"
  StateFile="/var/spool/rsyslog/state-authlog"
  PersistStateInterval="200"
  addMetadata="on"
  ReadMode="2"
)

2. Docker Compose setup

I’m running Graylog 6.1 with a Graylog DataNode (OpenSearch) 6.2 and MongoDB 5.0. Key parts of docker-compose.yml:

version: '3.8'
services:
  mongodb:
    image: mongo:5.0
    volumes: [ mongodb_data:/data/db ]
    restart: on-failure

  datanode:
    image: graylog/graylog-datanode:6.2
    environment:
      GRAYLOG_DATANODE_JAVA_OPTS: "-Xms12g -Xmx12g"
      GRAYLOG_DATANODE_ADDITIONAL_OPENSEARCH_SETTINGS: |
        indices.breaker.total.limit: 75%
        indices.breaker.request.limit: 60%
        indices.breaker.fielddata.limit: 60%
        index.refresh_interval: 30s
    ports:
      - "9200:9200"
      - "9300:9300"
    ulimits:
      nofile: { soft: 65536, hard: 65536 }
      memlock: { soft: -1, hard: -1 }
    restart: on-failure

  graylog:
    image: graylog/graylog:6.1
    depends_on: [ mongodb ]
    environment:
      GRAYLOG_SERVER_JAVA_OPTS: "-Xms16g -Xmx16g -XX:+UseG1GC -XX:+UnlockExperimentalVMOptions -XX:+DisableExplicitGC"
      GRAYLOG_INPUTBUFFER_PROCESSORS: "2"
      GRAYLOG_PROCESSBUFFER_PROCESSORS: "12"
      GRAYLOG_OUTPUTBUFFER_PROCESSORS: "6"
      GRAYLOG_SERVER_JOURNAL_WARNING_THRESHOLD: "0.9"
    ports:
      - "5514:5514/tcp"
      - "5514:5514/udp"
      - "9000:9000"
    volumes:
      - graylog_data:/usr/share/graylog/data/data
      - graylog_journal:/usr/share/graylog/data/journal
    restart: on-failure

volumes:
  mongodb_data:
  graylog_data:
  graylog_journal:

3. Host & Graylog stats

  • Host: ESXi VM, 12 vCPUs @ 37,535 MHz, 32 GB RAM (20 GB active), 1.5 TB SSD
  • Graylog node: 1 active node
  • Throughput: 1,186 msg/s incoming, 1,121 msg/s outgoing
  • Journal size: 6,911,664 unprocessed messages in 47 segments
  • Heap usage: 7.8 GiB of 16 GiB
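
For scale, here is a rough back-of-the-envelope calculation from the figures above. It is only a sketch: it assumes the two throughput readings are steady averages (in reality they are momentary snapshots), and the variable and function names are just ones I made up for illustration.

```python
# Figures copied from the stats above.
incoming_rate = 1186      # msg/s arriving in the journal
outgoing_rate = 1121      # msg/s leaving the journal towards OpenSearch
backlog = 6_911_664       # unprocessed messages currently in the journal

# Net rate at which the journal grows while input outpaces output.
net_growth = incoming_rate - outgoing_rate


def seconds_to_drain(backlog, incoming, outgoing):
    """Seconds needed to empty the backlog, or None if it can never drain."""
    surplus = outgoing - incoming
    if surplus <= 0:
        return None  # output is not faster than input, so the backlog only grows
    return backlog / surplus


# How long the current imbalance would need to build up this backlog.
hours_to_build_up = backlog / net_growth / 3600

# Best case: all inputs paused, output continues at the current rate.
minutes_if_input_paused = seconds_to_drain(backlog, 0, outgoing_rate) / 60

print(f"net journal growth: {net_growth} msg/s")
print(f"time to build this backlog at that rate: {hours_to_build_up:.1f} h")
print(f"drain time with inputs paused: {minutes_if_input_paused:.0f} min")
print("drains at current rates:",
      seconds_to_drain(backlog, incoming_rate, outgoing_rate) is not None)
```

If those assumptions hold, the journal cannot drain at the current rates, because the output side is slower than the input side.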

My questions

  1. Rsyslog: Do you spot any mistakes in my configuration? My goal is to forward all logs reliably to Graylog.
  2. Performance tuning: How can I optimize Graylog (or OpenSearch) for better ingestion throughput? I can’t allocate more RAM or CPU; storage is fast SSD.
  3. Cleaning the journal: What’s the safest way to delete or truncate unprocessed messages in the Graylog journal without data corruption?

Thanks in advance for any pointers!

If you go to your Nodes page and open the stats for your Graylog node, can you post a screenshot showing buffer usage, journal stats, etc.?

Something somewhere is not keeping up.
