Processor stops processing after a short time

Hi all,

I’m having an issue with Graylog.
It was working until I upgraded the underlying OS running the Docker container.

Since then, Graylog stops processing after a short time and the journal fills up.

If I restart the container, it will process the journal for a while, then stop again.

When it stops, I get this error:

2022-06-17 11:21:57,189 ERROR: com.google.common.util.concurrent.ServiceManager - Service LocalKafkaMessageQueueReader [FAILED] has failed in the RUNNING state.
java.lang.IllegalStateException: Invalid message size: 0
        at org.graylog.shaded.kafka09.log.FileMessageSet.searchFor(FileMessageSet.scala:141) ~[graylog.jar:?]
        at org.graylog.shaded.kafka09.log.LogSegment.translateOffset(LogSegment.scala:105) ~[graylog.jar:?]
        at org.graylog.shaded.kafka09.log.LogSegment.read(LogSegment.scala:148) ~[graylog.jar:?]
        at org.graylog.shaded.kafka09.log.Log.read(Log.scala:506) ~[graylog.jar:?]
        at org.graylog2.shared.journal.LocalKafkaJournal.read(LocalKafkaJournal.java:677) ~[graylog.jar:?]
        at org.graylog2.shared.journal.LocalKafkaJournal.readNext(LocalKafkaJournal.java:617) ~[graylog.jar:?]
        at org.graylog2.shared.journal.LocalKafkaJournal.read(LocalKafkaJournal.java:599) ~[graylog.jar:?]
        at org.graylog2.shared.messageq.localkafka.LocalKafkaMessageQueueReader.run(LocalKafkaMessageQueueReader.java:110) ~[graylog.jar:?]
        at com.google.common.util.concurrent.AbstractExecutionThreadService$1$2.run(AbstractExecutionThreadService.java:67) [graylog.jar:?]
        at com.google.common.util.concurrent.Callables$4.run(Callables.java:121) [graylog.jar:?]
        at java.lang.Thread.run(Thread.java:750) [?:1.8.0_332]

I cannot work out why.

Elasticsearch is fine, according to its logs.

I have no pipelines, and I tried removing all extractors; same error.

  • OS Information: Official Docker running on Unraid (Linux)

  • Package Version: 4.3.2

Can anyone suggest where to look?
I have tried upgrading Elasticsearch, deleting all indexes and starting fresh.
I deleted the message journal completely.
Tried Graylog 4.2.8 as well (this was the previous version we were running).
Changed the output, process, and input buffer processor counts to increase resources.
Heap is set to 8 GB and never goes above about 1 GB.

Thanks


Further to this:
It seems that I get this error as the message journal becomes almost empty.
Let’s say the journal has 200k messages in it.
If I restart the container, the processing buffer will fill to 100% and the messages will be processed.
As soon as the messages in the journal drop below 2,000 (or thereabouts), this error appears and processing stops.
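For what it’s worth, I believe the on-disk journal uses the shaded Kafka 0.9 log format, where each record is an 8-byte offset plus a 4-byte size followed by the payload. Here is a rough sketch (my own code, not Graylog’s) of how a zeroed-out entry at the tail of a segment would trigger that exact “Invalid message size: 0” check as the reader catches up:

```python
import struct

def scan_segment(data: bytes):
    """Walk a Kafka 0.9-style .log segment: each entry is an 8-byte
    big-endian offset, a 4-byte big-endian size, then `size` payload
    bytes. Return the file position of the first entry whose size
    field is <= 0 (where the reader would throw), or None."""
    pos = 0
    while pos + 12 <= len(data):
        offset, size = struct.unpack_from(">qi", data, pos)
        if size <= 0:
            return pos  # "Invalid message size: 0" would fire here
        pos += 12 + size
    return None

# Fabricated segment: one valid 5-byte message, then a zeroed-out
# entry (e.g. pre-allocated space never flushed after an unclean stop).
good = struct.pack(">qi", 0, 5) + b"hello"
bad = b"\x00" * 12
print(scan_segment(good + bad))  # -> 17, the start of the bad entry
```

If a real segment file showed a bad entry right before the write position, that would explain why the crash lands exactly as the reader drains the journal.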

Hello && welcome
First of all the error above shows that

Service LocalKafkaMessageQueueReader [FAILED] has failed in the RUNNING state

I’m unfamiliar with this service.

Next, as you know, the journal is there to catch logs and store them in case Graylog/Elasticsearch is having issues. So it will fill up and, depending on how many logs are in there, you may have to wait until Elasticsearch is done ingesting them. I did notice you stating that it stops when they’re below 2000. That’s odd :thinking:

If this is a container, have a look at the container logs for Elasticsearch and/or Graylog:

docker logs -f <container_id>

Side note: when the journal fills up, this is normally due to Elasticsearch settings or resources, or perhaps the journal is corrupt, meaning there might be a corrupt file/log in there.

By chance are you using docker-compose? If so, could you post the YAML file, or anything relevant to settings or configuration?

Thanks for the reply.

The logs above are from the Graylog container. There isn’t anything else that stands out; it’s the only error.
There’s nothing in the Elasticsearch log when this happens.

That’s what I thought too. I have completely deleted the journal a few times. The same thing happens every time: it fills up and, as it gets down to the low thousands, it crashes and the processor stops.

Running this on Unraid, which manages the Docker containers without compose. I can post the docker run command:

root@localhost:# /usr/local/emhttp/plugins/dynamix.docker.manager/scripts/docker run -d --name='graylog' --net='<REDACTED>' --privileged=true -e TZ="Europe/London" -e HOST_OS="Unraid" -e HOST_HOSTNAME="<REDACTED>" -e HOST_CONTAINERNAME="graylog" -e 'GRAYLOG_PASSWORD_SECRET'='<REDACTED>' -e 'GRAYLOG_ROOT_PASSWORD_SHA2'='<REDACTED>' -e 'GRAYLOG_HTTP_EXTERNAL_URI'='<REDACTED>' -e 'GRAYLOG_MONGODB_URI'='mongodb://<REDACTED>:27017/graylog' -e 'GRAYLOG_ELASTICSEARCH_HOSTS'='http://<REDACTED>:9200' -e 'GRAYLOG_TRANSPORT_EMAIL_WEB_INTERFACE_URL'='<REDACTED>' -e 'GRAYLOG_TRANSPORT_EMAIL_HOSTNAME'='<REDACTED>' -e 'GRAYLOG_TRANSPORT_EMAIL_ENABLED'='true' -e 'GRAYLOG_TRANSPORT_EMAIL_PORT'='25' -e 'GRAYLOG_TRANSPORT_EMAIL_USE_AUTH'='false' -e 'GRAYLOG_TRANSPORT_EMAIL_USE_TLS'='true' -e 'GRAYLOG_TRANSPORT_FROM_EMAIL'='<REDACTED>' -e 'GRAYLOG_TRANSPORT_SUBJECT_PREFIX'='[graylog]' -e 'GRAYLOG_PROCESSBUFFER_PROCESSORS'='10' -e 'GRAYLOG_OUTPUTBUFFER_PROCESSORS'='6' -e 'GRAYLOG_INPUTBUFFER_PROCESSORS'='8' -e 'GRAYLOG_MESSAGE_JOURNAL_MAX_SIZE'='10gb' -e 'GRAYLOG_MESSAGE_JOURNAL_MAX_AGE'='24h' -l net.unraid.docker.managed=dockerman -l net.unraid.docker.webui='http://[IP]:[PORT:9000]' -p '9000:9000/tcp' -p '514:1514/tcp' -p '514:1514/udp' -p '12201:12201/tcp' -p '12201:12201/udp' -v '/mnt/user/appdata/greylog':'/usr/share/graylog/data/journal':'rw' 'graylog/graylog:4.3.2'
<REDACTED>

I thought it was maybe to do with a certain message, but it always happens as it gets to the end of the journal.
Right now, I’ve restarted it and it’s working through 6 million logs. No crashes so far.

I was able to get some debug logs from around the time this error message happens.
Not sure if this tells you anything?

2022-06-18 20:53:43,804 DEBUG: org.graylog.plugins.pipelineprocessor.processors.PipelineInterpreter - [5bc22bc0-ef40-11ec-9736-0242ac13000b] running pipelines [] for streams []
2022-06-18 20:53:43,804 DEBUG: org.graylog.plugins.pipelineprocessor.processors.PipelineInterpreter - [5bc22bc0-ef40-11ec-9736-0242ac13000b] no new streams matches or dropped message, not running again
2022-06-18 20:53:43,804 DEBUG: org.graylog.plugins.pipelineprocessor.processors.PipelineInterpreter - [5bc22bc1-ef40-11ec-9736-0242ac13000b] running pipelines [] for streams []
2022-06-18 20:53:43,804 DEBUG: org.graylog.plugins.pipelineprocessor.processors.PipelineInterpreter - [5bc22bc2-ef40-11ec-9736-0242ac13000b] running pipelines [] for streams []
2022-06-18 20:53:43,804 DEBUG: org.graylog.plugins.pipelineprocessor.processors.PipelineInterpreter - [5bc22bc1-ef40-11ec-9736-0242ac13000b] no new streams matches or dropped message, not running again
2022-06-18 20:53:43,804 DEBUG: org.graylog2.messageprocessors.MessageFilterChainProcessor - Applying filter [Extractor] on message <5bc22bc0-ef40-11ec-9736-0242ac13000b>.
2022-06-18 20:53:43,804 DEBUG: org.graylog2.messageprocessors.MessageFilterChainProcessor - Applying filter [Extractor] on message <5bc22bc1-ef40-11ec-9736-0242ac13000b>.
2022-06-18 20:53:43,804 DEBUG: org.graylog.plugins.pipelineprocessor.processors.PipelineInterpreter - [5bc22bc2-ef40-11ec-9736-0242ac13000b] no new streams matches or dropped message, not running again
2022-06-18 20:53:43,804 DEBUG: org.graylog2.messageprocessors.MessageFilterChainProcessor - Applying filter [Extractor] on message <5bc22bc2-ef40-11ec-9736-0242ac13000b>.
2022-06-18 20:53:43,804 DEBUG: org.graylog2.messageprocessors.MessageFilterChainProcessor - Applying filter [Static field appender] on message <5bc22bc0-ef40-11ec-9736-0242ac13000b>.
2022-06-18 20:53:43,804 DEBUG: org.graylog2.messageprocessors.MessageFilterChainProcessor - Applying filter [StreamMatcher] on message <5bc22bc0-ef40-11ec-9736-0242ac13000b>.
2022-06-18 20:53:43,804 DEBUG: org.graylog2.messageprocessors.MessageFilterChainProcessor - Applying filter [Static field appender] on message <5bc22bc2-ef40-11ec-9736-0242ac13000b>.
2022-06-18 20:53:43,804 DEBUG: org.graylog2.messageprocessors.MessageFilterChainProcessor - Applying filter [StreamMatcher] on message <5bc22bc2-ef40-11ec-9736-0242ac13000b>.
2022-06-18 20:53:43,804 DEBUG: org.graylog2.messageprocessors.MessageFilterChainProcessor - Applying filter [Static field appender] on message <5bc22bc1-ef40-11ec-9736-0242ac13000b>.
2022-06-18 20:53:43,804 DEBUG: org.graylog2.messageprocessors.MessageFilterChainProcessor - Applying filter [StreamMatcher] on message <5bc22bc1-ef40-11ec-9736-0242ac13000b>.
2022-06-18 20:53:43,805 DEBUG: org.graylog2.filters.StreamMatcherFilter - Routed message <5bc22bc0-ef40-11ec-9736-0242ac13000b> to 2 streams.
2022-06-18 20:53:43,805 DEBUG: org.graylog2.buffers.processors.OutputBufferProcessor - Processing message <5bc22bc0-ef40-11ec-9736-0242ac13000b> from OutputBuffer.
2022-06-18 20:53:43,805 DEBUG: org.graylog2.buffers.processors.OutputBufferProcessor - Writing message to [class org.graylog2.outputs.BlockingBatchedESOutput].
2022-06-18 20:53:43,805 DEBUG: org.graylog2.filters.StreamMatcherFilter - Routed message <5bc22bc2-ef40-11ec-9736-0242ac13000b> to 2 streams.
2022-06-18 20:53:43,805 DEBUG: org.graylog2.buffers.processors.OutputBufferProcessor - Wrote message <5bc22bc0-ef40-11ec-9736-0242ac13000b> to all outputs. Finished handling.
2022-06-18 20:53:43,805 DEBUG: org.graylog2.buffers.processors.OutputBufferProcessor - Processing message <5bc22bc2-ef40-11ec-9736-0242ac13000b> from OutputBuffer.
2022-06-18 20:53:43,805 DEBUG: org.graylog2.buffers.processors.OutputBufferProcessor - Writing message to [class org.graylog2.outputs.BlockingBatchedESOutput].
2022-06-18 20:53:43,805 DEBUG: org.graylog2.buffers.processors.OutputBufferProcessor - Wrote message <5bc22bc2-ef40-11ec-9736-0242ac13000b> to all outputs. Finished handling.
2022-06-18 20:53:43,805 DEBUG: org.graylog2.filters.StreamMatcherFilter - Routed message <5bc22bc1-ef40-11ec-9736-0242ac13000b> to 2 streams.
2022-06-18 20:53:43,805 DEBUG: org.graylog2.buffers.processors.OutputBufferProcessor - Processing message <5bc22bc1-ef40-11ec-9736-0242ac13000b> from OutputBuffer.
2022-06-18 20:53:43,805 DEBUG: org.graylog2.buffers.processors.OutputBufferProcessor - Writing message to [class org.graylog2.outputs.BlockingBatchedESOutput].
2022-06-18 20:53:43,805 DEBUG: org.graylog2.buffers.processors.OutputBufferProcessor - Wrote message <5bc22bc1-ef40-11ec-9736-0242ac13000b> to all outputs. Finished handling.
2022-06-18 20:53:43,805 DEBUG: org.graylog2.periodical.BatchedElasticSearchOutputFlushThread - Checking for outputs to flush ...
2022-06-18 20:53:43,805 DEBUG: org.graylog2.periodical.BatchedElasticSearchOutputFlushThread - Flushing output <org.graylog2.outputs.BlockingBatchedESOutput@41c44002>
2022-06-18 20:53:43,805 ERROR: com.google.common.util.concurrent.ServiceManager - Service LocalKafkaMessageQueueReader [FAILED] has failed in the RUNNING state.
java.lang.IllegalStateException: Invalid message size: 0
        at org.graylog.shaded.kafka09.log.FileMessageSet.searchFor(FileMessageSet.scala:141) ~[graylog.jar:?]
        at org.graylog.shaded.kafka09.log.LogSegment.translateOffset(LogSegment.scala:105) ~[graylog.jar:?]
        at org.graylog.shaded.kafka09.log.LogSegment.read(LogSegment.scala:126) ~[graylog.jar:?]
        at org.graylog.shaded.kafka09.log.Log.read(Log.scala:506) ~[graylog.jar:?]
        at org.graylog2.shared.journal.LocalKafkaJournal.read(LocalKafkaJournal.java:677) ~[graylog.jar:?]
        at org.graylog2.shared.journal.LocalKafkaJournal.readNext(LocalKafkaJournal.java:617) ~[graylog.jar:?]
        at org.graylog2.shared.journal.LocalKafkaJournal.read(LocalKafkaJournal.java:599) ~[graylog.jar:?]
        at org.graylog2.shared.messageq.localkafka.LocalKafkaMessageQueueReader.run(LocalKafkaMessageQueueReader.java:110) ~[graylog.jar:?]
        at com.google.common.util.concurrent.AbstractExecutionThreadService$1$2.run(AbstractExecutionThreadService.java:67) [graylog.jar:?]
        at com.google.common.util.concurrent.Callables$4.run(Callables.java:121) [graylog.jar:?]
        at java.lang.Thread.run(Thread.java:750) [?:1.8.0_332]
2022-06-18 20:53:43,811 DEBUG: org.graylog2.shared.initializers.InputSetupService - Lifecycle is now Failed [LB:DEAD]
2022-06-18 20:53:43,811 DEBUG: org.graylog2.shared.buffers.JournallingMessageHandler - End of batch, journaling 1 messages
2022-06-18 20:53:43,811 DEBUG: org.graylog2.configuration.HttpConfiguration - No "http_publish_uri" set. Using default <http://172.19.0.11:9000/>.
2022-06-18 20:53:43,811 DEBUG: org.graylog2.shared.journal.LocalKafkaJournal - Trying to write ByteBufferMessageSet with size of 423 bytes to journal
2022-06-18 20:53:43,811 DEBUG: org.graylog2.shared.journal.LocalKafkaJournal - Wrote 1 messages to journal: 423 bytes (payload 381 bytes), log position 18784844 to 18784844
2022-06-18 20:53:43,811 DEBUG: org.graylog2.shared.messageq.localkafka.LocalKafkaMessageQueueWriter - Processed batch, last journal offset: 18784844, signalling reader.
2022-06-18 20:53:43,811 DEBUG: org.graylog2.shared.buffers.JournallingMessageHandler - End of batch, journaling 5 messages
2022-06-18 20:53:43,811 DEBUG: org.graylog2.shared.journal.LocalKafkaJournal - Trying to write ByteBufferMessageSet with size of 2281 bytes to journal
2022-06-18 20:53:43,811 DEBUG: org.graylog2.shared.journal.LocalKafkaJournal - Wrote 5 messages to journal: 2281 bytes (payload 2071 bytes), log position 18784845 to 18784849
2022-06-18 20:53:43,811 DEBUG: org.graylog2.shared.messageq.localkafka.LocalKafkaMessageQueueWriter - Processed batch, last journal offset: 18784849, signalling reader.
2022-06-18 20:53:43,812 DEBUG: org.graylog2.periodical.ThrottleStateUpdaterThread - ThrottleState: ThrottleState{uncommittedJournalEntries=118, appendEventsPerSec=117, readEventsPerSec=109, journalSize=33471224, journalSizeLimit=10737418240, pbCapacity=65536}
2022-06-18 20:53:43,812 DEBUG: org.graylog2.shared.events.DeadEventLoggingListener - Received unhandled event of type <org.graylog2.plugin.ThrottleState> from event bus <AsyncEventBus{graylog-eventbus}>
2022-06-18 20:53:43,812 DEBUG: org.graylog2.shared.events.DeadEventLoggingListener - Dead event contents: ThrottleState{uncommittedJournalEntries=118, appendEventsPerSec=117, readEventsPerSec=109, journalSize=33471224, journalSizeLimit=10737418240, pbCapacity=65536}
2022-06-18 20:53:43,814 DEBUG: org.graylog2.events.ClusterEventPeriodical - Opening MongoDB cursor on "cluster_events"
2022-06-18 20:53:43,815 DEBUG: org.graylog2.shared.buffers.JournallingMessageHandler - End of batch, journaling 2 messages
2022-06-18 20:53:43,815 DEBUG: org.graylog2.shared.journal.LocalKafkaJournal - Trying to write ByteBufferMessageSet with size of 847 bytes to journal
2022-06-18 20:53:43,815 DEBUG: org.graylog2.shared.journal.LocalKafkaJournal - Wrote 2 messages to journal: 847 bytes (payload 763 bytes), log position 18784850 to 18784851
2022-06-18 20:53:43,815 DEBUG: org.graylog2.shared.messageq.localkafka.LocalKafkaMessageQueueWriter - Processed batch, last journal offset: 18784851, signalling reader.
2022-06-18 20:53:43,815 DEBUG: org.graylog2.shared.buffers.JournallingMessageHandler - End of batch, journaling 2 messages

And from the time it crashed before:

2022-06-18 20:49:14,537 DEBUG: org.graylog2.messageprocessors.MessageFilterChainProcessor - Applying filter [StreamMatcher] on message <bb434991-ef3f-11ec-a1dc-0242ac13000b>.
2022-06-18 20:49:14,537 DEBUG: org.graylog2.messageprocessors.MessageFilterChainProcessor - Applying filter [Static field appender] on message <bb434992-ef3f-11ec-a1dc-0242ac13000b>.
2022-06-18 20:49:14,537 DEBUG: org.graylog2.messageprocessors.MessageFilterChainProcessor - Applying filter [StreamMatcher] on message <bb434992-ef3f-11ec-a1dc-0242ac13000b>.
2022-06-18 20:49:14,537 DEBUG: org.graylog2.shared.buffers.JournallingMessageHandler - End of batch, journaling 1 messages
2022-06-18 20:49:14,537 DEBUG: org.graylog2.filters.StreamMatcherFilter - Routed message <bb434990-ef3f-11ec-a1dc-0242ac13000b> to 3 streams.
2022-06-18 20:49:14,537 DEBUG: org.graylog2.shared.journal.LocalKafkaJournal - Trying to write ByteBufferMessageSet with size of 423 bytes to journal
2022-06-18 20:49:14,537 DEBUG: org.graylog2.buffers.processors.OutputBufferProcessor - Processing message <bb434990-ef3f-11ec-a1dc-0242ac13000b> from OutputBuffer.
2022-06-18 20:49:14,537 DEBUG: org.graylog2.buffers.processors.OutputBufferProcessor - Writing message to [class org.graylog2.outputs.BlockingBatchedESOutput].
2022-06-18 20:49:14,537 DEBUG: org.graylog2.shared.journal.LocalKafkaJournal - Wrote 1 messages to journal: 423 bytes (payload 381 bytes), log position 18751652 to 18751652
2022-06-18 20:49:14,537 DEBUG: org.graylog2.shared.messageq.localkafka.LocalKafkaMessageQueueWriter - Processed batch, last journal offset: 18751652, signalling reader.
2022-06-18 20:49:14,537 DEBUG: org.graylog2.shared.buffers.JournallingMessageHandler - End of batch, journaling 2 messages
2022-06-18 20:49:14,537 DEBUG: org.graylog2.buffers.processors.OutputBufferProcessor - Wrote message <bb434990-ef3f-11ec-a1dc-0242ac13000b> to all outputs. Finished handling.
2022-06-18 20:49:14,537 DEBUG: org.graylog2.shared.messageq.localkafka.LocalKafkaMessageQueueReader - Messages have been written to Journal, continuing to read.
2022-06-18 20:49:14,537 DEBUG: org.graylog2.shared.journal.LocalKafkaJournal - Trying to write ByteBufferMessageSet with size of 922 bytes to journal
2022-06-18 20:49:14,537 DEBUG: org.graylog2.shared.journal.LocalKafkaJournal - Requesting to read a maximum of 65534 messages (or 5MB) from the journal, offset interval [18751652, 18817186)
2022-06-18 20:49:14,538 DEBUG: org.graylog2.filters.StreamMatcherFilter - Routed message <bb434991-ef3f-11ec-a1dc-0242ac13000b> to 3 streams.
2022-06-18 20:49:14,538 DEBUG: org.graylog2.buffers.processors.OutputBufferProcessor - Processing message <bb434991-ef3f-11ec-a1dc-0242ac13000b> from OutputBuffer.
2022-06-18 20:49:14,538 DEBUG: org.graylog2.shared.journal.LocalKafkaJournal - Wrote 2 messages to journal: 922 bytes (payload 838 bytes), log position 18751653 to 18751654
2022-06-18 20:49:14,538 DEBUG: org.graylog2.shared.messageq.localkafka.LocalKafkaMessageQueueWriter - Processed batch, last journal offset: 18751654, signalling reader.
2022-06-18 20:49:14,538 DEBUG: org.graylog2.shared.buffers.JournallingMessageHandler - End of batch, journaling 2 messages
2022-06-18 20:49:14,538 DEBUG: org.graylog2.buffers.processors.OutputBufferProcessor - Writing message to [class org.graylog2.outputs.BlockingBatchedESOutput].
2022-06-18 20:49:14,538 DEBUG: org.graylog2.shared.journal.LocalKafkaJournal - Trying to write ByteBufferMessageSet with size of 918 bytes to journal
2022-06-18 20:49:14,538 DEBUG: org.graylog2.buffers.processors.OutputBufferProcessor - Wrote message <bb434991-ef3f-11ec-a1dc-0242ac13000b> to all outputs. Finished handling.
2022-06-18 20:49:14,538 DEBUG: org.graylog2.shared.journal.LocalKafkaJournal - Wrote 2 messages to journal: 918 bytes (payload 834 bytes), log position 18751655 to 18751656
2022-06-18 20:49:14,538 DEBUG: org.graylog2.shared.messageq.localkafka.LocalKafkaMessageQueueWriter - Processed batch, last journal offset: 18751656, signalling reader.
2022-06-18 20:49:14,538 DEBUG: org.graylog2.filters.StreamMatcherFilter - Routed message <bb434992-ef3f-11ec-a1dc-0242ac13000b> to 3 streams.
2022-06-18 20:49:14,538 DEBUG: org.graylog2.buffers.processors.OutputBufferProcessor - Processing message <bb434992-ef3f-11ec-a1dc-0242ac13000b> from OutputBuffer.
2022-06-18 20:49:14,538 DEBUG: org.graylog2.buffers.processors.OutputBufferProcessor - Writing message to [class org.graylog2.outputs.BlockingBatchedESOutput].
2022-06-18 20:49:14,538 DEBUG: org.graylog2.buffers.processors.OutputBufferProcessor - Wrote message <bb434992-ef3f-11ec-a1dc-0242ac13000b> to all outputs. Finished handling.
2022-06-18 20:49:14,539 ERROR: com.google.common.util.concurrent.ServiceManager - Service LocalKafkaMessageQueueReader [FAILED] has failed in the RUNNING state.
java.lang.IllegalStateException: Invalid message size: 0
        at org.graylog.shaded.kafka09.log.FileMessageSet.searchFor(FileMessageSet.scala:141) ~[graylog.jar:?]
        at org.graylog.shaded.kafka09.log.LogSegment.translateOffset(LogSegment.scala:105) ~[graylog.jar:?]
        at org.graylog.shaded.kafka09.log.LogSegment.read(LogSegment.scala:148) ~[graylog.jar:?]
        at org.graylog.shaded.kafka09.log.Log.read(Log.scala:506) ~[graylog.jar:?]
        at org.graylog2.shared.journal.LocalKafkaJournal.read(LocalKafkaJournal.java:677) ~[graylog.jar:?]
        at org.graylog2.shared.journal.LocalKafkaJournal.readNext(LocalKafkaJournal.java:617) ~[graylog.jar:?]
        at org.graylog2.shared.journal.LocalKafkaJournal.read(LocalKafkaJournal.java:599) ~[graylog.jar:?]
        at org.graylog2.shared.messageq.localkafka.LocalKafkaMessageQueueReader.run(LocalKafkaMessageQueueReader.java:110) ~[graylog.jar:?]
        at com.google.common.util.concurrent.AbstractExecutionThreadService$1$2.run(AbstractExecutionThreadService.java:67) [graylog.jar:?]
        at com.google.common.util.concurrent.Callables$4.run(Callables.java:121) [graylog.jar:?]
        at java.lang.Thread.run(Thread.java:750) [?:1.8.0_332]
2022-06-18 20:49:14,539 DEBUG: org.graylog2.shared.buffers.JournallingMessageHandler - End of batch, journaling 1 messages
2022-06-18 20:49:14,540 DEBUG: org.graylog2.shared.journal.LocalKafkaJournal - Trying to write ByteBufferMessageSet with size of 419 bytes to journal
2022-06-18 20:49:14,540 DEBUG: org.graylog2.shared.journal.LocalKafkaJournal - Wrote 1 messages to journal: 419 bytes (payload 377 bytes), log position 18751657 to 18751657
2022-06-18 20:49:14,540 DEBUG: org.graylog2.shared.messageq.localkafka.LocalKafkaMessageQueueWriter - Processed batch, last journal offset: 18751657, signalling reader.
2022-06-18 20:49:14,540 DEBUG: org.graylog2.shared.buffers.JournallingMessageHandler - End of batch, journaling 1 messages
2022-06-18 20:49:14,540 DEBUG: org.graylog2.shared.journal.LocalKafkaJournal - Trying to write ByteBufferMessageSet with size of 486 bytes to journal
2022-06-18 20:49:14,540 DEBUG: org.graylog2.shared.journal.LocalKafkaJournal - Wrote 1 messages to journal: 486 bytes (payload 444 bytes), log position 18751658 to 18751658
2022-06-18 20:49:14,540 DEBUG: org.graylog2.shared.initializers.InputSetupService - Lifecycle is now Failed [LB:DEAD]
2022-06-18 20:49:14,540 DEBUG: org.graylog2.shared.messageq.localkafka.LocalKafkaMessageQueueWriter - Processed batch, last journal offset: 18751658, signalling reader.
2022-06-18 20:49:14,541 DEBUG: org.graylog2.shared.buffers.JournallingMessageHandler - End of batch, journaling 1 messages
2022-06-18 20:49:14,541 DEBUG: org.graylog2.shared.journal.LocalKafkaJournal - Trying to write ByteBufferMessageSet with size of 620 bytes to journal
2022-06-18 20:49:14,541 DEBUG: org.graylog2.shared.journal.LocalKafkaJournal - Wrote 1 messages to journal: 620 bytes (payload 578 bytes), log position 18751659 to 18751659
2022-06-18 20:49:14,541 DEBUG: org.graylog2.shared.messageq.localkafka.LocalKafkaMessageQueueWriter - Processed batch, last journal offset: 18751659, signalling reader.
2022-06-18 20:49:14,543 DEBUG: org.graylog2.shared.buffers.JournallingMessageHandler - End of batch, journaling 2 messages
2022-06-18 20:49:14,543 DEBUG: org.graylog2.shared.journal.LocalKafkaJournal - Trying to write ByteBufferMessageSet with size of 846 bytes to journal
2022-06-18 20:49:14,543 DEBUG: org.graylog2.shared.journal.LocalKafkaJournal - Wrote 2 messages to journal: 846 bytes (payload 762 bytes), log position 18751660 to 18751661
2022-06-18 20:49:14,543 DEBUG: org.graylog2.shared.messageq.localkafka.LocalKafkaMessageQueueWriter - Processed batch, last journal offset: 18751661, signalling reader.
2022-06-18 20:49:14,543 DEBUG: org.graylog2.shared.buffers.JournallingMessageHandler - End of batch, journaling 2 messages
2022-06-18 20:49:14,543 DEBUG: org.graylog2.shared.journal.LocalKafkaJournal - Trying to write ByteBufferMessageSet with size of 984 bytes to journal

Done some more digging.
I have started a fresh install of Graylog with a blank MongoDB and Elasticsearch, and I still get this error!
Could it be to do with the syslogs that we are receiving? Could the content of one of them be causing the crash?
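One way I could test that theory is to capture the raw datagrams before they reach Graylog and compare them against what crashes it. A minimal sketch (the port 5514 here is just an example for a test capture, not one of my real inputs):

```python
import socket

def capture(bind_addr=("0.0.0.0", 5514), count=10):
    """Grab `count` raw syslog datagrams so the exact bytes hitting
    Graylog can be inspected offline. Returns (data, source) pairs."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(bind_addr)
    try:
        return [sock.recvfrom(65535) for _ in range(count)]
    finally:
        sock.close()
```

Pointing one suspect device at this instead of the real input would show whether some device is emitting something malformed.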

It’s getting weirder.

I set the logging level to Trace to try and get more info; now it will not crash.
If I set the logging level back to Info, it crashes.

Also, I pointed the incoming messages at a test Graylog server, and it also crashed.
Completely different physical host, database and ES.

Anyone know what to do next?

Hello @tslytsly

Sorry for the delay, I was on va-ca. I went over everything you posted.

This may be a configuration error. Can you post your ES file and GL config?

Thanks @gsmith, hope you had a great vacation.

I’m using Docker containers for both, so I don’t edit the config files; I just start them with env variables.
I posted the vars for GL above, and the only vars for the ES container are which ports to expose.

Hello,
Oh gotcha, I overlooked that. I’ll post it here again for clarity.

root@localhost:# /usr/local/emhttp/plugins/dynamix.docker.manager/scripts/docker run -d \
--name='graylog' \
--net='<REDACTED>' \
--privileged=true \
-e TZ="Europe/London" \
-e HOST_OS="Unraid" \
-e HOST_HOSTNAME="<REDACTED>" \
-e HOST_CONTAINERNAME="graylog" \
-e 'GRAYLOG_PASSWORD_SECRET'='<REDACTED>' \
-e 'GRAYLOG_ROOT_PASSWORD_SHA2'='<REDACTED>' \
-e 'GRAYLOG_HTTP_EXTERNAL_URI'='<REDACTED>' \
-e 'GRAYLOG_MONGODB_URI'='mongodb://<REDACTED>:27017/graylog' \
-e 'GRAYLOG_ELASTICSEARCH_HOSTS'='http://<REDACTED>:9200' \
-e 'GRAYLOG_TRANSPORT_EMAIL_WEB_INTERFACE_URL'='<REDACTED>' \
-e 'GRAYLOG_TRANSPORT_EMAIL_HOSTNAME'='<REDACTED>' \
-e 'GRAYLOG_TRANSPORT_EMAIL_ENABLED'='true' \
-e 'GRAYLOG_TRANSPORT_EMAIL_PORT'='25' \
-e 'GRAYLOG_TRANSPORT_EMAIL_USE_AUTH'='false' \
-e 'GRAYLOG_TRANSPORT_EMAIL_USE_TLS'='true' \
-e 'GRAYLOG_TRANSPORT_FROM_EMAIL'='<REDACTED>' \
-e 'GRAYLOG_TRANSPORT_SUBJECT_PREFIX'='[graylog]' \
-e 'GRAYLOG_PROCESSBUFFER_PROCESSORS'='10' \
-e 'GRAYLOG_OUTPUTBUFFER_PROCESSORS'='6' \
-e 'GRAYLOG_INPUTBUFFER_PROCESSORS'='8' \
-e 'GRAYLOG_MESSAGE_JOURNAL_MAX_SIZE'='10gb' \
-e 'GRAYLOG_MESSAGE_JOURNAL_MAX_AGE'='24h' \
-l net.unraid.docker.managed=dockerman \
-l net.unraid.docker.webui='http://[IP]:[PORT:9000]' \
-p '9000:9000/tcp' \
-p '514:1514/tcp' \
-p '514:1514/udp' \
-p '12201:12201/tcp' \
-p '12201:12201/udp' \
-v '/mnt/user/appdata/greylog':'/usr/share/graylog/data/journal':'rw' 'graylog/graylog:4.3.2'
<REDACTED>

Couple questions:
I don’t see port 9200 for ES opened on this Graylog container.
The reason I ask is because of this setting:

-e 'GRAYLOG_ELASTICSEARCH_HOSTS'='http://<REDACTED>:9200'

Does the Elasticsearch container have port 9200 open, and does its network.host var match the GRAYLOG_ELASTICSEARCH_HOSTS IP address?
To clarify, it looks like you’re using IP addresses and not the default localhost/127.0.0.1 for the connection.
Example:

docker run --link mongo --link elasticsearch \
    -p 9000:9000 -p 12201:12201 -p 1514:1514 -p 5555:5555 \
    -e GRAYLOG_HTTP_EXTERNAL_URI="http://127.0.0.1:9000/" \
    -d graylog/graylog:4.2.3
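Also worth a quick connectivity sanity check: from the Graylog container, hit the cluster health endpoint (e.g. curl http://<REDACTED>:9200/_cluster/health) and look at the status field. A small sketch of interpreting the response (the values below are fabricated for illustration):

```python
import json

def es_ok(resp: dict) -> bool:
    """Graylog can index while the cluster is green or yellow; red
    means shards are unavailable and the journal will back up."""
    return resp.get("status") in ("green", "yellow")

# A typical /_cluster/health body (fabricated example values):
sample = json.loads('{"cluster_name": "docker-cluster", "status": "green",'
                    ' "number_of_nodes": 1, "unassigned_shards": 0}')
print(es_ok(sample))  # -> True
```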

EDIT: I noticed you configured a lot of threads:

-e 'GRAYLOG_PROCESSBUFFER_PROCESSORS'='10' 
-e 'GRAYLOG_OUTPUTBUFFER_PROCESSORS'='6'
-e 'GRAYLOG_INPUTBUFFER_PROCESSORS'='8' 

The rule is that those settings should add up to no more than the number of CPU cores on the node. Having too many threads could cause issues. So I’m assuming you have at least 24 CPU cores on the node, with nothing to spare for the OS? If so, you may want to reduce those settings so the Graylog node leaves some CPU for the OS.

This depends on how many logs are ingested. For example, I ingest 30 GB of logs per day and have 14 CPU cores on this node. Here are my settings:

processbuffer_processors = 7
outputbuffer_processors = 3
inputbuffer_processors = 2

That’s a total of 12 threads, one per core, and I left two cores for my OS.
Your heavy hitter is processbuffer_processors; inputbuffer_processors doesn’t need much, nor does outputbuffer_processors.
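As a rough sketch of that sizing arithmetic (my own heuristic, not an official Graylog formula), splitting the cores left after the OS in the same 7:3:2 ratio as my settings above:

```python
def suggest_buffer_processors(cores: int, reserve_for_os: int = 2) -> dict:
    """Split the cores left after the OS roughly 7:3:2 across the
    process/output/input buffers. Illustrative heuristic only."""
    usable = max(cores - reserve_for_os, 3)
    process = max(round(usable * 7 / 12), 1)
    output = max(round(usable * 3 / 12), 1)
    inputs = max(usable - process - output, 1)
    return {"processbuffer_processors": process,
            "outputbuffer_processors": output,
            "inputbuffer_processors": inputs}

print(suggest_buffer_processors(14))
# -> {'processbuffer_processors': 7, 'outputbuffer_processors': 3, 'inputbuffer_processors': 2}
```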

EDIT: You stated the heap was configured. There are two settings for this; which one did you use?

 - "ES_JAVA_OPTS=-Xms4g -Xmx4g"

And

 - GRAYLOG_SERVER_JAVA_OPTS="-Xms3g -Xmx3g"

I see you posted here. Holy cow, I read the rest of that post.

Here is my docker-compose.yaml file to help further. You might find something in there that helps.

version: '3'
services:
   # MongoDB: https://hub.docker.com/_/mongo/
  mongodb:
    image: mongo:4.4
    network_mode: bridge
   # DB in share for persistence
    volumes:
      - mongo_data:/data/db
   # Elasticsearch: https://www.elastic.co/guide/en/elasticsearch/reference/6.6/docker.html
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch-oss:7.10.2-amd64
    # image: opensearchproject/opensearch:1.3.2
    network_mode: bridge
    #data folder in share for persistence
    volumes:
      - es_data:/usr/share/elasticsearch/data
    environment:
      - http.host=0.0.0.0
      - transport.host=localhost
      - network.host=0.0.0.0
      - "ES_JAVA_OPTS=-Xms4g -Xmx4g"
    ulimits:
      memlock:
        soft: -1
        hard: -1
    mem_limit: 1g
   # Graylog: https://hub.docker.com/r/graylog/graylog/
  graylog:
    image: graylog/graylog:4.3-jre11
    network_mode: bridge
    dns:
      - 10.10.10.2
      - 10.10.10.3
   # journal and config directories in local NFS share for persistence
    volumes:
       - graylog_bin:/usr/share/graylog/bin
       - graylog_data:/usr/share/graylog/data/config
       - graylog_log:/usr/share/graylog/data/log
       - graylog_plugin:/usr/share/graylog/data/plugin
       - graylog_content:/usr/share/graylog/data/contentpacks      
    environment:
      # Container time Zone
      - TZ=America/Chicago
      # CHANGE ME (must be at least 16 characters)!
      - GRAYLOG_PASSWORD_SECRET=pJod1TRZAckHZuyb2YW9eHiKLTifjy7cLpeWIjWgMtnwZf6Q
      # Password: admin
      - GRAYLOG_ROOT_PASSWORD_SHA2=ef92b778bafe6c06659911881f383d4473e94f
      - GRAYLOG_HTTP_BIND_ADDRESS=0.0.0.0:9000
      - GRAYLOG_HTTP_EXTERNAL_URI=http://10.10.10.10:9000/
      - GRAYLOG_ROOT_TIMEZONE=America/Chicago
      - GRAYLOG_ROOT_EMAIL=greg.smith@domain.com
      - GRAYLOG_HTTP_PUBLISH_URI=http://10.10.10.10:9000/
      - GRAYLOG_TRANSPORT_EMAIL_PROTOCOL=smtp
      - GRAYLOG_HTTP_ENABLE_CORS=true
      - GRAYLOG_TRANSPORT_EMAIL_WEB_INTERFACE_URL=http://10.10.10.10:9000/
      - GRAYLOG_TRANSPORT_EMAIL_HOSTNAME=10.10.10.10
      - GRAYLOG_TRANSPORT_EMAIL_ENABLED=true
      - GRAYLOG_TRANSPORT_EMAIL_PORT=25
      - GRAYLOG_TRANSPORT_EMAIL_USE_AUTH=false
      - GRAYLOG_TRANSPORT_EMAIL_USE_TLS=false
      - GRAYLOG_TRANSPORT_EMAIL_USE_SSL=false
      - GRAYLOG_TRANSPORT_FROM_EMAIL=root@localhost
      - GRAYLOG_TRANSPORT_SUBJECT_PREFIX=[graylog]
      - GRAYLOG_REPORT_DISABLE_SANDBOX=true
      - GRAYLOG_REPORT_RENDER_URI=http://10.10.10.10:9000
      # - GRAYLOG_REPORT_USER=graylog-report
      - GRAYLOG_REPORT_RENDER_ENGINE_PORT=9515
    links:
      - mongodb:mongo
      - elasticsearch
    depends_on:
      - mongodb
      - elasticsearch
    ports:
      # Graylog web interface and REST API
      - 9000:9000
      # Syslog TCP
      - 8514:8514
      # Elasticsearch
      - 9200:9200
      - 9300:9300
      # Syslog UDP
      - 8514:8514/udp
      # GELF TCP
      #- 12201:12201
      # GELF UDP
      #- 12201:12201/udp
      # Reports
      - 9515:9515
      - 9515:9515/udp
      # beats
      - 5044:5044
      # email
      - 25:25
      - 25:25/udp
      # web
      - 80:80
      - 443:443
      - 21:21
      # Forwarder
      - 13302:13302
      - 13301:13301
      # keycloak
      - 8443:8443
      # packetbeat
      - 5055:5055
#Volumes for persisting data, see https://docs.docker.com/engine/admin/volumes/volumes/
volumes:
  mongo_data:
    driver: local
  es_data:
    driver: local
  graylog_journal:
    driver: local
  graylog_bin:
    driver: local
  graylog_data:
    driver: local
  graylog_log:
    driver: local
  graylog_plugin:
    driver: local
  graylog_content:
    driver: local

Thank you for all your help.

Yes, I have another container on the same Docker net that is running ES; I know this works fine generally.

Good point. I increased these as part of my troubleshooting; the server only has 16 CPUs, so I will change them back to what they were before. The issue was present before I increased them, though, so I don’t think it’s this.

I looked at this again; I didn’t specify any settings for these, so it’s using the default of half the available RAM, 8 GB, which is more than enough.

Thanks for the docker compose, that’s helpful.
I’ve checked it and nothing stands out.

As I said above, I can recreate this on a completely different Graylog server: if I point my incoming syslog traffic at that server, it falls over.
So it’s something in the messages that I’m ingesting.
It might even be a bug that I need to report to Graylog.

Hello,

This is strange, and I’m running out of suggestions. If the Graylog node is fine until you send logs to the input, I’m assuming this might be a resource issue indexing those logs, but I’m not 100% sure.

Sorry, I may not have been clear: what I was demonstrating is that one setting is for the Elasticsearch heap and the other is for the Graylog heap.

This documentation shows what I was talking about. Not sure if it will help though, just a thought.

As for issues, you can post here.