Hello!
So I’ve got a strange case where everything is “status green”, yet Graylog randomly stops ingesting new messages from one specific input.
We have a 3-node Kafka cluster, and every topic is set up with 3 partitions and a replication factor of 1, but there is this one input that has problems. On the input’s “Show details” page I’ve discovered that only one Graylog node shows any I/O for that input; the rest show
Network IO: 0B
Every other input works like a charm with literally the same setup: a Kafka topic with 3 partitions and a replication factor of 1. I wouldn’t mind that only one Graylog node ingests anything from that one input, but it seems that Graylog randomly stops ingesting from it AT ALL, and that’s problematic. Logstash logs no problems, and neither does Kafka.
Please help
So you have multiple inputs that are configured the same way for different Kafka topics that are also set up identically. One of them is not working as it should - what is the difference between them?
Triple-check that the configuration is the same.
Check whether you see the same log entries when you start the inputs.
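If it helps, the topics are quick to compare from the Kafka side as well, for example (the ZooKeeper address and topic name are just placeholders for your own):

kafka-topics.sh --describe --zookeeper zk-host:2181 --topic the_broken_topic

That prints the partition count, replication factor and any per-topic overrides (e.g. max.message.bytes), so you can diff the broken topic against a working one.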
Yes, we have multiple topics on a 3-node Kafka cluster, and every topic has 3 partitions and a replication factor of 1.
On the Graylog side every input looks the same; only the topic names (and input names, of course) are different.
We have 3 Graylog nodes, so Kafka and Graylog nodes are at a 1:1 ratio.
This situation occurs only on this one input: the consumer LAG on the Kafka offsets is slowly increasing, and one (sometimes two) Graylog consumers do not “consume” any messages. This is quite strange, because all partitions contain the very same log messages, so this is a very puzzling issue - why does one node consume something while the other doesn’t?
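(For reference, the lag can be checked per partition with something like this; the broker address and group id are placeholders, and it assumes the group commits its offsets to Kafka - for the legacy ZooKeeper-based consumer the older --zookeeper form of the tool would be needed instead:

kafka-consumer-groups.sh --bootstrap-server kafka-host:9092 --describe --group graylog-input-group

The per-partition CURRENT-OFFSET / LOG-END-OFFSET / LAG columns are what show one partition falling behind while the others keep moving.)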
Graylog’s logs on the nodes that are idle on this one input say things like “Message too big for the consumer”, even though we are running on the defaults, which means a maximum message size of 1 MB. And again: all partitions contain the same logs, and all Graylog nodes have the same config because they are a cluster, so it is really strange that one node does something and the other doesn’t.
After setting everything back to the defaults, where Kafka accepts messages of at most 1 MB (and the producer also has a 1 MB limit), this is the error I still receive from time to time on Graylog:
2019-01-08T11:56:20.059+01:00 ERROR [KafkaTransport] Kafka consumer error, stopping consumer thread.
kafka.common.MessageSizeTooLargeException: Found a message larger than the maximum fetch size of this consumer on topic xxxx_topic partition 2 at fetch offset 1492647. Increase the fetch size, or decrease the maximum message size the broker will allow.
at kafka.consumer.ConsumerIterator.makeNext(ConsumerIterator.scala:90) ~[graylog.jar:?]
at kafka.consumer.ConsumerIterator.makeNext(ConsumerIterator.scala:33) ~[graylog.jar:?]
at kafka.utils.IteratorTemplate.maybeComputeNext(IteratorTemplate.scala:66) ~[graylog.jar:?]
at kafka.utils.IteratorTemplate.hasNext(IteratorTemplate.scala:58) ~[graylog.jar:?]
at org.graylog2.inputs.transports.KafkaTransport$6.run(KafkaTransport.java:228) [graylog.jar:?]
at com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:176) [graylog.jar:?]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_191]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_191]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_191]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_191]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_191]
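For completeness, these are the size-related settings that have to line up, as far as I understand the Kafka docs - the values shown are the stock ~1 MB defaults, not something read from our cluster:

message.max.bytes=1000012          # broker: largest record batch the broker accepts
max.message.bytes=1000012          # per-topic override of the broker limit
max.request.size=1048576           # producer: largest request a producer will send
fetch.message.max.bytes=1048576    # legacy consumer (the API in the stack trace above): max bytes fetched per partition
max.partition.fetch.bytes=1048576  # equivalent setting for the new consumer

Since the exception above is thrown by the legacy consumer and only stops that one consumer thread, a single record (or compressed batch) on one partition that exceeds the fetch size would be enough to make exactly one node go idle while the others keep consuming - which seems to match what we see.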