Graylog Throughput / Metrics problem

Zerobot · January 2, 2019, 6:59am

Hello!
So I’ve got a strange case where everything is “status green”, yet Graylog randomly stops ingesting new messages from one specific input.
We have a cluster of 3 Kafka brokers; every topic is set to 3 partitions and replication factor 1, but this one input has problems. On the input’s “Show details” page I’ve discovered that only one Graylog node has any I/O for that input; the rest show

Network IO: 0B

Every other input works like a charm with literally the same setup: a Kafka topic with 3 partitions and replication factor 1. I wouldn’t mind that only one Graylog node ingests anything from that one input, but Graylog randomly stops ingesting from it AT ALL, and that’s problematic. The Logstash logs show no problems, and neither do Kafka’s.
Please help

jan · January 6, 2019, 12:26pm

So you have multiple inputs that are configured the same way for different Kafka topics that are also set up identically. One of them is not working as it should. What is the difference between them?

Triple-check that the configuration is the same.
Check whether you see the same log entries when you start the inputs.
What is in your log files when nothing is being ingested?

Zerobot · January 6, 2019, 1:41pm

Yes, we have multiple topics on a 3-node Kafka cluster; every topic has 3 partitions and replication factor 1.
On the Graylog side every input looks the same; only the topic names differ, and the input names of course.

We have 3 Graylog nodes, so this is a 1:1 ratio between Kafka and Graylog.

This situation occurs on only one input: the lag on the Kafka offset slowly increases and one (sometimes two) Graylog consumers do not “consume” any messages. This is quite strange because all partitions contain the very same log messages, so it is a puzzling issue: why does one node consume something while another doesn’t?
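To narrow down which partition is stuck, it can help to sample the offsets twice and compare. The sketch below is only an illustration: the function names and all offset numbers are made up, and it assumes you can obtain the log end offset and the committed consumer offset per partition by whatever tooling you have.

```python
# Illustrative sketch: every offset below is an invented number,
# not a value taken from this thread.

def partition_lag(sample):
    """Return {partition: lag}, where lag = log end offset - committed offset."""
    return {p: end - committed for p, (end, committed) in sample.items()}


def stalled_partitions(earlier, later):
    """Partitions that received new messages between the two samples
    while the committed consumer offset did not move at all."""
    stalled = []
    for p, (end_later, committed_later) in later.items():
        end_earlier, committed_earlier = earlier[p]
        if end_later > end_earlier and committed_later == committed_earlier:
            stalled.append(p)
    return sorted(stalled)


# Each sample maps partition -> (log end offset, committed offset).
sample_t0 = {0: (1000, 990), 1: (1000, 995), 2: (1000, 900)}
sample_t1 = {0: (1200, 1195), 1: (1200, 1190), 2: (1200, 900)}

print(partition_lag(sample_t1))
print(stalled_partitions(sample_t0, sample_t1))
```

A partition that shows up as stalled here is the one whose consumer thread has most likely stopped, which is where to look in the Graylog log of the node that owns it.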

The Graylog logs on the nodes that are idle for this one input say things like “Message too big for the consumer”, even though we are running on the defaults, and by default we have a maximum message size of 1MB. And again: all partitions contain the same logs, and all Graylog nodes have the same config because they are a cluster, so it is really strange that one node does something and another doesn’t.

Zerobot · January 7, 2019, 9:52am

Kafka has this message in its logs when I click “Start Input” in Graylog for the problematic input:

[2019-01-07 10:49:52,265] INFO Got user-level KeeperException when processing sessionid:0x6XXXX62c008c type:create cxid:0x3 zxid:XXXX9bc9 txntype:-1 reqpath:n/a Error Path:/consumers/graylog2 Error:KeeperErrorCode = NodeExists for /consumers/graylog2 (org.apache.zookeeper.server.PrepRequestProcessor)

jan · January 7, 2019, 12:12pm

Message too big for the consumer

Did you check the messages? Maybe there is one big message in the queue?
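One way to check for a single big message is to walk over the records of the affected partition and flag anything above the consumer's limit. This is a rough sketch under assumptions: `find_oversized` is a hypothetical helper, the records are hand-made `(partition, offset, value)` tuples rather than data read from a real cluster, and the 1 MiB limit is assumed from the "1MB" default mentioned in this thread. Note that the payload length is only a lower bound on the on-wire size, since it ignores the key and per-record overhead.

```python
def find_oversized(records, max_bytes):
    """Yield (partition, offset, size) for every record whose payload
    is larger than max_bytes."""
    for partition, offset, value in records:
        size = len(value)
        if size > max_bytes:
            yield partition, offset, size


FETCH_LIMIT = 1024 * 1024  # assumed 1 MiB consumer fetch size

# Hand-made stand-ins for records read from one partition.
sample_records = [
    (2, 100, b"x" * 512),
    (2, 101, b"x" * (FETCH_LIMIT + 1)),
    (2, 102, b"x" * 2048),
]

oversized = list(find_oversized(sample_records, FETCH_LIMIT))
print(oversized)  # [(2, 101, 1048577)]
```

With real data, the offset reported for an oversized record can be compared against the fetch offset named in the consumer's error.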

Zerobot · January 8, 2019, 11:10am

After setting everything back to the defaults, where Kafka ingests messages of at most 1MB (and the producer also has a 1MB limit), this is the error I still receive from time to time in Graylog:

2019-01-08T11:56:20.059+01:00 ERROR [KafkaTransport] Kafka consumer error, stopping consumer thread.
kafka.common.MessageSizeTooLargeException: Found a message larger than the maximum fetch size of this consumer on topic xxxx_topic partition 2 at fetch offset 1492647. Increase the fetch size, or decrease the maximum message size the broker will allow.
    at kafka.consumer.ConsumerIterator.makeNext(ConsumerIterator.scala:90) ~[graylog.jar:?]
    at kafka.consumer.ConsumerIterator.makeNext(ConsumerIterator.scala:33) ~[graylog.jar:?]
    at kafka.utils.IteratorTemplate.maybeComputeNext(IteratorTemplate.scala:66) ~[graylog.jar:?]
    at kafka.utils.IteratorTemplate.hasNext(IteratorTemplate.scala:58) ~[graylog.jar:?]
    at org.graylog2.inputs.transports.KafkaTransport$6.run(KafkaTransport.java:228) [graylog.jar:?]
    at com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:176) [graylog.jar:?]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_191]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_191]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_191]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_191]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_191]
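
The exception text boils down to one rule: the consumer's maximum fetch size has to be at least as large as the biggest message the broker will accept. A small sketch of that check follows. The property names in the comments (`message.max.bytes` on the broker, `max.message.bytes` as a per-topic override, `fetch.message.max.bytes` on the old Scala consumer) are Kafka settings; the helper functions and the numbers are only illustrative, and whether the Graylog Kafka input lets you change the consumer fetch size is not something this thread confirms.

```python
def effective_message_limit(broker_message_max_bytes, topic_max_message_bytes=None):
    """A topic-level max.message.bytes, when set, overrides the broker-wide
    message.max.bytes for that topic."""
    if topic_max_message_bytes is not None:
        return topic_max_message_bytes
    return broker_message_max_bytes


def fetch_size_is_safe(consumer_fetch_max_bytes, message_limit):
    """True if the consumer can always fetch the largest message the
    broker is willing to store (fetch.message.max.bytes >= limit)."""
    return consumer_fetch_max_bytes >= message_limit


MIB = 1024 * 1024

# Assumed values for illustration only, not read from any real cluster.
consumer_fetch = 1 * MIB
same_limit = effective_message_limit(1 * MIB)
raised_limit = effective_message_limit(1 * MIB, topic_max_message_bytes=2 * MIB)

print(fetch_size_is_safe(consumer_fetch, same_limit))    # True
print(fetch_size_is_safe(consumer_fetch, raised_limit))  # False
```

If the second case matches the affected topic, a topic-level override would be one difference to look for between the working inputs and the failing one.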

system · January 22, 2019, 11:10am

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.
