Kafka input plugin and topic offsets

gwaugh · September 4, 2018, 6:38am

Hi all,

I have a 6-node Graylog 2.4.3 cluster pulling logs out of a 3-node Kafka 2.0.0 cluster. There are six topics in the Kafka cluster, each topic mirroring log messages from a remote Kafka cluster. Graylog has one GELF Kafka input configured per topic (i.e. 6 GELF Kafka inputs). Provided Graylog and Kafka are running, log messages are flowing smoothly.

Yesterday, I shut down the Graylog cluster for 10 minutes and then started it up again, to test whether Graylog would pick up where it left off when consuming logs from the Kafka cluster. It did not fully catch up: for the outage period, Graylog has indexed only about 40% of the log volume it indexed over comparable periods on either side of the outage.

In investigating this, I’m wondering how the Graylog Kafka input plugins keep track of Kafka message offsets in the topics. I used the ‘kafka-consumer-groups.sh’ tool to list all consumer groups known by Kafka, but ‘graylog2’ wasn’t listed. Can anyone tell me whether Graylog uses the Kafka group consumer offset mechanism to track the last processed message from a topic, or does it use an internally-recorded offset? Also, can anyone confirm whether Graylog should, by design, resume log ingestion from Kafka after a short outage such as I have instigated, assuming the logs have not been deleted from Kafka?

Many thanks

kay · September 4, 2018, 8:00am

It should pick up where it left off, yes. If it doesn’t do so that’s a bug, or a compatibility issue.
Graylog still uses the 0.9.0.1 client libraries (mostly because parts of Kafka’s journal are used internally and those haven’t been updated to the latest versions). I vaguely remember that Kafka 2.0 changed the wire protocol and wonder if this has anything to do with it, but I don’t have the relevant Kafka environments ready to confirm.

For recording the topic offset, Graylog uses the standard client library mechanism, which I believe stores the offsets in Zookeeper, at least for Kafka 0.9, but I haven’t been following its development too closely recently.
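
To make "pick up where it left off" a bit more concrete, here is a minimal sketch of how a Kafka consumer generally chooses its starting offset after a restart. This is not Graylog's code: `resume_position` and its parameters are names made up for illustration, and the numbers are hypothetical. The point is that if no usable committed offset is found for the group, the reset policy decides where reading starts, and a "latest" policy would skip whatever was written during the outage.

```python
from typing import Optional


def resume_position(
    committed: Optional[int],
    log_start: int,
    log_end: int,
    reset_policy: str = "latest",
) -> int:
    """Return the offset a consumer would start reading from after a restart.

    committed    -- last committed offset for the group (the next message to
                    read), or None if no offset was found for the group
    log_start    -- earliest offset still retained in the partition
    log_end      -- offset that the next produced message will receive
    reset_policy -- "earliest" or "latest" (the old 0.8/0.9 consumer calls
                    these "smallest" and "largest")
    """
    # A committed offset that is still inside the retained log is used as-is.
    if committed is not None and log_start <= committed <= log_end:
        return committed
    # Otherwise (missing or out-of-range offset) the reset policy applies.
    if reset_policy == "earliest":
        return log_start
    if reset_policy == "latest":
        return log_end
    raise ValueError(f"unknown reset policy: {reset_policy!r}")


# Hypothetical partition: offsets 0..1499 retained, group stopped at 1000.
print(resume_position(1000, 0, 1500))              # resumes at 1000
print(resume_position(None, 0, 1500))              # no offset found: jumps to 1500
print(resume_position(None, 0, 1500, "earliest"))  # no offset found: rereads from 0
```

So if the offsets are not being stored or read back the way the client expects, some inputs could silently restart from the end of the topic, which would look like partial data loss for the outage window.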

Hope that helps.

gwaugh · September 4, 2018, 10:54pm

Thanks Kay, I’ll look at the differences in Kafka versions with regard to this behaviour – looks like a good lead. From what I was reading yesterday, Kafka has moved the consumer offsets to be stored by default in an internal topic called ‘__consumer_offsets’ rather than Zookeeper.
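
As a rough way to check whether offsets are actually being recorded per partition, the committed offset can be compared with the log end offset while Graylog is stopped. The sketch below assumes the two sets of offsets have already been collected by hand (for example from the output of ‘kafka-consumer-groups.sh’ when the group is visible there); `partition_lag` and the sample numbers are hypothetical.

```python
from typing import Dict, Optional


def partition_lag(
    committed: Dict[int, int],
    log_end: Dict[int, int],
) -> Dict[int, Optional[int]]:
    """Return unconsumed message counts per partition.

    committed -- partition -> last committed offset for the consumer group
    log_end   -- partition -> log end offset reported by the broker
    A partition with no committed offset maps to None, which is the case
    where the consumer's reset policy would decide the starting point.
    """
    lag: Dict[int, Optional[int]] = {}
    for partition, end in log_end.items():
        offset = committed.get(partition)
        lag[partition] = None if offset is None else max(end - offset, 0)
    return lag


# Hypothetical offsets for one topic with three partitions.
print(partition_lag({0: 900, 1: 1500}, {0: 1500, 1: 1500, 2: 700}))
# {0: 600, 1: 0, 2: None}
```

If the lag grows during the outage and drains after the restart, the offsets are being honoured; a partition showing None would be the one to look at first.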

Cheers

system · September 18, 2018, 10:54pm

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.
