Kafka input plugin and topic offsets

Hi all,

I have a 6-node Graylog 2.4.3 cluster pulling logs out of a 3-node Kafka 2.0.0 cluster. There are six topics in the Kafka cluster, each topic mirroring log messages from a remote Kafka cluster. Graylog has one GELF Kafka input configured per topic (i.e. 6 GELF Kafka inputs). Provided Graylog and Kafka are running, log messages are flowing smoothly.

Yesterday, I shut down the Graylog cluster for 10 minutes and then started it up again, testing whether Graylog would pick up where it left off with regard to consuming logs out of the Kafka cluster. Graylog has indexed about 40% of the log volume for the outage period compared to the volume indexed on either side of the outage.

In investigating this, I’m wondering how the Graylog Kafka input plugins keep track of Kafka message offsets in the topics. I used the ‘kafka-consumer-groups.sh’ tool to list all consumer groups known by Kafka, but ‘graylog2’ wasn’t listed. Can anyone tell me whether Graylog uses the Kafka group consumer offset mechanism to track the last processed message from a topic, or does it use an internally-recorded offset? Also, can anyone confirm whether Graylog should, by design, resume log ingestion from Kafka after a short outage such as I have instigated, assuming the logs have not been deleted from Kafka?

Many thanks :smile:

It should pick up where it left off, yes. If it doesn’t, that’s either a bug or a compatibility issue.
Graylog still uses the 0.9.0.1 client libraries (mostly because parts of Kafka’s journal are used internally and those haven’t been updated to the latest versions). I vaguely remember that Kafka 2.0 changed the wire protocol and wonder if this has anything to do with it, but I don’t have the relevant Kafka environments ready to confirm.

For recording the topic offset, Graylog uses the standard client library mechanism, which I believe stores the offsets in ZooKeeper, at least for Kafka 0.9, but I haven’t been following Kafka’s development too closely recently.
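
To make the expected behaviour concrete, here is a toy model of committed offsets. It is only a sketch and not Graylog's or Kafka's actual code: `ToyBroker`, `consume` and the `reset` parameter are made-up names for illustration, and the store is a plain dict standing in for wherever the client really keeps offsets. It shows that a group with a stored offset resumes after an outage, while a group without one that starts from the latest position misses whatever arrived in the meantime.

```python
from dataclasses import dataclass, field


@dataclass
class ToyBroker:
    """Toy stand-in for one topic partition plus a committed-offset store."""
    log: list = field(default_factory=list)        # message at index i has offset i
    committed: dict = field(default_factory=dict)  # group id -> next offset to read

    def produce(self, message):
        self.log.append(message)


def consume(broker, group, reset="latest"):
    """Read everything available for `group`, then commit the new position."""
    if group in broker.committed:
        start = broker.committed[group]   # resume where this group left off
    elif reset == "earliest":
        start = 0                         # no stored offset: start of the log
    else:
        start = len(broker.log)           # no stored offset: only new messages
    messages = broker.log[start:]
    broker.committed[group] = len(broker.log)
    return messages


broker = ToyBroker()
for i in range(3):
    broker.produce(f"before-{i}")
seen_before = consume(broker, "graylog2", reset="earliest")

# The consumer is down here, but messages keep arriving.
for i in range(2):
    broker.produce(f"outage-{i}")

resumed = consume(broker, "graylog2")     # stored offset exists: nothing is lost
fresh = consume(broker, "other-group")    # no stored offset, starts at the end
```

In this model `resumed` holds both outage messages and `fresh` is empty. If the real client cannot find or read its stored offsets after a restart, the second case is roughly what you would see, so that is worth ruling out.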

Hope that helps.

Thanks Kay, I’ll look at the differences between Kafka versions with regard to this behaviour – looks like a good lead. From what I was reading yesterday, Kafka now stores consumer offsets by default in an internal topic called ‘__consumer_offsets’ rather than in ZooKeeper.

Cheers
