Cluster Graylog and embedded Kafka

Hello everyone,

We are going to run a Graylog cluster on AWS, and I have a question about the embedded Kafka service: how does it handle messages between the nodes? Do I have to plan a configuration, or is it managed automatically by Graylog?

Thanks for your help

Best regards

Which embedded Kafka service do you mean?

Graylog uses the disk journal implementation of Kafka internally (the “Log”), and it provides inputs which can be used to consume messages from Kafka brokers, but it doesn’t provide an embedded Kafka broker.

Thank you very much, Jochen, for your answer.

Yes, I mean the internal Kafka implementation in Graylog. Can you confirm that it runs on each node of the cluster, and that the journal is written locally on each node, in the directory specified in the “graylog.conf” configuration file by the “message_journal_dir” setting?

I use Logstash to send my server logs to Graylog (GELF/UDP), and I wanted to know how messages are distributed across the different nodes of the cluster. Does Graylog distribute the messages automatically?

Graylog writes incoming messages into the disk journal (which can be configured with the message_journal_dir setting) immediately after they’ve been received and before they are further processed (extractors, pipeline rules, etc.).
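
For reference, the relevant journal settings in graylog.conf look roughly like this on each node (the path and size below are just example values):

```
message_journal_enabled = true
message_journal_dir = /var/lib/graylog-server/journal
message_journal_max_size = 5gb
```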

Graylog doesn’t automatically distribute messages (or message fragments) to different nodes in the Graylog cluster. Messages are always processed on the node which received them.

In case of GELF UDP, you have to make sure that all message chunks are received by the same Graylog node. This is important if you’re using a UDP load-balancer which is not aware of the GELF UDP protocol.
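
To illustrate why this matters, here is a rough Python sketch of how a chunked GELF UDP message is built (host, port and chunk size are placeholders): every chunk carries the same 8-byte message ID, and Graylog can only reassemble the message if all of its chunks arrive at the same node.

```python
import gzip
import json
import os
import socket

GELF_CHUNK_MAGIC = b"\x1e\x0f"   # marks a chunked GELF datagram
CHUNK_PAYLOAD_SIZE = 1420        # keep each datagram below a typical MTU

def send_gelf_udp(message, host="graylog.example.com", port=12201):
    """Send one GELF message over UDP, chunking it if it is too large."""
    payload = gzip.compress(json.dumps(message).encode("utf-8"))
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        if len(payload) <= CHUNK_PAYLOAD_SIZE:
            sock.sendto(payload, (host, port))
            return
        # GELF allows at most 128 chunks per message; every chunk carries
        # the same random 8-byte message ID so the receiver can reassemble.
        chunks = [payload[i:i + CHUNK_PAYLOAD_SIZE]
                  for i in range(0, len(payload), CHUNK_PAYLOAD_SIZE)]
        message_id = os.urandom(8)
        for seq, chunk in enumerate(chunks):
            header = GELF_CHUNK_MAGIC + message_id + bytes([seq, len(chunks)])
            sock.sendto(header + chunk, (host, port))

send_gelf_udp({"version": "1.1", "host": "web01",
               "short_message": "example message", "full_message": "x" * 5000})
```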

Thank you, Jochen.

So to distribute the messages, I have to put a load balancer in front of the Graylog cluster that accepts the GELF/UDP traffic sent by Logstash and forwards it to the different nodes of the cluster?

I’d recommend using GELF TCP when you want to deploy a load balancer in front of Graylog.
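
For comparison, a minimal sketch of GELF over TCP (host and port are placeholders): each message is a plain, uncompressed JSON frame terminated by a null byte, so a TCP load balancer can simply spread connections across the Graylog nodes without any chunk-reassembly concerns.

```python
import json
import socket

def send_gelf_tcp(message, host="graylog-lb.example.com", port=12201):
    """Send one GELF message over TCP: uncompressed JSON ended by a null byte."""
    frame = json.dumps(message).encode("utf-8") + b"\x00"
    with socket.create_connection((host, port)) as sock:
        sock.sendall(frame)

send_gelf_tcp({"version": "1.1", "host": "web01",
               "short_message": "example message over GELF TCP"})
```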

@jochen: Hi Jochen, since it is a Kafka implementation that runs and writes directly to the local disk, if we lose the node/pod (Docker), we lose the in-flight messages.
Is there a solution? For example, is there a way to share the journal directory between the different nodes?

It’s a disk journal, not a full-fledged messaging broker.

You could write your logs into an external message broker such as RabbitMQ or Kafka and let Graylog pull messages from there.

No, the disk journal cannot be shared between nodes.

So if I go through Kafka or RabbitMQ, Graylog doesn’t write to disk, is that it?

Hi,

So if I understand your solution with an external Kafka/RabbitMQ correctly, Graylog will trigger a job when it sees a message in the queue, process it, and only then clear the message from the queue.
This is what would be optimal for me/us.

However, I am wondering where I can tell Graylog that I want to use an external message broker; I can’t find it in the docs.
It may also help to have a view of the overall project: we want to run Graylog on top of Kubernetes, with HA and no loss of data/work. That is why we are looking for a way to make Graylog stateless.

Graylog comes with Kafka and RabbitMQ inputs which you can use to read messages from a message broker.
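
As a sketch of that setup (assuming the kafka-python client, and an illustrative broker address and topic name), a producer could publish GELF-style JSON messages to a topic, and a Graylog Kafka input subscribed to the same topic would then pull and process them:

```python
import json
from kafka import KafkaProducer  # pip install kafka-python

# Publish GELF-style JSON messages to a Kafka topic. The broker address and
# topic name below are placeholders; a Graylog Kafka input subscribed to the
# same topic would consume and process these messages.
producer = KafkaProducer(
    bootstrap_servers="kafka.example.com:9092",
    value_serializer=lambda m: json.dumps(m).encode("utf-8"),
)

producer.send("graylog-gelf", {
    "version": "1.1",
    "host": "web01",
    "short_message": "example message shipped via Kafka",
})
producer.flush()
```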
