We are going to run a Graylog cluster on AWS, and I have a question about the embedded Kafka service: how does it handle messages between the nodes? Do we have to plan a configuration for it, or is it managed automatically by Graylog?
Graylog uses the disk journal implementation of Kafka internally (the “Log”) and it provides inputs that can be used to consume messages from Kafka brokers, but it doesn’t provide an embedded Kafka broker.
Yes, that’s what I meant: the internal Kafka journal implementation in Graylog. Can you confirm that it runs on each node of the cluster, and that the journal is written locally on each node to the directory specified in the configuration file graylog.conf by the message_journal_dir setting?
I use Logstash to send my server logs to Graylog (GELF/UDP), and I wanted to know how messages are distributed across the different nodes of the Graylog cluster. Does Graylog distribute the messages automatically?
Graylog writes incoming messages into the disk journal (which can be configured with the message_journal_dir setting) immediately after they’ve been received and before they are further processed (extractors, pipeline rules, etc.).
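For reference, the relevant journal settings in graylog.conf look roughly like this on each node (the path and size below are examples, not recommendations):

```
# Enable the disk journal; each node writes its own journal locally
message_journal_enabled = true

# Directory this node writes its journal segments to
message_journal_dir = /var/lib/graylog-server/journal

# Example cap on journal size before the oldest segments are dropped
message_journal_max_size = 5gb
```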
Graylog doesn’t automatically distribute messages (or message fragments) to different nodes in the Graylog cluster. Messages are always processed on the node which received them.
In case of GELF UDP, you have to make sure that all message chunks are received by the same Graylog node. This is important if you’re using a UDP load-balancer which is not aware of the GELF UDP protocol.
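One way to satisfy this with a generic UDP load balancer is source-IP affinity, so that all chunks from a given sender reach the same node. A minimal sketch using the nginx stream module (hostnames and ports are placeholders):

```
stream {
    upstream graylog_gelf {
        # Source-IP affinity: all UDP datagrams from one sender
        # (e.g. one Logstash instance) go to the same Graylog node,
        # so the chunks of a single GELF message stay together.
        hash $remote_addr consistent;
        server graylog-node1.example.com:12201;
        server graylog-node2.example.com:12201;
    }

    server {
        listen 12201 udp;
        proxy_pass graylog_gelf;
    }
}
```

With several Logstash senders this still spreads load across the nodes while keeping each message’s chunks on one node.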
So to distribute the messages, I have to put a load balancer in front of the Graylog cluster that receives the GELF/UDP datagrams sent by Logstash and forwards them to the different nodes of the cluster?
@jochen: Hi Jochen. Given that it is a Kafka implementation that runs on each node and writes directly to the local disk, if we lose a node/pod (Docker), we lose the messages that were still in its journal.
Is there a solution? I.e., is there a way to share the journal directory between the different nodes?
So if I understand the solution with an external Kafka/RabbitMQ correctly, Graylog will pick up a message when it appears in the queue, process it, and only then acknowledge the message in the queue.
This is what would be optimal for me/us.
However, I am wondering where I can tell Graylog that I want to use an external message broker; I can’t find it in the documentation.
Moreover, I think it would help to have a view of the overall project: we want to run Graylog on top of Kubernetes, with HA and no loss of data/work. That is why we are looking for a way to make Graylog stateless.
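For context: Graylog inputs, including the Kafka and AMQP inputs mentioned earlier, are created in the web interface under System → Inputs rather than in graylog.conf. On the sending side, a Logstash pipeline that ships to Kafka instead of GELF/UDP could look roughly like the sketch below; the broker address and topic name are placeholders, not values from this thread:

```
output {
  kafka {
    # Placeholder broker and topic; point these at your own Kafka cluster
    bootstrap_servers => "kafka-1.example.com:9092"
    topic_id          => "graylog-messages"
    codec             => json
  }
}
```

A matching Kafka input on the Graylog side would then consume from that topic. Messages remain in Kafka (subject to its retention settings) even if a Graylog node goes down, which is what provides the decoupling described above.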