I have a setup in which log messages are pushed by Filebeat 5.5.1, running on CentOS 6/7 servers, to a Beats input on a 3-node Graylog 2.4.3 cluster. An output (org.graylog.plugins.kafka.KafkaOutput from https://github.com/fbalicchia/graylog-plugin-kafka-inout) attached to the ‘All messages’ stream on the 3-node cluster sends the messages to a local Kafka cluster; from there they are picked up by a remote Kafka cluster, from which a 6-node Graylog 2.4.3 cluster consumes them using a Syslog Kafka input.
On the 3-node cluster, the fields are extracted successfully into ‘source’, ‘timestamp’, ‘message’, ‘facility’, ‘file’, ‘type’, etc. The messages are received successfully at the 6-node end, but Graylog displays ‘source’ as ‘unknown’ and ‘facility’ as ‘Unknown’, and the other Beats-related fields don’t exist at all. I can see that the aforementioned output on the 3-node Graylog side writes the messages into Kafka as plain text, so this behaviour makes sense.
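To illustrate why the fields disappear: a plain-text payload carries only the message string, whereas a JSON payload would carry every extracted field and survive the trip through Kafka intact. The field names and values below are hypothetical, purely for demonstration:

```python
import json

# Hypothetical set of fields as extracted on the 3-node cluster.
fields = {
    "source": "web01.example.com",
    "timestamp": "2018-05-01T12:00:00.000Z",
    "message": "GET /index.html 200",
    "facility": "filebeat",
    "file": "/var/log/httpd/access_log",
    "type": "log",
}

# Plain-text output keeps only the raw message; every other field is lost,
# which is why the 6-node cluster falls back to 'unknown' for source/facility.
plain_payload = fields["message"]

# JSON output preserves all fields; the consumer can restore them verbatim.
json_payload = json.dumps(fields)
restored = json.loads(json_payload)
assert restored == fields
print(restored["source"])  # → web01.example.com
```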
I want the messages in the 6-node Graylog cluster to be in the same format as in the 3-node cluster, i.e. with ‘source’ containing the hostname of the machine that generated the message, together with the Beats-related fields and so on. Would it work if the Kafka output plugin on the 3-node side had the option of writing the messages to Kafka as JSON, with all fields intact? Are there any other potential solutions?
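For what it's worth, if the output plugin did emit JSON, one way to restore the fields on the 6-node side might be to receive the messages on a Raw/Plaintext Kafka input (rather than the Syslog Kafka input) and re-create the fields with a Processing Pipeline rule. This is only an untested sketch: `parse_json`/`set_fields` are standard pipeline functions, but the rule itself and the stream wiring are assumptions about the setup:

```
rule "restore fields from kafka json"
when
  // Only attempt parsing when the payload looks like a JSON object.
  has_field("message") && starts_with(to_string($message.message), "{")
then
  let json = parse_json(to_string($message.message));
  // Copy every key/value pair from the JSON object onto the message,
  // restoring 'source', 'facility', and the Beats fields.
  set_fields(to_map(json));
end
```

The rule would need to be attached (via a pipeline) to the stream fed by the Kafka input.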