Running graylog 3.1.5 on centos 7.6. The server.log is a constant stream of messages such as:
2020-09-25T12:06:17.030Z WARN [ProcessBufferProcessor] Unable to process message <83ebaf51-ff27-11ea-94fb-e4434b23219c>: java.lang.NullPointerException
I have done some debugging from the message source, which is fluentd in an OpenShift cluster going to a UDP GELF input. I have identified some messages that are being sent by fluentd but never appear in Graylog.
Please check whether the error message is followed by more detail.
Try increasing the log level in Graylog under System - Logging (change from Info to Debug) and check the Graylog log file: sudo tail -f /var/log/graylog-server/server.log
Try to capture packets using tcpdump on the Graylog server: tcpdump -i INTERFACE -vnnA 'udp port 12201'
Or capture to a pcap file and analyze it with Wireshark: tcpdump -i INTERFACE -w output.pcap 'udp port 12201'
I have been using tcpdump to examine the GELF stream; this is how I was able to confirm that fluentd was sending all messages. I suspect the messages that do not make it to Graylog are chunked, and if I enable debug logging I see entries like the following:
ID: 6df71a43ed10824a Sequence: 2/2 Arrival: 1601037203527 Data size: 96
Message Chunks:
ID: e6d59d38ca93600f Sequence: 1/2 Arrival: 1601037205244 Data size: 1432
Message <086e7973d1821bb4> Chunks:
ID: 086e7973d1821bb4 Sequence: 2/2 Arrival: 1601037203528 Data size: 96
…
2020-09-25T12:33:27.230Z DEBUG [EnvelopeMessageAggregationHandler] Message aggregation completion, forwarding UnpooledHeapByteBuf(ridx: 0, widx: 926, cap: 926/926)
2020-09-25T12:33:27.230Z DEBUG [EnvelopeMessageAggregationHandler] More chunks necessary to complete this message
Is this normal?
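For context on what those chunk IDs and sequence numbers mean: per the GELF spec, each chunked UDP datagram carries a 12-byte header (magic bytes 0x1e 0x0f, an 8-byte message ID, a 1-byte sequence number, and a 1-byte sequence count) before the payload. A minimal sketch of parsing that header (illustrative only, not Graylog's actual code):

```python
GELF_CHUNK_MAGIC = b"\x1e\x0f"

def parse_gelf_chunk(datagram: bytes):
    """Parse the 12-byte chunked-GELF header from a UDP datagram.

    Layout per the GELF spec: 2 magic bytes (0x1e 0x0f), an 8-byte
    message ID, a 1-byte sequence number (0-based), and a 1-byte
    sequence count, followed by the chunk payload.
    """
    if not datagram.startswith(GELF_CHUNK_MAGIC):
        return None  # not a chunked GELF datagram
    message_id = datagram[2:10].hex()
    seq_number = datagram[10]
    seq_count = datagram[11]
    payload = datagram[12:]
    return message_id, seq_number, seq_count, payload

# Fabricated example chunk, using one of the IDs from the debug log above:
chunk = GELF_CHUNK_MAGIC + bytes.fromhex("6df71a43ed10824a") + bytes([1, 2]) + b"{...}"
mid, seq, total, data = parse_gelf_chunk(chunk)
print(mid, f"{seq + 1}/{total}")  # 6df71a43ed10824a 2/2
```

The receiving node buffers chunks by message ID until all `seq_count` of them arrive, and discards incomplete messages after a short timeout, which is why every chunk of a message must reach the same node.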
Regarding the level field: you helped me with that very issue a number of months ago, and I have a pipeline which rewrites the string to an integer. This has been working well, and I am reasonably confident it is not an issue here.
You can exclude RFC1918 addresses using a pipeline condition:
1. Use the pipeline function cidr_match():
rule "flag RFC1918 src_ip"
when
  cidr_match("192.168.0.0/16", to_ip($message.src_ip)) ||
  cidr_match("172.16.0.0/12", to_ip($message.src_ip)) ||
  cidr_match("10.0.0.0/8", to_ip($message.src_ip))
then
  set_field("internal_ip", true);
end
2. If you enable the threatintel plugin, you can use the new pipeline function in_private_net():
rule "flag private src_ip"
when
  in_private_net(to_string($message.src_ip))
then
  set_field("internal_ip", true);
end
To follow up on this, we found our missing messages. The issue was that we use round-robin DNS to share load between multiple Graylog hosts: fluentd was sending some chunks of a message to one host and some to another, so the messages were never reassembled. Pointing fluentd at a specific host, while a single point of failure, has alleviated the issue.
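The failure mode can be sketched as follows. This is a toy model, not Graylog internals: each node keeps its own per-message reassembly buffer, so when round-robin DNS spreads the chunks of one message across nodes, no node ever sees a complete set.

```python
from collections import defaultdict

class Node:
    """Toy model of per-node chunk reassembly (hypothetical, for illustration)."""

    def __init__(self, name):
        self.name = name
        self.buffers = defaultdict(dict)  # message_id -> {seq: payload}

    def receive(self, message_id, seq, total, payload):
        buf = self.buffers[message_id]
        buf[seq] = payload
        if len(buf) == total:
            # all chunks arrived at this node: reassemble in order
            return b"".join(buf[i] for i in sorted(buf))
        return None  # still waiting; real servers drop the buffer after a timeout

nodes = [Node("graylog-1"), Node("graylog-2")]
# Round-robin DNS sends chunk 1 to one node and chunk 2 to the other:
r1 = nodes[0].receive("6df71a43ed10824a", 0, 2, b"part1")
r2 = nodes[1].receive("6df71a43ed10824a", 1, 2, b"part2")
print(r1, r2)  # None None -- neither node can ever complete the message
```

Sending both chunks to the same node (as the fix above does) lets `receive` return the reassembled payload on the second chunk.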