I am running Graylog 3.1.5 on CentOS 7.6. The server.log is a constant stream of messages such as:
2020-09-25T12:06:17.030Z WARN [ProcessBufferProcessor] Unable to process message <83ebaf51-ff27-11ea-94fb-e4434b23219c>: java.lang.NullPointerException
I have done some debugging from the message source, which is fluentd in an OpenShift cluster sending to a UDP GELF input. I have identified some messages that are being sent by fluentd but do not appear in Graylog. They look like so:
There are other messages with the exact same structure which are received correctly.
Would appreciate some pointers on the best way to debug this.
Thank you for your quick response.
I have been using tcpdump to examine the GELF stream; this is how I was able to confirm that fluentd was sending all messages. I suspect the messages that do not make it to Graylog are chunked, and if I enable debug logging I see messages like so:
2020-09-25T12:33:27.239Z DEBUG [GelfChunkAggregator] Dumping GELF chunk map [chunks for 17 messages]:
Message <6df71a43ed10824a> Chunks:
ID: 6df71a43ed10824a Sequence: 2/2 Arrival: 1601037203527 Data size: 96
ID: e6d59d38ca93600f Sequence: 1/2 Arrival: 1601037205244 Data size: 1432
Message <086e7973d1821bb4> Chunks:
ID: 086e7973d1821bb4 Sequence: 2/2 Arrival: 1601037203528 Data size: 96
2020-09-25T12:33:27.230Z DEBUG [EnvelopeMessageAggregationHandler] Message aggregation completion, forwarding UnpooledHeapByteBuf(ridx: 0, widx: 926, cap: 926/926)
2020-09-25T12:33:27.230Z DEBUG [EnvelopeMessageAggregationHandler] More chunks necessary to complete this message
Is this normal?
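For anyone checking a capture by hand, here is a minimal Python sketch for decoding the chunk header from a UDP payload. It assumes the layout described in the GELF specification: magic bytes 0x1e 0x0f, an 8-byte message ID, a 1-byte sequence number starting at 0, and a 1-byte sequence count. The name `parse_gelf_chunk` is my own for illustration, not something from Graylog or fluentd.

```python
import struct

# Magic bytes that mark a chunked GELF datagram (per the GELF spec).
GELF_CHUNK_MAGIC = b"\x1e\x0f"
HEADER_LEN = 12  # 2 magic + 8 message ID + 1 sequence number + 1 sequence count


def parse_gelf_chunk(datagram: bytes):
    """Return (message_id_hex, seq_number, seq_count, payload),
    or None if the datagram is not a GELF chunk."""
    if len(datagram) < HEADER_LEN or datagram[:2] != GELF_CHUNK_MAGIC:
        return None
    message_id = datagram[2:10].hex()
    # Sequence number is 0-based on the wire; sequence count is the total.
    seq_number, seq_count = struct.unpack("BB", datagram[10:12])
    return message_id, seq_number, seq_count, datagram[HEADER_LEN:]
```

Grouping the decoded chunks by message ID and by the destination address seen in the capture should show whether every chunk of a given message reaches the same host. Note that the wire sequence number is 0-based, while the debug output above appears to print it 1-based.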
Regarding the level field, you helped me a number of months ago with that very issue, and I have a pipeline which rewrites the string to an integer. This has been working well, and I am reasonably confident it is not an issue here.
Thank you shoothub, I will look at that.
I believe the exceptions are caused by failed GeoIP lookups for RFC1918 and localhost addresses.
You can exclude RFC1918 addresses using a pipeline condition:
1. Use pipeline function
cidr_match("192.168.0.0/16", to_ip($message.src_ip)) ||
cidr_match("172.16.0.0/12", to_ip($message.src_ip)) ||
2. If you enable the threatintel plugin, you can use a new pipeline function
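To check offline which addresses such a condition would exclude, a rough Python sketch using the standard `ipaddress` module may help. It is not a Graylog pipeline rule. It assumes you want all three RFC1918 ranges (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16) plus loopback, since localhost lookups were also mentioned. The function name `should_skip_geoip` is made up for this example.

```python
import ipaddress

# The three private IPv4 ranges defined by RFC1918.
RFC1918_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]
# IPv4 loopback range, included here as an assumption about what to skip.
LOOPBACK = ipaddress.ip_network("127.0.0.0/8")


def should_skip_geoip(addr: str) -> bool:
    """Return True if addr is an RFC1918 or loopback IPv4 address."""
    ip = ipaddress.ip_address(addr)
    if ip.version != 4:
        return False
    return ip in LOOPBACK or any(ip in net for net in RFC1918_NETWORKS)
```

For example, `should_skip_geoip("192.168.1.10")` is true and `should_skip_geoip("8.8.8.8")` is false, so only the second address would go on to a GeoIP lookup.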
Awesome, thank you once again shoothub.
To follow up on this, we found our missing messages. The issue seems to have been that we use round-robin DNS to share load between multiple Graylog hosts: fluentd was sending some chunks to one host and some to another, so the messages were never reassembled. Pointing fluentd at a specific host, while a single point of failure, has alleviated the issue.
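For anyone who hits the same thing, here is a toy Python simulation of why this fails. It is a simplified sketch, not how fluentd or Graylog actually work internally: it assumes each chunk is sent to the next host in strict rotation, and the host names and the `deliver` and `complete_messages` helpers are hypothetical.

```python
from collections import defaultdict
from itertools import cycle


def deliver(chunks, hosts):
    """Send each chunk to the next host in rotation.
    Returns {host: {message_id: {seq_number: (seq_count, data)}}}."""
    received = {host: defaultdict(dict) for host in hosts}
    for host, (msg_id, seq, count, data) in zip(cycle(hosts), chunks):
        received[host][msg_id][seq] = (count, data)
    return received


def complete_messages(chunk_map):
    """Reassemble only the messages for which every chunk is present."""
    done = {}
    for msg_id, parts in chunk_map.items():
        count = next(iter(parts.values()))[0]
        if len(parts) == count:
            done[msg_id] = b"".join(parts[i][1] for i in range(count))
    return done


# One message split into two chunks: (message_id, seq_number, seq_count, data)
chunks = [("6df71a43ed10824a", 0, 2, b"first-"), ("6df71a43ed10824a", 1, 2, b"second")]

split = deliver(chunks, ["graylog-a", "graylog-b"])
single = deliver(chunks, ["graylog-a"])
```

With two hosts, each one holds a single chunk and `complete_messages` returns nothing for either, which matches the incomplete entries in the chunk map dump. With one host, the message reassembles.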
Thanks again for the help.