[ProcessBufferProcessor] Unable to process message java.lang.NullPointerException


Running Graylog 3.1.5 on CentOS 7.6. The server.log is a constant stream of messages such as:

2020-09-25T12:06:17.030Z WARN [ProcessBufferProcessor] Unable to process message <83ebaf51-ff27-11ea-94fb-e4434b23219c>: java.lang.NullPointerException

I have done some debugging from the message source, which is fluentd in an OpenShift cluster sending to a UDP GELF input. I have identified some messages which are being sent by fluentd but not appearing in Graylog; they look like this:

{"thread":"scala-execution-context-global-80133","level":"INFO","loggerName":"xxx.xxx.xxx.api.druid.DruidClient","message":"Message String","endOfBatch":false,"loggerFqcn":"org.apache.logging.slf4j.Log4jLogger","instant":{"epochSecond":1600721427,"nanoOfSecond":387000000},"threadId":80133,"threadPriority":5}

There are other messages with the exact same structure which are received correctly.

Would appreciate some pointers on the best way to debug this.
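
For anyone debugging something similar, a minimal GELF sender is a quick way to exercise the UDP input independently of fluentd. This is only an illustrative sketch: the host, port, and field values below are assumptions, not taken from the cluster above.

```python
import json
import socket
import time

# Assumed values for illustration; point these at your own GELF UDP input.
GRAYLOG_HOST = "127.0.0.1"
GRAYLOG_PORT = 12201

def send_gelf(short_message, host=GRAYLOG_HOST, port=GRAYLOG_PORT):
    """Send a minimal, uncompressed GELF 1.1 message over UDP.

    Returns the raw datagram so it can be compared with a tcpdump
    capture of what fluentd actually sends.
    """
    payload = {
        "version": "1.1",
        "host": "test-client",
        "short_message": short_message,
        "timestamp": time.time(),
        "level": 6,  # GELF expects a numeric (syslog) level, not a string
    }
    data = json.dumps(payload).encode("utf-8")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(data, (host, port))
    return data
```

If a message sent this way shows up in Graylog while the fluentd ones do not, the problem is on the producer side rather than the input.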


  1. Please check whether the error message is followed by more detail.
  2. Try increasing the log level in Graylog under System - Logging (change it from Info to Debug), then check the Graylog log file: sudo tail -f /var/log/graylog-server/server.log
  3. Try capturing packets on the Graylog server using tcpdump:
    tcpdump -i INTERFACE -vnnA 'udp port 12201'
    Or capture to a pcap file and analyze it with Wireshark:
    tcpdump -i INTERFACE -w output.pcap 'udp port 12201'
  4. The level field can cause problems in Graylog: GELF requires that level is a numeric field, not a string.
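
To expand on point 4: GELF's level is the numeric syslog severity. A tiny illustrative conversion (the level names here are the usual log4j ones, which is an assumption about the source format):

```python
# Illustrative mapping from common log4j-style level strings to the
# numeric syslog severities GELF expects.
LEVEL_MAP = {
    "TRACE": 7,
    "DEBUG": 7,
    "INFO": 6,
    "WARN": 4,
    "ERROR": 3,
    "FATAL": 2,
}

def to_gelf_level(level_string, default=6):
    """Map a string level to a numeric syslog level, defaulting to INFO."""
    return LEVEL_MAP.get(level_string.upper(), default)
```

In practice this conversion would live in fluentd or in a Graylog pipeline rule; the Python version just makes the mapping explicit.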

Hi shoothub.

Thank you for your quick response.

I have been using tcpdump to examine the GELF stream; this is how I was able to confirm that fluentd was sending all messages. I suspect the messages that do not make it to Graylog are chunked, and if I enable debug logging I see messages like this:

2020-09-25T12:33:27.239Z DEBUG [GelfChunkAggregator] Dumping GELF chunk map [chunks for 17 messages]:
Message <6df71a43ed10824a> Chunks:

ID: 6df71a43ed10824a Sequence: 2/2 Arrival: 1601037203527 Data size: 96
Message Chunks:
ID: e6d59d38ca93600f Sequence: 1/2 Arrival: 1601037205244 Data size: 1432

Message <086e7973d1821bb4> Chunks:

ID: 086e7973d1821bb4 Sequence: 2/2 Arrival: 1601037203528 Data size: 96

2020-09-25T12:33:27.230Z DEBUG [EnvelopeMessageAggregationHandler] Message aggregation completion, forwarding UnpooledHeapByteBuf(ridx: 0, widx: 926, cap: 926/926)
2020-09-25T12:33:27.230Z DEBUG [EnvelopeMessageAggregationHandler] More chunks necessary to complete this message

Is this normal?
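
For context, the chunked-GELF framing those log entries describe can be sketched as follows. This is a simplified model, not Graylog's implementation; the 1420-byte payload size is an assumption that happens to match the 1432-byte datagrams in the dump above (12-byte header + 1420 bytes of data).

```python
import os

GELF_CHUNK_MAGIC = b"\x1e\x0f"
MAX_CHUNK_PAYLOAD = 1420  # assumed size, chosen to fit a 1500-byte MTU

def chunk_gelf(payload: bytes):
    """Split a GELF payload into chunked-GELF datagrams.

    Every chunk carries the same random 8-byte message ID plus its
    sequence number and total count; the receiving node's chunk
    aggregator needs *all* chunks of one message ID to rebuild it.
    """
    message_id = os.urandom(8)
    pieces = [payload[i:i + MAX_CHUNK_PAYLOAD]
              for i in range(0, len(payload), MAX_CHUNK_PAYLOAD)]
    total = len(pieces)
    # Header: 2 magic bytes + 8-byte ID + 1-byte seq + 1-byte count
    return [GELF_CHUNK_MAGIC + message_id + bytes([seq, total]) + piece
            for seq, piece in enumerate(pieces)]

chunks = chunk_gelf(b"x" * 1500)  # 1500 bytes of payload -> 2 chunks
```

The dump showing a message with only chunk 2/2, or only chunk 1/2, means the aggregator is waiting for a counterpart that never arrived at that node.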

Regarding the level field: you helped me with that very issue a number of months ago, and I have a pipeline which rewrites the string to an integer. This has been working well, and I am reasonably confident it is not the issue here.


Try playing with the chunk parameters in fluentd to see if that helps.

Thank you shoothub, I will look at that.

I believe the exceptions are caused by failed GeoIP lookups for RFC1918 and localhost addresses.
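
As a quick local sanity check, Python's standard ipaddress module can classify exactly the addresses a GeoIP database will fail to resolve. This helper is only illustrative and not part of Graylog:

```python
import ipaddress

def is_internal(addr: str) -> bool:
    """True for RFC1918/private and loopback addresses, which a GeoIP
    database will not resolve (a likely source of failed lookups)."""
    ip = ipaddress.ip_address(addr)
    return ip.is_private or ip.is_loopback
```

Running source-address fields from the failing messages through a check like this is one way to confirm the theory before changing any pipelines.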


You can exclude RFC1918 addresses using a pipeline condition:

  1. Use the pipeline function cidr_match() with the three RFC1918 ranges:
when
  cidr_match("10.0.0.0/8", to_ip($message.src_ip)) ||
  cidr_match("172.16.0.0/12", to_ip($message.src_ip)) ||
  cidr_match("192.168.0.0/16", to_ip($message.src_ip))
then
  set_field("internal_ip", true);

  2. If you enable the threatintel plugin you can use the newer pipeline function in_private_net():
when
  in_private_net(to_string($message.src_ip))
then
  set_field("internal_ip", true);

Awesome, thank you once again shoothub.


To follow up on this, we found our missing messages. The issue was that we use round-robin DNS to share load between multiple Graylog hosts: fluentd was sending some chunks to one host and some to another, so the messages were never reassembled. Pointing fluentd at a specific host, while a single point of failure, has alleviated the issue.
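
The failure mode can be illustrated with a toy model of round-robin delivery. The host names are hypothetical, and real round-robin DNS distributes per resolution rather than strictly alternating, but the effect on a multi-chunk message is the same:

```python
from collections import defaultdict
from itertools import cycle

def distribute_round_robin(chunks, hosts):
    """Model round-robin DNS: successive datagrams go to alternating
    hosts, so the chunks of one message are split across nodes."""
    received = defaultdict(list)
    for chunk, host in zip(chunks, cycle(hosts)):
        received[host].append(chunk)
    return received

message_chunks = ["chunk 1/2", "chunk 2/2"]  # one two-chunk GELF message
received = distribute_round_robin(message_chunks, ["graylog-a", "graylog-b"])
# Neither node receives both chunks, so neither can reassemble the message.
complete = [h for h, c in received.items() if len(c) == len(message_chunks)]
```

This is why chunked GELF over UDP needs a sticky path to a single node (or a load balancer that keeps a source's datagrams together), rather than plain round-robin DNS.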

Thanks again for the help.
