[ProcessBufferProcessor] Unable to process message java.lang.NullPointerException


Running Graylog 3.1.5 on CentOS 7.6. The server.log is a constant stream of messages such as:

2020-09-25T12:06:17.030Z WARN [ProcessBufferProcessor] Unable to process message <83ebaf51-ff27-11ea-94fb-e4434b23219c>: java.lang.NullPointerException

I have done some debugging from the message source, which is fluentd in an OpenShift cluster sending to a UDP GELF input. I have identified some messages which are being sent by fluentd but not appearing in Graylog; they look like this:

{"thread":"scala-execution-context-global-80133","level":"INFO","loggerName":"xxx.xxx.xxx.api.druid.DruidClient","message":"Message String","endOfBatch":false,"loggerFqcn":"org.apache.logging.slf4j.Log4jLogger","instant":{"epochSecond":1600721427,"nanoOfSecond":387000000},"threadId":80133,"threadPriority":5}

There are other messages with the exact same structure which are received correctly.

Would appreciate some pointers on the best way to debug this.
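
For anyone debugging something similar, a minimal GELF sender is a quick way to exercise the UDP input independently of fluentd. This is only an illustrative sketch: the host, port, and field values below are assumptions, not taken from the cluster above.

```python
import json
import socket
import time

# Assumed values for illustration; point these at your own GELF UDP input.
GRAYLOG_HOST = "127.0.0.1"
GRAYLOG_PORT = 12201

def send_gelf(short_message, host=GRAYLOG_HOST, port=GRAYLOG_PORT):
    """Send a minimal, uncompressed GELF 1.1 message over UDP.

    Returns the raw datagram so it can be compared with a tcpdump
    capture of what fluentd actually sends.
    """
    payload = {
        "version": "1.1",
        "host": "test-client",
        "short_message": short_message,
        "timestamp": time.time(),
        "level": 6,  # GELF expects a numeric (syslog) level, not a string
    }
    data = json.dumps(payload).encode("utf-8")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(data, (host, port))
    return data
```

If a message sent this way shows up in Graylog while the fluentd ones do not, the problem is on the producer side rather than the input.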


  1. Please check whether the error message is followed by more detail.
  2. Try increasing the log level in Graylog under System - Logging (change it from Info to Debug), then check the Graylog log file: sudo tail -f /var/log/graylog-server/server.log
  3. Try capturing packets on the Graylog server using tcpdump:
    tcpdump -i INTERFACE -vnnA 'udp port 12201'
    Or capture to a pcap file and analyze it with Wireshark:
    tcpdump -i INTERFACE -w output.pcap 'udp port 12201'
  4. The level field can cause problems in Graylog: GELF requires that level is a numeric field, not a string.
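
To expand on point 4: GELF's level is the numeric syslog severity. A tiny illustrative conversion (the level names here are the usual log4j ones, which is an assumption about the source format):

```python
# Illustrative mapping from common log4j-style level strings to the
# numeric syslog severities GELF expects.
LEVEL_MAP = {
    "TRACE": 7,
    "DEBUG": 7,
    "INFO": 6,
    "WARN": 4,
    "ERROR": 3,
    "FATAL": 2,
}

def to_gelf_level(level_string, default=6):
    """Map a string level to a numeric syslog level, defaulting to INFO."""
    return LEVEL_MAP.get(level_string.upper(), default)
```

In practice this conversion would live in fluentd or in a Graylog pipeline rule; the Python version just makes the mapping explicit.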

Hi shoothub.

Thank you for your quick response.

I have been using tcpdump to examine the GELF stream; this is how I was able to confirm that fluentd was sending all messages. I suspect the messages that do not make it to Graylog are chunked, and if I enable debug logging I see messages like this:

2020-09-25T12:33:27.239Z DEBUG [GelfChunkAggregator] Dumping GELF chunk map [chunks for 17 messages]:
Message <6df71a43ed10824a> Chunks:

ID: 6df71a43ed10824a Sequence: 2/2 Arrival: 1601037203527 Data size: 96
Message Chunks:
ID: e6d59d38ca93600f Sequence: 1/2 Arrival: 1601037205244 Data size: 1432

Message <086e7973d1821bb4> Chunks:

ID: 086e7973d1821bb4 Sequence: 2/2 Arrival: 1601037203528 Data size: 96

2020-09-25T12:33:27.230Z DEBUG [EnvelopeMessageAggregationHandler] Message aggregation completion, forwarding UnpooledHeapByteBuf(ridx: 0, widx: 926, cap: 926/926)
2020-09-25T12:33:27.230Z DEBUG [EnvelopeMessageAggregationHandler] More chunks necessary to complete this message

Is this normal?
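
For context, the chunked-GELF framing those log entries describe can be sketched as follows. This is a simplified model, not Graylog's implementation; the 1420-byte payload size is an assumption that happens to match the 1432-byte datagrams in the dump above (12-byte header + 1420 bytes of data).

```python
import os

GELF_CHUNK_MAGIC = b"\x1e\x0f"
MAX_CHUNK_PAYLOAD = 1420  # assumed size, chosen to fit a 1500-byte MTU

def chunk_gelf(payload: bytes):
    """Split a GELF payload into chunked-GELF datagrams.

    Every chunk carries the same random 8-byte message ID plus its
    sequence number and total count; the receiving node's chunk
    aggregator needs *all* chunks of one message ID to rebuild it.
    """
    message_id = os.urandom(8)
    pieces = [payload[i:i + MAX_CHUNK_PAYLOAD]
              for i in range(0, len(payload), MAX_CHUNK_PAYLOAD)]
    total = len(pieces)
    # Header: 2 magic bytes + 8-byte ID + 1-byte seq + 1-byte count
    return [GELF_CHUNK_MAGIC + message_id + bytes([seq, total]) + piece
            for seq, piece in enumerate(pieces)]

chunks = chunk_gelf(b"x" * 1500)  # 1500 bytes of payload -> 2 chunks
```

The dump showing a message with only chunk 2/2, or only chunk 1/2, means the aggregator is waiting for a counterpart that never arrived at that node.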

Regarding the level field: you helped me with that very issue a number of months ago, and I have a pipeline which rewrites the string to an integer. This has been working well, and I am reasonably confident it is not the issue here.


Try playing with the chunk parameters in fluentd to see if that helps.

Thank you shoothub, I will look at that.

I believe the exceptions are caused by failed GeoIP lookups for RFC1918 and localhost addresses.
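
As a quick local sanity check, Python's standard ipaddress module can classify exactly the addresses a GeoIP database will fail to resolve. This helper is only illustrative and not part of Graylog:

```python
import ipaddress

def is_internal(addr: str) -> bool:
    """True for RFC1918/private and loopback addresses, which a GeoIP
    database will not resolve (a likely source of failed lookups)."""
    ip = ipaddress.ip_address(addr)
    return ip.is_private or ip.is_loopback
```

Running source-address fields from the failing messages through a check like this is one way to confirm the theory before changing any pipelines.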


You can exclude RFC1918 addresses using a pipeline condition:

  1. Use the pipeline function cidr_match() with the three RFC1918 ranges:
when
  cidr_match("10.0.0.0/8", to_ip($message.src_ip)) ||
  cidr_match("172.16.0.0/12", to_ip($message.src_ip)) ||
  cidr_match("192.168.0.0/16", to_ip($message.src_ip))
then
  set_field("internal_ip", true);

  2. If you enable the threatintel plugin you can use the newer pipeline function in_private_net():
when
  in_private_net(to_string($message.src_ip))
then
  set_field("internal_ip", true);

Awesome, thank you once again shoothub.


To follow up on this, we found our missing messages. The issue was that we use round-robin DNS to share load between multiple Graylog hosts: fluentd was sending some chunks to one host and some to another, so the messages were never reassembled. Pointing fluentd at a specific host, while a single point of failure, has alleviated the issue.
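
The failure mode can be illustrated with a toy model of round-robin delivery. The host names are hypothetical, and real round-robin DNS distributes per resolution rather than strictly alternating, but the effect on a multi-chunk message is the same:

```python
from collections import defaultdict
from itertools import cycle

def distribute_round_robin(chunks, hosts):
    """Model round-robin DNS: successive datagrams go to alternating
    hosts, so the chunks of one message are split across nodes."""
    received = defaultdict(list)
    for chunk, host in zip(chunks, cycle(hosts)):
        received[host].append(chunk)
    return received

message_chunks = ["chunk 1/2", "chunk 2/2"]  # one two-chunk GELF message
received = distribute_round_robin(message_chunks, ["graylog-a", "graylog-b"])
# Neither node receives both chunks, so neither can reassemble the message.
complete = [h for h, c in received.items() if len(c) == len(message_chunks)]
```

This is why chunked GELF over UDP needs a sticky path to a single node (or a load balancer that keeps a source's datagrams together), rather than plain round-robin DNS.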

Thanks again for the help.
