I have a small three-node Graylog cluster set up in Kubernetes through the Helm chart (kongz/graylog --version 1.7.12 with the Graylog 4.2.5 Docker image); Elasticsearch (v7) and MongoDB (v4) are also brought up with Helm.
The Out rate is lower than the In rate most of the time (sometimes one of the nodes shows In but no Out at all). None of the nodes is really busy (low CPU load, empty buffers), and there is no capacity pressure on Elasticsearch. Some messages are kept in the disk journal, but it never grows too large, and occasionally there are bursts of output.
In the log, there are many error messages like “ERROR [DecodingProcessor] - Unable to decode raw message RawMessage…”
What could be the cause? Any suggestions on where I should look?
Ummmm, are you referring to something like this error?
ERROR [DecodingProcessor] Unable to decode raw message RawMessage{id=c657dcf5-8087-11ec-b076-00155d601d11, messageQueueId=11364789158, codec=gelf, payloadSize=205, timestamp=2022-01-28T22:15:17.823Z, remoteAddress=/10.10.10.10:36290} on input <5e265ada83d72ec570ab5fe2>.
2022-01-28T16:15:17.839-06:00 ERROR [DecodingProcessor] Error processing message RawMessage{id=c657dcf5-8087-11ec-b076-00155d601d11, messageQueueId=11364789158, codec=gelf, payloadSize=205, timestamp=2022-01-28T22:15:17.823Z, remoteAddress=/10.10.10.10:36290}
java.lang.IllegalArgumentException: GELF message (received from <10.10.10.10:36290>) has empty mandatory "short_message" field.
at org.graylog2.inputs.codecs.GelfCodec.validateGELFMessage(GelfCodec.java:263) ~[graylog.jar:?]
at org.graylog2.inputs.codecs.GelfCodec.decode(GelfCodec.java:141) ~[graylog.jar:?]
at org.graylog2.shared.buffers.processors.DecodingProcessor.processMessage(DecodingProcessor.java:153) ~[graylog.jar:?]
at org.graylog2.shared.buffers.processors.DecodingProcessor.onEvent(DecodingProcessor.java:94) [graylog.jar:?]
at org.graylog2.shared.buffers.processors.ProcessBufferProcessor.onEvent(ProcessBufferProcessor.java:95) [graylog.jar:?]
at org.graylog2.shared.buffers.processors.ProcessBufferProcessor.onEvent(ProcessBufferProcessor.java:49) [graylog.jar:?]
at com.lmax.disruptor.WorkProcessor.run(WorkProcessor.java:143) [graylog.jar:?]
at com.codahale.metrics.InstrumentedThreadFactory$InstrumentedRunnable.run(InstrumentedThreadFactory.java:66) [graylog.jar:?]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_312]
If so, I would need to see the whole message to know exactly what happened, but in short: I produced the message above by sending from my remote log shipper (NXLog) to Graylog. What I did was let NXLog grab a file that could not be processed by a GELF input. That type of audit log needed to be converted to a different format and then shipped to the proper input type.
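For reference, here is a minimal sketch (in Python, with a hypothetical hostname and Graylog address) of the shape a GELF TCP input expects. Per the GELF spec, "version", "host", and "short_message" are mandatory; if the payload is not JSON in this shape, or "short_message" is empty, the GELF codec throws exactly the IllegalArgumentException shown above.

```python
import json
import socket

# A minimal, well-formed GELF message. "version", "host", and
# "short_message" are the mandatory fields; the error above is raised
# when "short_message" is missing or empty.
gelf_message = {
    "version": "1.1",
    "host": "app-server-01",          # hypothetical hostname
    "short_message": "User login succeeded",
    "full_message": "User alice logged in from 10.10.10.10",
    "level": 6,                       # syslog severity: informational
    "_application": "auth-service",   # custom fields are prefixed with "_"
}

# GELF TCP expects one JSON document per message, terminated by a null byte.
payload = json.dumps(gelf_message).encode("utf-8") + b"\x00"

# 12201 is the conventional GELF port; adjust to whatever your input uses.
with socket.create_connection(("graylog.example.com", 12201)) as sock:
    sock.sendall(payload)
```

If NXLog (or any shipper) sends plain text or a different format to a GELF input, the decode fails and the message is dropped, which also explains an In rate higher than the Out rate.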
I believe you can, BUT in the long run you should have different ones, something like this…
Syslog TCP 5140
Syslog UDP 5141
I have used the same ports in the past but ran into more problems with port conflicts.
So I have made a chart of Graylog port reservations; I no longer use the Graylog default ports in production.
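If you script your inputs rather than clicking them together, something like the sketch below creates the two dedicated inputs from the list above through the Graylog REST API. This is a hedged example: the address, credentials, and script name are hypothetical, and the type strings and configuration keys are what I believe the 4.x API expects; double-check them against the API browser on your own instance.

```python
import requests

GRAYLOG_API = "https://graylog.example.com:9000/api"   # hypothetical address
AUTH = ("admin", "changeme")                            # or an access token

# One dedicated input per protocol/port, as in the list above.
inputs = [
    ("Syslog TCP 5140", "org.graylog2.inputs.syslog.tcp.SyslogTCPInput", 5140),
    ("Syslog UDP 5141", "org.graylog2.inputs.syslog.udp.SyslogUDPInput", 5141),
]

for title, input_type, port in inputs:
    resp = requests.post(
        f"{GRAYLOG_API}/system/inputs",
        auth=AUTH,
        # Graylog rejects state-changing requests without this header.
        headers={"X-Requested-By": "port-reservation-script"},
        json={
            "title": title,
            "type": input_type,
            "global": True,
            "configuration": {
                "bind_address": "0.0.0.0",
                "port": port,
                "recv_buffer_size": 262144,
            },
        },
        verify=False,  # only if you use a self-signed certificate
    )
    resp.raise_for_status()
    print(f"Created {title}: {resp.json()['id']}")
```

Keeping each format on its own reserved port makes it obvious when a shipper is pointed at the wrong input type.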
Hope that helps
It turned out someone else had replaced my Syslog TCP input with a GELF TCP input on the port I initially set up.
I created a new Syslog TCP input on another port and redirected the Syslog TCP traffic to it, and all seems good now.