Graylog 3.1 Input Problem

I have an issue on a new install of Graylog 3.1 (Ubuntu Server 18.04 LTS). I am feeding it a single source (Syslog TCP), and the connection seems to be reset constantly. After 20-30 minutes of that, the input becomes completely unresponsive and floods server.log with the following:

2019-12-02T12:22:55.744-06:00 ERROR [AbstractTcpTransport] Error in Input [Syslog TCP/5dd6e9e6513b5f1b3607ccf9] (channel [id: 0xc631e548, L:/10.0.0.3:5555 ! R:/198.232.231.1:10874]) (cause io.netty.util.internal.OutOfDirectMemoryError: failed to allocate 16777216 byte(s) of direct memory (used: 1010827264, max: 1020067840))

Restarting the process fixes the issue for the next 20-30 minutes, until it happens again. Whatever is running out of memory, it is not physical memory: while in this state the server shows 3.39G used with 13G free.
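The figures in the error line above can be checked with a little arithmetic. Below is a minimal Python sketch (the helper name `direct_memory_headroom` is made up for illustration and is not part of Graylog or Netty) that pulls the three numbers out of the message and shows how much room was left in the direct-memory pool when the 16 MiB allocation was refused:

```python
import re

# Pattern for the numbers reported in the OutOfDirectMemoryError message.
_OODM = re.compile(
    r"failed to allocate (?P<requested>\d+) byte\(s\) of direct memory "
    r"\(used: (?P<used>\d+), max: (?P<max>\d+)\)"
)

MIB = 1024 * 1024


def direct_memory_headroom(log_line):
    """Return (requested, used, max, headroom) in bytes, or None if no match."""
    m = _OODM.search(log_line)
    if m is None:
        return None
    requested = int(m.group("requested"))
    used = int(m.group("used"))
    limit = int(m.group("max"))
    return requested, used, limit, limit - used


# The relevant part of the error line from server.log.
line = (
    "io.netty.util.internal.OutOfDirectMemoryError: failed to allocate "
    "16777216 byte(s) of direct memory (used: 1010827264, max: 1020067840)"
)
requested, used, limit, headroom = direct_memory_headroom(line)
print(f"requested {requested / MIB:.1f} MiB, headroom {headroom / MIB:.1f} MiB")
# prints: requested 16.0 MiB, headroom 8.8 MiB
```

So the pool that filled up is capped at about 973 MiB, far below the 13G the machine has free. That cap is most likely the JVM's direct-memory limit, which as far as I know defaults to roughly the maximum heap size unless `-XX:MaxDirectMemorySize` is set. That is an assumption about this install, not something the log states.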

It looks like all the dropped connections are the problem, but I have a second rsyslog server taking the same stream from the same device (TCP) without a single issue.

I'm just not sure where to start looking. I am new to Graylog and still learning. Help!

Well, I do see something here. When I get the following in my logs, it seems to kill the connection and then reset it: “Active Connections” drops by one, the (XXX) counter beside it increases, and then I get a new active connection. This happens every time I get a message like this:

2019-12-02T14:08:15.659-06:00 ERROR [AbstractTcpTransport] Error in Input [Syslog TCP/5dd6e9e6513b5f1b3607ccf9] (channel [id: 0x590d2bee, L:/10.0.0.3:5555 ! R:/198.232.231.1:17218]) (cause io.netty.handler.codec.DecoderException: java.lang.NumberFormatException: For input string: "Query")

Also, this syslog comes from a Juniper firewall, if that helps.

Found the issue. I found yet another bug in our Juniper firewall: it stops putting a line break after each syslog message and instead sends out large, jumbled messages with multiple messages lumped together, truncated when they reach the maximum message size. The error I am seeing occurs when Graylog hits the cut-off part of the message.
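For anyone who wants to check their own device for the same behaviour, here is a rough Python sketch. It assumes you have captured the raw TCP payload as text (for example from a packet capture) and that each message starts with a syslog priority header such as `<134>`. The function name `find_lumped_frames`, the host name `fw1`, and the sample messages are all made up for illustration.

```python
import re

# Syslog messages normally start with a priority header such as "<134>".
PRI_HEADER = re.compile(r"<\d{1,3}>")


def find_lumped_frames(stream_text):
    """Split a captured payload on newlines and return (line_number, count)
    for every frame that contains more than one priority header."""
    suspicious = []
    for number, frame in enumerate(stream_text.split("\n"), start=1):
        count = len(PRI_HEADER.findall(frame))
        if count > 1:
            suspicious.append((number, count))
    return suspicious


# Made-up sample payload: line 2 carries two messages with no line break
# between them, and the second one is cut off.
sample = (
    "<134>Dec  2 12:00:01 fw1 RT_FLOW: session created\n"
    "<134>Dec  2 12:00:02 fw1 RT_FLOW: session closed<134>Dec  2 12:00:02 fw1 RT_FL\n"
    "<134>Dec  2 12:00:03 fw1 RT_FLOW: session created\n"
)
print(find_lumped_frames(sample))
# prints: [(2, 2)]
```

This is only a heuristic: a message body that happens to contain something like `<12>` would be flagged as well, so treat any hits as a pointer for where to look rather than proof.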

So I guess it gets a bad message, resets the connection, and does that over and over until the error from the first post appears?

Just trying to understand what I am seeing. Thanks!

Thank you, @ataylor.

That might help others identify their problem!

I guess that leaves me with the question of why the OutOfDirectMemoryError happens. A service restart (not just stopping/starting the input) is the only way to fix that issue once it happens. Or better yet, why does a DecoderException kill and restart the connection each time?

As I said, trying to wrap my mind around what is going on. Thanks!

I have another question: why do you allow a 16 MB message size?
The default size is 2 MB, and anything over 2 MB just produces an oversized-message error, without causing any further problems.
Maybe a kernel parameter or a ulimit is limiting your memory request. I’m not sure.
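If you want to rule out the ulimit idea, the limits that apply to a running process on Linux can be read from `/proc/<pid>/limits`. Here is a small Python sketch for parsing that file. The function name `parse_proc_limits` is made up, and the sample values are invented rather than taken from the server in this thread.

```python
import re


def parse_proc_limits(text):
    """Parse the contents of /proc/<pid>/limits into {name: (soft, hard, units)}."""
    limits = {}
    for line in text.splitlines()[1:]:  # skip the header row
        # Columns are padded with runs of spaces; names contain single spaces.
        fields = re.split(r"\s{2,}", line.strip())
        if len(fields) < 3:
            continue
        name, soft, hard = fields[0], fields[1], fields[2]
        units = fields[3] if len(fields) > 3 else ""  # some rows have no units
        limits[name] = (soft, hard, units)
    return limits


# Invented sample in the layout the kernel uses for this file.
sample = """Limit                     Soft Limit           Hard Limit           Units
Max address space         unlimited            unlimited            bytes
Max locked memory         65536                65536                bytes
Max open files            64000                64000                files
"""
print(parse_proc_limits(sample)["Max address space"])
# prints: ('unlimited', 'unlimited', 'bytes')
```

On the server you would pass in the real file instead, e.g. `parse_proc_limits(open(f"/proc/{pid}/limits").read())`, where `pid` is the process ID of the Graylog Java process.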