GELF HTTP input ReadTimeoutException


(Igor Krug) #1

Hello,

I’m running Graylog v3.0.0 and receiving messages on an HTTP GELF input.
The messages are logged and I can find them in the search.

My problem is that on every second message the connection hangs and then times out with the following error:

ERROR [AbstractTcpTransport] Error in Input [GELF HTTP/5c73e434bc55cd3c3cad525c] (channel [id: 0xb89ad20a, L:/INTERNAL_IP:12201 ! R:/INTERNAL_IP:53490]) (cause io.netty.handler.timeout.ReadTimeoutException)

Even with the error, the message is still logged, so I’m not missing any messages.
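
One way to check whether the stall also happens without Kong and CloudFlare in the path is to POST straight to the input over a single keep-alive connection and time each request. The following is only a minimal Python sketch: the function names `build_gelf_payload` and `time_keepalive_posts` are made up for illustration, and it assumes the input is reachable on port 12201 at the `/gelf` path as in the setup described here.

```python
import http.client
import json
import time


def build_gelf_payload(short_message, host):
    """Return a minimal GELF 1.1 message as UTF-8 encoded JSON bytes."""
    return json.dumps({
        "version": "1.1",
        "host": host,
        "short_message": short_message,
    }).encode("utf-8")


def time_keepalive_posts(server, port, payload, count=4, timeout=30.0):
    """POST `payload` `count` times, reusing one connection where possible.

    Returns a list of (status, seconds) tuples. `status` is None when the
    request failed, for example because the server closed the connection
    or the client-side timeout expired.
    """
    conn = http.client.HTTPConnection(server, port, timeout=timeout)
    results = []
    try:
        for _ in range(count):
            start = time.monotonic()
            try:
                conn.request(
                    "POST", "/gelf", body=payload,
                    headers={"Content-Type": "application/json",
                             "Connection": "keep-alive"},
                )
                response = conn.getresponse()
                response.read()  # drain the body so the socket can be reused
                status = response.status
            except (http.client.HTTPException, OSError):
                status = None
                conn.close()  # reset; the next request opens a new socket
            results.append((status, time.monotonic() - start))
    finally:
        conn.close()
    return results
```

Calling `time_keepalive_posts("127.0.0.1", 12201, build_gelf_payload("Test", "the.host.com"))` on the Graylog machine and printing the result should show whether every second request takes as long as the configured timeout even when Kong is bypassed.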

Kong sits in front of Graylog, handling authorization and acting as a reverse proxy.
Graylog and Kong are on the same machine, and Kong forwards to local port 12201.
CloudFlare sits in front of Kong.

The machine hosting Kong and Graylog is an Amazon Lightsail instance with 4 GB of RAM and no load on it.

Here is the tcpdump output for the Kong port:

13:54:47.054484 IP CLOUDFLARE_IP > ip-INTERNAL_IP.ec2.internal.http: Flags [P.], seq 613612924:613613604, ack 1315316155, win 29, length 680: HTTP: POST /gelf HTTP/1.1
E(..N.@.).T..DNG.......P$..|Nf!.P....}..POST /gelf HTTP/1.1
Host: the.host.com
Connection: Keep-Alive
Accept-Encoding: gzip
CF-IPCountry: BR
X-Forwarded-For: MY_PUBLIC_IP
CF-RAY: 4b664312a85d5e82-TPA
Content-Length: 252
X-Forwarded-Proto: https
CF-Visitor: {"scheme":"https"}
Authorization: Basic THE_AUTHORIZATION
Content-Type: application/json
CF-Connecting-IP: MY_PUBLIC_IP
CDN-Loop: cloudflare

{ "MessageType": "Something", "short_message": "Test", "host": "https://the.host.com", "version": "1.1" }

And here is the tcpdump output for two messages on the Graylog port:

17:01:35.195601 IP ip-INTERNAL_IP.ec2.internal.54456 > ip-INTERNAL_IP.ec2.internal.12201: Flags [.], ack 1571059412, win 342, options [nop,nop,TS val 583205119 ecr 583205119], length 0
E..4l.@.@.k.........../...l.].v....Vb......
"..."...
17:01:35.195637 IP ip-INTERNAL_IP.ec2.internal.54456 > ip-INTERNAL_IP.ec2.internal.12201: Flags [P.], seq 0:848, ack 1, win 342, options [nop,nop,TS val 583205119 ecr 583205119], length 848
E...l.@.@.h.........../...l.].v....Vec.....
"..."...POST /gelf HTTP/1.1
Host: 172.26.4.220:12201
Connection: keep-alive
X-Forwarded-For: MY_PUBLIC_IP, CLOUDFLARE_IP
X-Forwarded-Proto: http
X-Forwarded-Host: the.host.com
X-Forwarded-Port: 80
X-Real-IP: CLOUDFLARE_IP
Content-Length: 252
Accept-Encoding: gzip
CF-IPCountry: BR
CF-RAY: 4b6754b58c365e82-TPA
CF-Visitor: {"scheme":"https"}
Content-Type: application/json
CF-Connecting-IP: MY_PUBLIC_IP
CDN-Loop: cloudflare
X-Consumer-ID: THE_GUID
X-Consumer-Custom-ID: something
X-Consumer-Username: something
X-Credential-Username: something

{ "MessageType": "Something", "short_message": "Test", "host": "https://the.host.com", "version": "1.1" }
17:01:35.198117 IP ip-INTERNAL_IP.ec2.internal.54456 > ip-INTERNAL_IP.ec2.internal.12201: Flags [.], ack 69, win 342, options [nop,nop,TS val 583205120 ecr 583205120], length 0
E..4l.@.@.k.........../...oi].w....Vb......
"..."...
17:01:37.943673 IP ip-INTERNAL_IP.ec2.internal.54456 > ip-INTERNAL_IP.ec2.internal.12201: Flags [P.], seq 848:1696, ack 69, win 342, options [nop,nop,TS val 583205806 ecr 583205120], length 848
E...l.@.@.h.........../...oi].w....Vec.....
"..."...POST /gelf HTTP/1.1
Host: 172.26.4.220:12201
Connection: keep-alive
X-Forwarded-For: MY_PUBLIC_IP, CLOUDFLARE_IP
X-Forwarded-Proto: http
X-Forwarded-Host: the.host.com
X-Forwarded-Port: 80
X-Real-IP: CLOUDFLARE_IP
Content-Length: 252
Accept-Encoding: gzip
CF-IPCountry: BR
CF-RAY: 4b6754c6cf8e5e88-TPA
CF-Visitor: {"scheme":"https"}
Content-Type: application/json
CF-Connecting-IP: MY_PUBLIC_IP
CDN-Loop: cloudflare
X-Consumer-ID: THE_GUID
X-Consumer-Custom-ID: something
X-Consumer-Username: something
X-Credential-Username: something

{ "MessageType": "Something", "short_message": "Test", "host": "https://the.host.com", "version": "1.1" }
17:01:47.944627 IP ip-INTERNAL_IP.ec2.internal.54456 > ip-INTERNAL_IP.ec2.internal.12201: Flags [F.], seq 1696, ack 70, win 342, options [nop,nop,TS val 583208306 ecr 583208306], length 0
E..4l.@.@.k.........../...r.].w....Vb......

The last two timestamps, 17:01:37.943673 and 17:01:47.944627, are 10 seconds apart because I have set “Idle writer timeout” to 10 seconds in the input. If I change this value, the time it takes for the second call to time out changes accordingly.
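
For reference, the gaps can be computed directly from the tcpdump timestamps. This is just a small Python sketch with a made-up helper name (`gap_seconds`); it assumes both timestamps fall on the same day, since tcpdump prints only the time of day here.

```python
from datetime import datetime


def gap_seconds(earlier, later):
    """Return the difference in seconds between two HH:MM:SS.ffffff stamps."""
    fmt = "%H:%M:%S.%f"
    delta = datetime.strptime(later, fmt) - datetime.strptime(earlier, fmt)
    return delta.total_seconds()


# First request: POST sent, then the next packet from the same side
# acknowledges data coming back from the input (ack 69 in the capture).
first = gap_seconds("17:01:35.195637", "17:01:35.198117")

# Second request: POST sent, then nothing until the connection is closed.
second = gap_seconds("17:01:37.943673", "17:01:47.944627")

print(f"first: {first:.6f}s, second: {second:.6f}s")
# prints "first: 0.002480s, second: 10.000954s"
```

So the first request sees data coming back after about 2.5 ms, while the second one waits almost exactly the configured 10 seconds before the connection is torn down.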


(Igor Krug) #2

Just to clarify, Content-Length: 252 isn’t correct because I shortened the JSON before posting it here.


(Jan Doberstein) #3

This is a known issue. It will be fixed in 3.0.1, which will be released shortly.

The fix: https://github.com/Graylog2/graylog2-server/pull/5728


(Nick Clark) #4

You say it will be released shortly. When is that exactly?


(Jan Doberstein) #5

We are talking about days here.


(Nick Clark) #6

Is it possible to make these modifications yourself?


(Jan Doberstein) #7

If you want to compile Graylog yourself from the current development master, feel free to do so.

There are plenty of discussions in this community on how to do this.