I have had a Graylog installation (stock Ubuntu package 2.2.3-1) running smoothly for about half a year.
All of a sudden, one morning all the inputs show that they are no longer receiving any messages.
I have
1 global Syslog UDP input:
allow_override_date: true
bind_address: 192.168.100.217
expand_structured_data: true
force_rdns: false
override_source:
port: 15000
recv_buffer_size: 262144
store_full_message: true
and 1 GELF UDP input:
bind_address: 0.0.0.0
decompress_size_limit: 8388608
override_source:
port: 12201
recv_buffer_size: 262144
They both stopped receiving messages at about the same time (19th June; the last message on one is at 4:36 AM, on the other at 4:40 AM).
The only thing I did on 18th June was to change the admin password in /etc/graylog/server/server.conf (and restart Graylog, of course), because I had forgotten it when I tried to connect to Graylog from a remote location.
The web interface works fine, but no messages are received on any input. I have 1 remote syslog redirected there, as well as a specific Python script which writes to the GELF UDP input.
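To rule out the senders, a test message can be pushed to the GELF UDP input by hand. Below is a minimal sketch of that, not my actual script: the function names, the "test-sender" host value and the 127.0.0.1 target are assumptions for illustration, and it sends a single uncompressed GELF 1.1 JSON datagram.

```python
import json
import socket


def build_gelf_payload(short_message, host="test-sender", level=6):
    """Build a minimal GELF 1.1 message as UTF-8 encoded JSON bytes."""
    message = {
        "version": "1.1",
        "host": host,                    # hypothetical source name
        "short_message": short_message,
        "level": level,                  # 6 = informational (syslog severity)
    }
    return json.dumps(message).encode("utf-8")


def send_gelf_udp(payload, server="127.0.0.1", port=12201):
    """Send one datagram; returns the number of bytes handed to the kernel."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        return sock.sendto(payload, (server, port))
```

Calling send_gelf_udp(build_gelf_payload("test from python")) on the Graylog host itself takes the network out of the picture. UDP gives no delivery confirmation, so the only way to see whether it arrived is to look at the input's counters or search for the message.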
I tried stopping and starting inputs, recalculating index ranges and restarting the whole server, but nothing helps. There are no errors on System/Overview, the shard is green, no indexer failures or anything.
There is nothing that makes much sense to me in the log files for the period:
2018-06-18T10:55:19.346+03:00 INFO [SessionsResource] Invalid username or password for user "admin"
2018-06-18T20:00:56.936+03:00 WARN [transport] [graylog-e760c297-40e9-4283-b228-f643893c7bc1] Received response for a request that has timed out, sent [36720ms] ago, timed out [6720ms] ago, action [internal:discovery/zen/fd/master_ping], node [{main}{IgdQuOK2RUSpKdNgJu72pA}{127.0.0.1}{127.0.0.1:9300}], id [259047]
2018-06-19T11:53:40.778+03:00 WARN [ProxiedResource] Unable to call http://log.plg.lv:9000/api/system/inputstates on node <e760c297-40e9-4283-b228-f643893c7bc1>
java.net.SocketTimeoutException: timeout
at okio.Okio$4.newTimeoutException(Okio.java:227) ~[graylog.jar:?]
at okio.AsyncTimeout.exit(AsyncTimeout.java:284) ~[graylog.jar:?]
at okio.AsyncTimeout$2.read(AsyncTimeout.java:240) ~[graylog.jar:?]
at okio.RealBufferedSource.indexOf(RealBufferedSource.java:325) ~[graylog.jar:?]
at okio.RealBufferedSource.indexOf(RealBufferedSource.java:314) ~[graylog.jar:?]
at okio.RealBufferedSource.readUtf8LineStrict(RealBufferedSource.java:210) ~[graylog.jar:?]
at okhttp3.internal.http1.Http1Codec.readResponse(Http1Codec.java:191) ~[graylog.jar:?]
at okhttp3.internal.http1.Http1Codec.readResponseHeaders(Http1Codec.java:132) ~[graylog.jar:?]
at okhttp3.internal.http.CallServerInterceptor.intercept(CallServerInterceptor.java:54) ~[graylog.jar:?]
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92) ~[graylog.jar:?]
at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:45) ~[graylog.jar:?]
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92) ~[graylog.jar:?]
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67) ~[graylog.jar:?]
at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:93) ~[graylog.jar:?]
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92) ~[graylog.jar:?]
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67) ~[graylog.jar:?]
at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93) ~[graylog.jar:?]
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92) ~[graylog.jar:?]
at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:120) ~[graylog.jar:?]
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92) ~[graylog.jar:?]
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67) ~[graylog.jar:?]
at org.graylog2.rest.RemoteInterfaceProvider.lambda$get$0(RemoteInterfaceProvider.java:59) ~[graylog.jar:?]
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92) ~[graylog.jar:?]
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67) ~[graylog.jar:?]
at okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:179) ~[graylog.jar:?]
at okhttp3.RealCall.execute(RealCall.java:63) ~[graylog.jar:?]
at retrofit2.OkHttpCall.execute(OkHttpCall.java:174) ~[graylog.jar:?]
at org.graylog2.shared.rest.resources.ProxiedResource.lambda$null$0(ProxiedResource.java:76) ~[graylog.jar:?]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_171]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_171]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_171]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_171]
Caused by: java.net.SocketException: Socket closed
at java.net.SocketInputStream.read(SocketInputStream.java:204) ~[?:1.8.0_171]
at java.net.SocketInputStream.read(SocketInputStream.java:141) ~[?:1.8.0_171]
at okio.Okio$2.read(Okio.java:138) ~[graylog.jar:?]
at okio.AsyncTimeout$2.read(AsyncTimeout.java:236) ~[graylog.jar:?]
... 29 more
2018-06-19T11:59:15.962+03:00 INFO [InputStateListener] Input [GELF UDP/5afeb80c0541a513ff14d2f3] is now STOPPING
2018-06-19T11:59:16.318+03:00 INFO [InputStateListener] Input [GELF UDP/5afeb80c0541a513ff14d2f3] is now STOPPED
2018-06-19T11:59:16.328+03:00 INFO [InputStateListener] Input [GELF UDP/5afeb80c0541a513ff14d2f3] is now TERMINATED
2018-06-19T11:59:19.702+03:00 INFO [InputStateListener] Input [GELF UDP/5afeb80c0541a513ff14d2f3] is now STARTING
2018-06-19T11:59:19.860+03:00 WARN [NettyTransport] receiveBufferSize (SO_RCVBUF) for input GELFUDPInput{title=Client file processor, type=org.graylog2.inputs.gelf.udp.GELFUDPInput, nodeId=e760c297-40e9-4283-b228-f643893c7bc1} should be 262144 but is 212992.
2018-06-19T11:59:19.863+03:00 INFO [InputStateListener] Input [GELF UDP/5afeb80c0541a513ff14d2f3] is now RUNNING
2018-06-19T12:02:19.550+03:00 INFO [InputStateListener] Input [GELF UDP/5afeb80c0541a513ff14d2f3] is now STOPPING
2018-06-19T12:02:19.551+03:00 INFO [InputStateListener] Input [GELF UDP/5afeb80c0541a513ff14d2f3] is now STOPPED
2018-06-19T12:02:19.551+03:00 INFO [InputStateListener] Input [GELF UDP/5afeb80c0541a513ff14d2f3] is now TERMINATED
2018-06-19T12:02:19.573+03:00 INFO [InputStateListener] Input [GELF UDP/5afeb80c0541a513ff14d2f3] is now STARTING
2018-06-19T12:02:19.737+03:00 WARN [NettyTransport] receiveBufferSize (SO_RCVBUF) for input GELFUDPInput{title=Client file processor, type=org.graylog2.inputs.gelf.udp.GELFUDPInput, nodeId=e760c297-40e9-4283-b228-f643893c7bc1} should be 262144 but is 212992.
2018-06-19T12:02:19.739+03:00 INFO [InputStateListener] Input [GELF UDP/5afeb80c0541a513ff14d2f3] is now RUNNING
2018-06-19T12:02:20.854+03:00 INFO [connection] Opened connection [connectionId{localValue:11, serverValue:110}] to localhost:27017
2018-06-19T12:04:27.925+03:00 INFO [RebuildIndexRangesJob] Recalculating index ranges.
2018-06-19T12:04:27.931+03:00 INFO [SystemJobManager] Submitted SystemJob <c51bfc20-739f-11e8-871d-005056a7f4a0> [org.graylog2.indexer.ranges.RebuildIndexRangesJob]
2018-06-19T12:04:28.118+03:00 INFO [RebuildIndexRangesJob] Recalculating index ranges for index set Default index set (graylog_*): 1 indices affected.
2018-06-19T12:04:28.158+03:00 INFO [RebuildIndexRangesJob] Done calculating index ranges for 1 indices. Took 99ms.
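As far as I understand, the receiveBufferSize (SO_RCVBUF) warnings in the log above only mean that the operating system granted a smaller buffer than the recv_buffer_size of 262144 configured on the input; on Linux the upper limit is the net.core.rmem_max sysctl. A small sketch to see what the kernel actually grants for such a request (the function name is made up for illustration):

```python
import socket


def effective_rcvbuf(requested=262144):
    """Ask for a receive buffer of `requested` bytes on a UDP socket
    and return the size the operating system reports back."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested)
        # Note: Linux reports double the value it stored, to account for
        # bookkeeping overhead, so the number may not match the request.
        return sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)


if __name__ == "__main__":
    requested = 262144
    print(f"requested {requested}, kernel reports {effective_rcvbuf(requested)}")
```

The same warning is logged on every start of the input in the excerpt above, so I am not sure it has anything to do with the messages stopping.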
I can see all the usual LISTEN ports ARE open:
java 965 elasticsearch 89u IPv6 16785 0t0 TCP ip6-localhost:9300 (LISTEN)
java 965 elasticsearch 91u IPv6 16788 0t0 TCP localhost:9300 (LISTEN)
java 965 elasticsearch 101u IPv6 16846 0t0 TCP ip6-localhost:9200 (LISTEN)
java 965 elasticsearch 103u IPv6 16847 0t0 TCP localhost:9200 (LISTEN)
java 966 graylog 75u IPv6 17282 0t0 TCP ip6-localhost:9350 (LISTEN)
java 966 graylog 76u IPv6 17283 0t0 TCP localhost:9350 (LISTEN)
java 966 graylog 78u IPv6 17387 0t0 TCP log:9000 (LISTEN)
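That listing only covers TCP sockets in LISTEN state. UDP sockets have no LISTEN state, so the two UDP inputs (ports 15000 and 12201) would not show up in it either way. A rough check of whether something is holding a UDP port is to try binding it: if the bind fails with "address already in use", some process owns it. This is only a sketch with a made-up function name, and it has to run on the Graylog host:

```python
import errno
import socket


def udp_port_in_use(address, port):
    """Return True if binding a UDP socket to (address, port) fails
    because another socket already holds that address and port."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        try:
            sock.bind((address, port))
        except OSError as exc:
            if exc.errno == errno.EADDRINUSE:
                return True
            raise  # any other error (e.g. address not local) is not an answer
        return False
```

For the inputs above that would be udp_port_in_use("192.168.100.217", 15000) and udp_port_in_use("0.0.0.0", 12201). A result of True only says that some process has the port bound, not that it is Graylog or that it is reading from it.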
And tcpdump shows the log messages do get to the ethernet interface of the machine:
tcpdump -i ens160 src ftp.myserv.lv and port 15000
…
12:55:06.313437 IP (tos 0x0, ttl 64, id 5053, offset 0, flags [DF], proto UDP (17), length 144)
ftp.plg.lv.39167 > log.15000: UDP, length 116
E.....@.@.....d...d...:..|p.<94>1 2018-06-20T12:55:06+03:00 ftp vsftpd 13185 - [meta sequenceId="3918"] [vm] OK LOGIN: Client "213.175.117.147"
12:55:51.401032 IP (tos 0x0, ttl 64, id 33440, offset 0, flags [DF], proto TCP (6), length 60)
ftp.plg.lv.53254 > log.12002: Flags [S], cksum 0xb997 (correct), seq 125733094, win 29200, options [mss 1460,sackOK,TS val 934283846 ecr 0,nop,wscale 7], length 0
E..<..@.@.m...d...d......~........r............
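The UDP payload captured on port 15000 starts with <94>1, which looks like a regular RFC 5424 header to me: the number in angle brackets is the priority (facility * 8 + severity) and the 1 after it is the protocol version. A small sketch to decode it (helper names are my own):

```python
import re

# Severity names by numeric code, as listed in RFC 5424.
SEVERITIES = [
    "emergency", "alert", "critical", "error",
    "warning", "notice", "informational", "debug",
]


def decode_pri(pri):
    """Split a syslog PRI value into (facility, severity)."""
    return pri // 8, pri % 8


def parse_syslog_pri(line):
    """Extract and decode the <PRI> field at the start of a syslog message."""
    match = re.match(r"<(\d{1,3})>", line)
    if match is None:
        raise ValueError("no PRI field at start of message")
    return decode_pri(int(match.group(1)))
```

For the captured message, parse_syslog_pri('<94>1 2018-06-20T12:55:06+03:00 ftp vsftpd 13185 - ...') gives facility 11 (FTP daemon) and severity 6 (informational), which matches a vsftpd login line, so the sender seems to be producing valid syslog.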
I have no clue whatsoever what could have gone wrong. Any ideas?