Suddenly Graylog inputs stop receiving messages?


(Gnudiff) #1

I have had a Graylog installation (stock Ubuntu package 2.2.3-1) running smoothly for about half a year.
All of a sudden, one morning all the inputs show that they are no longer receiving any messages.

I have
1 global Syslog UDP input:
allow_override_date: true
bind_address: 192.168.100.217
expand_structured_data: true
force_rdns: false
override_source:
port: 15000
recv_buffer_size: 262144
store_full_message: true

and
1 GELF UDP input:
bind_address: 0.0.0.0
decompress_size_limit: 8388608
override_source:
port: 12201
recv_buffer_size: 262144

They both stopped receiving messages at about the same time (19th June; the last message on one is at 4:36 AM, on the other at 4:40 AM).

The only thing I did on 18th June was change the admin password in /etc/graylog/server/server.conf (and restart Graylog, of course), because I had forgotten it when trying to connect to Graylog from a remote location.

The web interface works fine, but no messages received on any input. And I have 1 remote syslog redirected there, as well as a specific python script, which writes to the GELF UDP input.
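
To rule out the sending side, a single hand-built test message can be pushed at the GELF UDP input. The following is a minimal sketch, not the script mentioned above: the function names and the `test-client` host value are made up for illustration, and it assumes the input accepts uncompressed GELF 1.1 JSON in one datagram.

```python
import json
import socket
import time


def build_gelf_payload(short_message, host="test-client", level=6):
    """Return an uncompressed GELF 1.1 message encoded as UTF-8 JSON bytes."""
    record = {
        "version": "1.1",
        "host": host,                # hypothetical source name
        "short_message": short_message,
        "timestamp": time.time(),    # seconds since the epoch
        "level": level,              # 6 = informational (syslog severity)
    }
    return json.dumps(record).encode("utf-8")


def send_gelf_udp(payload, server, port=12201):
    """Send one datagram and return the number of bytes handed to the OS.

    UDP gives no delivery confirmation, so a successful return only means
    the packet left this process.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        return sock.sendto(payload, (server, port))
```

Calling `send_gelf_udp(build_gelf_payload("GELF UDP test"), "<graylog address>")` while watching the input's message counter shows whether a known-good datagram is counted at all.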

I tried stopping and starting inputs, recalculating index ranges and restarting the whole server, but nothing helps. There are no errors on System/Overview, the shard is green, no indexer failures or anything.

There is nothing that makes much sense to me in the logfiles for the period:

2018-06-18T10:55:19.346+03:00 INFO  [SessionsResource] Invalid username or password for user "admin"
2018-06-18T20:00:56.936+03:00 WARN  [transport] [graylog-e760c297-40e9-4283-b228-f643893c7bc1] Received response for a request that has timed out, sent [36720ms] ago, timed out [6720ms] ago, action [internal:discovery/zen/fd/master_ping], node [{main}{IgdQuOK2RUSpKdNgJu72pA}{127.0.0.1}{127.0.0.1:9300}], id [259047]
2018-06-19T11:53:40.778+03:00 WARN  [ProxiedResource] Unable to call http://log.plg.lv:9000/api/system/inputstates on node <e760c297-40e9-4283-b228-f643893c7bc1>
java.net.SocketTimeoutException: timeout
        at okio.Okio$4.newTimeoutException(Okio.java:227) ~[graylog.jar:?]
        at okio.AsyncTimeout.exit(AsyncTimeout.java:284) ~[graylog.jar:?]
        at okio.AsyncTimeout$2.read(AsyncTimeout.java:240) ~[graylog.jar:?]
        at okio.RealBufferedSource.indexOf(RealBufferedSource.java:325) ~[graylog.jar:?]
        at okio.RealBufferedSource.indexOf(RealBufferedSource.java:314) ~[graylog.jar:?]
        at okio.RealBufferedSource.readUtf8LineStrict(RealBufferedSource.java:210) ~[graylog.jar:?]
        at okhttp3.internal.http1.Http1Codec.readResponse(Http1Codec.java:191) ~[graylog.jar:?]
        at okhttp3.internal.http1.Http1Codec.readResponseHeaders(Http1Codec.java:132) ~[graylog.jar:?]
        at okhttp3.internal.http.CallServerInterceptor.intercept(CallServerInterceptor.java:54) ~[graylog.jar:?]
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92) ~[graylog.jar:?]
        at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:45) ~[graylog.jar:?]
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92) ~[graylog.jar:?]
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67) ~[graylog.jar:?]
        at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:93) ~[graylog.jar:?]
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92) ~[graylog.jar:?]
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67) ~[graylog.jar:?]
        at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93) ~[graylog.jar:?]
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92) ~[graylog.jar:?]
        at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:120) ~[graylog.jar:?]
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92) ~[graylog.jar:?]
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67) ~[graylog.jar:?]
        at org.graylog2.rest.RemoteInterfaceProvider.lambda$get$0(RemoteInterfaceProvider.java:59) ~[graylog.jar:?]
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92) ~[graylog.jar:?]
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67) ~[graylog.jar:?]
        at okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:179) ~[graylog.jar:?]
        at okhttp3.RealCall.execute(RealCall.java:63) ~[graylog.jar:?]
        at retrofit2.OkHttpCall.execute(OkHttpCall.java:174) ~[graylog.jar:?]
        at org.graylog2.shared.rest.resources.ProxiedResource.lambda$null$0(ProxiedResource.java:76) ~[graylog.jar:?]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_171]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_171]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_171]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_171]
Caused by: java.net.SocketException: Socket closed
        at java.net.SocketInputStream.read(SocketInputStream.java:204) ~[?:1.8.0_171]
        at java.net.SocketInputStream.read(SocketInputStream.java:141) ~[?:1.8.0_171]
        at okio.Okio$2.read(Okio.java:138) ~[graylog.jar:?]
        at okio.AsyncTimeout$2.read(AsyncTimeout.java:236) ~[graylog.jar:?]
        ... 29 more
2018-06-19T11:59:15.962+03:00 INFO  [InputStateListener] Input [GELF UDP/5afeb80c0541a513ff14d2f3] is now STOPPING
2018-06-19T11:59:16.318+03:00 INFO  [InputStateListener] Input [GELF UDP/5afeb80c0541a513ff14d2f3] is now STOPPED
2018-06-19T11:59:16.328+03:00 INFO  [InputStateListener] Input [GELF UDP/5afeb80c0541a513ff14d2f3] is now TERMINATED
2018-06-19T11:59:19.702+03:00 INFO  [InputStateListener] Input [GELF UDP/5afeb80c0541a513ff14d2f3] is now STARTING
2018-06-19T11:59:19.860+03:00 WARN  [NettyTransport] receiveBufferSize (SO_RCVBUF) for input GELFUDPInput{title=Client file processor, type=org.graylog2.inputs.gelf.udp.GELFUDPInput, nodeId=e760c297-40e9-4283-b228-f643893c7bc1} should be 262144 but is 212992.
2018-06-19T11:59:19.863+03:00 INFO  [InputStateListener] Input [GELF UDP/5afeb80c0541a513ff14d2f3] is now RUNNING
2018-06-19T12:02:19.550+03:00 INFO  [InputStateListener] Input [GELF UDP/5afeb80c0541a513ff14d2f3] is now STOPPING
2018-06-19T12:02:19.551+03:00 INFO  [InputStateListener] Input [GELF UDP/5afeb80c0541a513ff14d2f3] is now STOPPED
2018-06-19T12:02:19.551+03:00 INFO  [InputStateListener] Input [GELF UDP/5afeb80c0541a513ff14d2f3] is now TERMINATED
2018-06-19T12:02:19.573+03:00 INFO  [InputStateListener] Input [GELF UDP/5afeb80c0541a513ff14d2f3] is now STARTING
2018-06-19T12:02:19.737+03:00 WARN  [NettyTransport] receiveBufferSize (SO_RCVBUF) for input GELFUDPInput{title=Client file processor, type=org.graylog2.inputs.gelf.udp.GELFUDPInput, nodeId=e760c297-40e9-4283-b228-f643893c7bc1} should be 262144 but is 212992.
2018-06-19T12:02:19.739+03:00 INFO  [InputStateListener] Input [GELF UDP/5afeb80c0541a513ff14d2f3] is now RUNNING
2018-06-19T12:02:20.854+03:00 INFO  [connection] Opened connection [connectionId{localValue:11, serverValue:110}] to localhost:27017
2018-06-19T12:04:27.925+03:00 INFO  [RebuildIndexRangesJob] Recalculating index ranges.
2018-06-19T12:04:27.931+03:00 INFO  [SystemJobManager] Submitted SystemJob <c51bfc20-739f-11e8-871d-005056a7f4a0> [org.graylog2.indexer.ranges.RebuildIndexRangesJob]
2018-06-19T12:04:28.118+03:00 INFO  [RebuildIndexRangesJob] Recalculating index ranges for index set Default index set (graylog_*): 1 indices affected.
2018-06-19T12:04:28.158+03:00 INFO  [RebuildIndexRangesJob] Done calculating index ranges for 1 indices. Took 99ms.
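
The NettyTransport warning in the log above says the operating system granted a smaller receive buffer than the configured recv_buffer_size of 262144. On Linux the usual ceiling is the `net.core.rmem_max` sysctl. The input still reaches RUNNING afterwards, so the warning alone probably does not explain the stall. A minimal sketch for checking what the OS grants (the function name is made up for illustration):

```python
import socket


def granted_udp_rcvbuf(requested):
    """Request a UDP receive buffer of `requested` bytes and return what the OS reports."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested)
        # Linux doubles the stored value for bookkeeping overhead, so the number
        # read back here need not equal the one printed in the Graylog log.
        return sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
```

If `granted_udp_rcvbuf(262144)` stops growing once the request passes a certain size, the kernel limit is being hit and raising `net.core.rmem_max` is the likely fix for the warning.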

I can see all the usual LISTEN ports ARE open:

java       965 elasticsearch   89u  IPv6  16785      0t0  TCP ip6-localhost:9300 (LISTEN)
java       965 elasticsearch   91u  IPv6  16788      0t0  TCP localhost:9300 (LISTEN)
java       965 elasticsearch  101u  IPv6  16846      0t0  TCP ip6-localhost:9200 (LISTEN)
java       965 elasticsearch  103u  IPv6  16847      0t0  TCP localhost:9200 (LISTEN)
java       966       graylog   75u  IPv6  17282      0t0  TCP ip6-localhost:9350 (LISTEN)
java       966       graylog   76u  IPv6  17283      0t0  TCP localhost:9350 (LISTEN)
java       966       graylog   78u  IPv6  17387      0t0  TCP log:9000 (LISTEN)
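
This listing covers only TCP sockets in LISTEN state. UDP sockets never enter that state, so it says nothing about whether the inputs on ports 15000 and 12201 are actually bound. One way to probe this on the Graylog host is to try binding the port: if the bind fails with "address already in use", some process is holding it. A minimal sketch (the function name is made up for illustration):

```python
import errno
import socket


def udp_port_in_use(host, port):
    """Return True if binding a UDP socket to (host, port) fails with EADDRINUSE."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as probe:
        try:
            probe.bind((host, port))
        except OSError as exc:
            if exc.errno == errno.EADDRINUSE:
                return True
            raise  # e.g. the address is not local, or permission is denied
    return False
```

A True result for `udp_port_in_use("192.168.100.217", 15000)` only shows that something holds the port, not that it is Graylog. A False result while the input is shown as RUNNING would point at the input not being bound.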

And tcpdump shows the log messages do get to the ethernet interface of the machine:

tcpdump -i ens160 src ftp.myserv.lv and port 15000

12:55:06.313437 IP (tos 0x0, ttl 64, id 5053, offset 0, flags [DF], proto UDP (17), length 144)
    ftp.plg.lv.39167 > log.15000: UDP, length 116
E.....@.@.....d...d...:..|p.<94>1 2018-06-20T12:55:06+03:00 ftp vsftpd 13185 - [meta sequenceId="3918"] [vm] OK LOGIN: Client "213.175.117.147"

12:55:51.401032 IP (tos 0x0, ttl 64, id 33440, offset 0, flags [DF], proto TCP (6), length 60)
    ftp.plg.lv.53254 > log.12002: Flags [S], cksum 0xb997 (correct), seq 125733094, win 29200, options [mss 1460,sackOK,TS val 934283846 ecr 0,nop,wscale 7], length 0
E..<..@.@.m...d...d......~........r............

I have no clue whatsoever, what could have gone wrong. Any ideas?


(Jochen) #2

Please post the complete configuration and the complete logs of your Graylog and Elasticsearch nodes.
See http://docs.graylog.org/en/2.4/pages/configuration/file_location.html


(Gnudiff) #3

Added those files in a zip file (except for plugin JARs, which I listed, but removed from zip):

  Length      Date    Time    Name
---------  ---------- -----   ----
    27545  2018-06-18 10:51   etc/graylog/server/server.conf
     2159  2017-03-02 15:31   etc/graylog/server/log4j2.xml
        0  2018-05-22 11:11   usr/share/graylog-server/plugin/
   499209  2017-04-04 15:36   usr/share/graylog-server/plugin/graylog-plugin-anonymous-usage-statistics-2.2.3.jar
    27030  2017-04-04 15:36   usr/share/graylog-server/plugin/graylog-plugin-beats-2.2.3.jar
  2936453  2017-04-04 15:36   usr/share/graylog-server/plugin/graylog-plugin-collector-2.2.3.jar
  4133067  2017-04-04 15:36   usr/share/graylog-server/plugin/graylog-plugin-enterprise-integration-2.2.3.jar
  6497687  2017-04-04 15:36   usr/share/graylog-server/plugin/graylog-plugin-map-widget-2.2.3.jar
  5582963  2017-04-04 15:36   usr/share/graylog-server/plugin/graylog-plugin-pipeline-processor-2.2.3.jar
  3493253  2018-05-22 11:11   usr/share/graylog-server/plugin/graylog-plugin-quickvaluesplus-widget-3.1.0.jar
      552  2017-03-02 15:31   etc/default/graylog-server
        7  2018-06-20 15:26   var/lib/graylog-server/journal/graylog2-committed-read-offset
        0  2018-06-20 14:37   var/lib/graylog-server/journal/messagejournal-0/
       29  2018-06-20 15:25   var/lib/graylog-server/journal/recovery-point-offset-checkpoint
  1838920  2018-06-20 13:25   var/log/graylog-server/server.log
     2491  2017-01-03 13:51   etc/default/elasticsearch
     3179  2017-03-31 15:08   etc/elasticsearch/elasticsearch.yml
     2571  2017-01-03 13:51   etc/elasticsearch/logging.yml
        0  2017-01-03 13:51   etc/elasticsearch/scripts/
        0  2017-03-31 15:08   var/log/elasticsearch/graylog_deprecation.log
        0  2017-03-31 15:08   var/log/elasticsearch/graylog_index_indexing_slowlog.log
        0  2017-03-31 15:08   var/log/elasticsearch/graylog_index_search_slowlog.log
     2996  2018-06-19 12:34   var/log/elasticsearch/graylog.log
      364  2018-06-10 21:03   var/log/elasticsearch/graylog.log.2018-06-10
      679  2018-06-11 21:54   var/log/elasticsearch/graylog.log.2018-06-11
      682  2018-06-12 22:15   var/log/elasticsearch/graylog.log.2018-06-12
      687  2018-06-13 23:05   var/log/elasticsearch/graylog.log.2018-06-13
      686  2018-06-14 22:41   var/log/elasticsearch/graylog.log.2018-06-14
     1007  2018-06-15 19:33   var/log/elasticsearch/graylog.log.2018-06-15
      324  2018-06-16 09:10   var/log/elasticsearch/graylog.log.2018-06-16
     3819  2018-06-17 15:00   var/log/elasticsearch/graylog.log.2018-06-17
     2078  2018-06-18 18:28   var/log/elasticsearch/graylog.log.2018-06-18
     2143  2014-06-20 12:48   etc/mongodb.conf
   246704  2018-06-20 15:27   var/log/mongodb/mongodb.log
   513145  2018-06-17 07:35   var/log/mongodb/mongodb.log.1
---------                     -------
 25822429                     35 files

(Jochen) #4

There are GC pauses in your Elasticsearch process, which take quite long:

$ grep -h monitor.jvm ./var/log/elasticsearch/graylog.log*
[2018-06-11 14:00:05,578][WARN ][monitor.jvm              ] [main] [gc][young][37475747][342316] duration [1s], collections [1]/[1.1s], total [1s]/[1.9h], memory [311.6mb]->[255.3mb]/[1015.6mb], all_pools {[young] [58.2mb]->[12.8kb]/[66.5mb]}{[survivor] [5mb]->[3.7mb]/[8.3mb]}{[old] [248.2mb]->[251.5mb]/[940.8mb]}
[2018-06-12 10:31:41,054][WARN ][monitor.jvm              ] [main] [gc][young][37549603][347854] duration [1.1s], collections [1]/[1.5s], total [1.1s]/[1.9h], memory [359.8mb]->[299.7mb]/[1015.6mb], all_pools {[young] [59.7mb]->[24.4kb]/[66.5mb]}{[survivor] [2mb]->[1.3mb]/[8.3mb]}{[old] [298mb]->[298.4mb]/[940.8mb]}
[2018-06-13 03:06:47,064][WARN ][monitor.jvm              ] [main] [gc][young][37609247][352326] duration [6.1s], collections [1]/[6.3s], total [6.1s]/[1.9h], memory [352.8mb]->[286.4mb]/[1015.6mb], all_pools {[young] [65.7mb]->[4.6kb]/[66.5mb]}{[survivor] [1.7mb]->[578.4kb]/[8.3mb]}{[old] [285.3mb]->[285.8mb]/[940.8mb]}
[2018-06-14 00:45:04,318][INFO ][monitor.jvm              ] [main] [gc][young][37687075][358216] duration [858ms], collections [1]/[1.3s], total [858ms]/[2h], memory [206.7mb]->[192.4mb]/[1015.6mb], all_pools {[young] [19.3mb]->[521.1kb]/[66.5mb]}{[survivor] [1.1mb]->[5.7mb]/[8.3mb]}{[old] [186.2mb]->[186.2mb]/[940.8mb]}
[2018-06-15 01:09:48,270][INFO ][monitor.jvm              ] [main] [gc][young][37774894][364902] duration [827ms], collections [1]/[1.5s], total [827ms]/[2h], memory [247.3mb]->[249.6mb]/[1015.6mb], all_pools {[young] [1.7mb]->[36.1kb]/[66.5mb]}{[survivor] [3.6mb]->[7.4mb]/[8.3mb]}{[old] [241.9mb]->[242.1mb]/[940.8mb]}
[2018-06-15 09:52:00,749][INFO ][monitor.jvm              ] [main] [gc][young][37806208][367084] duration [857ms], collections [1]/[1.8s], total [857ms]/[2h], memory [351.3mb]->[309mb]/[1015.6mb], all_pools {[young] [42.8mb]->[49.6kb]/[66.5mb]}{[survivor] [1.3mb]->[1001.5kb]/[8.3mb]}{[old] [307.1mb]->[307.9mb]/[940.8mb]}
[2018-06-16 09:10:21,219][WARN ][monitor.jvm              ] [main] [gc][young][37890043][372614] duration [1.2s], collections [1]/[2.2s], total [1.2s]/[2.1h], memory [285.2mb]->[234.3mb]/[1015.6mb], all_pools {[young] [50.9mb]->[54.9kb]/[66.5mb]}{[survivor] [458.3kb]->[452kb]/[8.3mb]}{[old] [233.8mb]->[233.8mb]/[940.8mb]}
[2018-06-17 00:05:10,262][WARN ][monitor.jvm              ] [main] [gc][young][37943716][375847] duration [1.5s], collections [1]/[2.1s], total [1.5s]/[2.1h], memory [265.5mb]->[246.2mb]/[1015.6mb], all_pools {[young] [19.9mb]->[4.1kb]/[66.5mb]}{[survivor] [1mb]->[1.7mb]/[8.3mb]}{[old] [244.5mb]->[244.5mb]/[940.8mb]}
[2018-06-17 00:07:01,507][WARN ][monitor.jvm              ] [main] [gc][young][37943825][375854] duration [2.4s], collections [1]/[2.9s], total [2.4s]/[2.1h], memory [308mb]->[246.3mb]/[1015.6mb], all_pools {[young] [61.9mb]->[4.4kb]/[66.5mb]}{[survivor] [1.3mb]->[900.1kb]/[8.3mb]}{[old] [244.7mb]->[245.4mb]/[940.8mb]}
[2018-06-17 00:09:11,339][WARN ][monitor.jvm              ] [main] [gc][young][37943954][375863] duration [1s], collections [1]/[1.5s], total [1s]/[2.1h], memory [300.5mb]->[246.8mb]/[1015.6mb], all_pools {[young] [54mb]->[41.7kb]/[66.5mb]}{[survivor] [1013kb]->[758.5kb]/[8.3mb]}{[old] [245.4mb]->[246mb]/[940.8mb]}
[2018-06-17 00:19:08,342][WARN ][monitor.jvm              ] [main] [gc][young][37944546][375900] duration [4.8s], collections [1]/[5.7s], total [4.8s]/[2.1h], memory [300.3mb]->[251.1mb]/[1015.6mb], all_pools {[young] [53.4mb]->[44.8kb]/[66.5mb]}{[survivor] [422.7kb]->[4.6mb]/[8.3mb]}{[old] [246.4mb]->[246.4mb]/[940.8mb]}
[2018-06-17 00:20:42,472][WARN ][monitor.jvm              ] [main] [gc][young][37944638][375907] duration [2.2s], collections [1]/[2.7s], total [2.2s]/[2.1h], memory [294mb]->[247.8mb]/[1015.6mb], all_pools {[young] [45.7mb]->[14kb]/[66.5mb]}{[survivor] [1.3mb]->[447.9kb]/[8.3mb]}{[old] [246.9mb]->[247.3mb]/[940.8mb]}
[2018-06-17 00:25:51,386][INFO ][monitor.jvm              ] [main] [gc][young][37944946][375926] duration [849ms], collections [1]/[1.6s], total [849ms]/[2.1h], memory [285.3mb]->[248.1mb]/[1015.6mb], all_pools {[young] [36.7mb]->[4.1kb]/[66.5mb]}{[survivor] [1mb]->[356kb]/[8.3mb]}{[old] [247.4mb]->[247.8mb]/[940.8mb]}
[2018-06-17 04:35:38,824][WARN ][monitor.jvm              ] [main] [gc][young][37959927][376840] duration [1.6s], collections [1]/[2.1s], total [1.6s]/[2.1h], memory [322.5mb]->[256.7mb]/[1015.6mb], all_pools {[young] [66mb]->[2.2kb]/[66.5mb]}{[survivor] [1.2mb]->[984.7kb]/[8.3mb]}{[old] [255.2mb]->[255.7mb]/[940.8mb]}
[2018-06-18 12:32:52,302][INFO ][monitor.jvm              ] [main] [gc][young][38074928][383402] duration [816ms], collections [1]/[1s], total [816ms]/[2.1h], memory [356.8mb]->[302.1mb]/[1015.6mb], all_pools {[young] [63.2mb]->[6mb]/[66.5mb]}{[survivor] [1.6mb]->[4.1mb]/[8.3mb]}{[old] [291.8mb]->[291.8mb]/[940.8mb]}
[2018-06-18 15:03:31,970][WARN ][monitor.jvm              ] [main] [gc][young][38083963][383591] duration [1s], collections [1]/[1s], total [1s]/[2.1h], memory [364.7mb]->[301.2mb]/[1015.6mb], all_pools {[young] [64.2mb]->[26.5kb]/[66.5mb]}{[survivor] [498.7kb]->[972.2kb]/[8.3mb]}{[old] [300mb]->[300.2mb]/[940.8mb]}
[2018-06-18 16:59:40,985][INFO ][monitor.jvm              ] [main] [gc][young][38090929][383733] duration [859ms], collections [1]/[1s], total [859ms]/[2.1h], memory [370.3mb]->[312.2mb]/[1015.6mb], all_pools {[young] [64.2mb]->[5.9mb]/[66.5mb]}{[survivor] [754.6kb]->[687.1kb]/[8.3mb]}{[old] [305.2mb]->[305.6mb]/[940.8mb]}
[2018-06-18 18:28:30,599][INFO ][monitor.jvm              ] [main] [gc][young][38096256][383842] duration [892ms], collections [1]/[1.4s], total [892ms]/[2.1h], memory [376.9mb]->[310.6mb]/[1015.6mb], all_pools {[young] [66.5mb]->[700.7kb]/[66.5mb]}{[survivor] [623.2kb]->[583.8kb]/[8.3mb]}{[old] [309.7mb]->[310.1mb]/[940.8mb]}
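
To pick out the long pauses from output like this, the `duration [...]` field can be extracted and compared against a threshold. This is a minimal sketch that assumes the line format shown above (timestamp as the first bracketed field, duration in `ms`, `s` or `m`). The function names and the 1-second default threshold are made up for illustration.

```python
import re

# Matches the first "duration [...]" field of a monitor.jvm log line.
_DURATION = re.compile(r"duration \[(\d+(?:\.\d+)?)(ms|s|m)\]")
_UNIT_SECONDS = {"ms": 0.001, "s": 1.0, "m": 60.0}


def gc_pause_seconds(line):
    """Return the GC pause in seconds, or None if the line has no duration field."""
    match = _DURATION.search(line)
    if match is None:
        return None
    value, unit = match.groups()
    return float(value) * _UNIT_SECONDS[unit]


def long_pauses(lines, threshold=1.0):
    """Yield (timestamp, seconds) for every pause at or above the threshold."""
    for line in lines:
        seconds = gc_pause_seconds(line)
        if seconds is not None and seconds >= threshold:
            # Assumes the line starts with "[timestamp]", as in the grep -h output.
            yield line[1:line.index("]")], seconds
```

Fed with the lines above, `long_pauses` would report, among others, the 6.1 s pause on 2018-06-13 and the 4.8 s pause on 2018-06-17.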

Additionally, it looks like you’re using the default JVM settings for Graylog and Elasticsearch, which might not be optimized for your environment.

What are the specifications of the machine running Graylog?
Are Graylog and Elasticsearch running on the same machine?


(Gnudiff) #5

Yes, elasticsearch and graylog run on the same machine and both were installed by the standard Ubuntu installer from distro packages. I haven’t made any tweaks to JVM settings.

The machine is a 64-bit Ubuntu virtual machine running under VMware 5.1, with 2 GB RAM and 1 vCPU (2 GHz consumed).


(Jochen) #6

Well, 2 GB of memory is a bit tight for running everything on a single machine. Try again with at least 4 GB.

Better would be 6 GB, giving Graylog 1 GB of heap memory and Elasticsearch 2 GB of memory in the JVM settings, and using the remaining 3 GB as disk cache.

And last but not least, I’d recommend upgrading to the latest stable version of Graylog and Elasticsearch, being Graylog 2.4.5 and Elasticsearch 5.6.10 at the time of writing.


(Gnudiff) #7

Well, I can try that for sure; I am just confused that there are no relevant errors displayed that anything is amiss. Graylog just silently keeps not accepting any messages on the input and doesn’t give any indication anything is wrong.

Actually, as I was writing this answer, I browsed to details for my Graylog server node… and it says:

Disk journal utilization 0%, but 30,008 unprocessed messages are currently in the journal, in 13 segments.

That means those messages are stuck somewhere, but where, and how do I restart their processing?


(Jochen) #8

Try restarting Graylog.


(Gnudiff) #9

Restarted the machine. While it was down, I increased the RAM to 4 GB.

After startup it processed some 20k messages and is now again pausing within Process buffer (10K messages there and around 10K in Disk Journal utilization).

The process buffer thread dump says most threads are waiting:

