Graylog stops processing/writing messages

After running fine for some time, everything stops. Messages keep being written to the journal but are no longer sent to Elasticsearch, and there is nothing in the ES logs. What should I do to gather additional information? Most of the relevant data is below:
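A rough sketch of the kind of information that can be collected while the journal fills up (the API port/path, credentials and hostnames below are assumptions, adjust them to the setup; this node publishes its API at :12900/api):

# Graylog node overview and journal state (uncommitted entries, read/append rates)
curl -s -u admin:PASSWORD 'http://graylogserver:12900/api/system'
curl -s -u admin:PASSWORD 'http://graylogserver:12900/api/system/journal'

# Raise the Graylog log level at runtime (the logger name here is just an example)
curl -s -u admin:PASSWORD -X PUT 'http://graylogserver:12900/api/system/loggers/org.graylog2/level/debug'

# Elasticsearch side: cluster health and bulk thread-pool rejections (ES 2.x column names)
curl -s 'http://elastic1:9200/_cluster/health?pretty'
curl -s 'http://elastic1:9200/_cat/thread_pool?v&h=host,bulk.active,bulk.queue,bulk.rejected'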

Graylog logs:

2017-10-10T09:35:18.470+02:00 INFO  [Indices] Successfully created index template graylog-internal
2017-10-10T09:35:18.605+02:00 INFO  [MongoIndexSet] Waiting for allocation of index <graylog_201>.
2017-10-10T09:35:18.778+02:00 INFO  [MongoIndexSet] Index <graylog_201> has been successfully allocated.
2017-10-10T09:35:18.778+02:00 INFO  [MongoIndexSet] Pointing index alias <graylog_deflector> to new index <graylog_201>.
2017-10-10T09:35:19.225+02:00 INFO  [SystemJobManager] Submitted SystemJob <90f7ea90-ad8d-11e7-b67e-005056806fd3> [org.graylog2.indexer.indices.jobs.SetIndexReadOnlyAndCalculateRangeJob]
2017-10-10T09:35:19.225+02:00 INFO  [MongoIndexSet] Successfully pointed index alias <graylog_deflector> to index <graylog_201>.
2017-10-10T09:35:28.558+02:00 INFO  [AbstractIndexCountBasedRetentionStrategy] Number of indices (131) higher than limit (130). Running retention for 1 indices.
2017-10-10T09:35:28.681+02:00 INFO  [AbstractIndexCountBasedRetentionStrategy] Running retention strategy [org.graylog2.indexer.retention.strategies.DeletionRetentionStrategy] for index <graylog_71>
2017-10-10T09:35:28.871+02:00 INFO  [DeletionRetentionStrategy] Finished index retention strategy [delete] for index <graylog_71> in 189ms.
2017-10-10T09:35:49.235+02:00 INFO  [SetIndexReadOnlyJob] Flushing old index <graylog_200>.
2017-10-10T09:35:49.651+02:00 INFO  [SetIndexReadOnlyJob] Setting old index <graylog_200> to read-only.
2017-10-10T09:35:49.773+02:00 INFO  [SystemJobManager] Submitted SystemJob <a32d2bd0-ad8d-11e7-b67e-005056806fd3> [org.graylog2.indexer.indices.jobs.OptimizeIndexJob]
2017-10-10T09:35:49.778+02:00 INFO  [CreateNewSingleIndexRangeJob] Calculating ranges for index graylog_200.
2017-10-10T09:35:50.025+02:00 INFO  [OptimizeIndexJob] Optimizing index <graylog_200>.
2017-10-10T09:35:52.631+02:00 INFO  [MongoIndexRangeService] Calculated range of [graylog_200] in [2851ms].
2017-10-10T09:35:52.635+02:00 INFO  [CreateNewSingleIndexRangeJob] Created ranges for index graylog_200.
2017-10-10T09:35:52.635+02:00 INFO  [SystemJobManager] SystemJob <90f7ea90-ad8d-11e7-b67e-005056806fd3> [org.graylog2.indexer.indices.jobs.SetIndexReadOnlyAndCalculateRangeJob] finished in 3409ms.
2017-10-10T09:39:55.556+02:00 INFO  [SystemJobManager] SystemJob <a32d2bd0-ad8d-11e7-b67e-005056806fd3> [org.graylog2.indexer.indices.jobs.OptimizeIndexJob] finished in 245723ms.
2017-10-10T10:29:48.445+02:00 INFO  [AbstractRotationStrategy] Deflector index <Default index set> (index set <graylog_201>) should be rotated, Pointing deflector to new index now!
2017-10-10T10:29:48.449+02:00 INFO  [MongoIndexSet] Cycling from <graylog_201> to <graylog_202>.
2017-10-10T10:29:48.449+02:00 INFO  [MongoIndexSet] Creating target index <graylog_202>.
2017-10-10T10:29:48.481+02:00 INFO  [Indices] Successfully created index template graylog-internal
2017-10-10T10:29:48.619+02:00 INFO  [MongoIndexSet] Waiting for allocation of index <graylog_202>.
2017-10-10T10:29:48.781+02:00 INFO  [MongoIndexSet] Index <graylog_202> has been successfully allocated.
2017-10-10T10:29:48.781+02:00 INFO  [MongoIndexSet] Pointing index alias <graylog_deflector> to new index <graylog_202>.
2017-10-10T10:29:48.813+02:00 INFO  [SystemJobManager] Submitted SystemJob <2dcb73d0-ad95-11e7-b67e-005056806fd3> [org.graylog2.indexer.indices.jobs.SetIndexReadOnlyAndCalculateRangeJob]
2017-10-10T10:29:48.813+02:00 INFO  [MongoIndexSet] Successfully pointed index alias <graylog_deflector> to index <graylog_202>.
2017-10-10T10:30:18.821+02:00 INFO  [SetIndexReadOnlyJob] Flushing old index <graylog_201>.
2017-10-10T10:30:19.186+02:00 INFO  [SetIndexReadOnlyJob] Setting old index <graylog_201> to read-only.
2017-10-10T10:30:19.296+02:00 INFO  [SystemJobManager] Submitted SystemJob <3ff6ca16-ad95-11e7-b67e-005056806fd3> [org.graylog2.indexer.indices.jobs.OptimizeIndexJob]
2017-10-10T10:30:19.302+02:00 INFO  [CreateNewSingleIndexRangeJob] Calculating ranges for index graylog_201.
2017-10-10T10:30:19.303+02:00 INFO  [OptimizeIndexJob] Optimizing index <graylog_201>.
2017-10-10T10:30:20.935+02:00 INFO  [MongoIndexRangeService] Calculated range of [graylog_201] in [1623ms].
2017-10-10T10:30:20.953+02:00 INFO  [CreateNewSingleIndexRangeJob] Created ranges for index graylog_201.
2017-10-10T10:30:20.954+02:00 INFO  [SystemJobManager] SystemJob <2dcb73d0-ad95-11e7-b67e-005056806fd3> [org.graylog2.indexer.indices.jobs.SetIndexReadOnlyAndCalculateRangeJob] finished in 2140ms.
2017-10-10T10:30:31.404+02:00 INFO  [AbstractIndexCountBasedRetentionStrategy] Number of indices (131) higher than limit (130). Running retention for 1 indices.
2017-10-10T10:30:36.257+02:00 INFO  [AbstractIndexCountBasedRetentionStrategy] Running retention strategy [org.graylog2.indexer.retention.strategies.DeletionRetentionStrategy] for index <graylog_72>
2017-10-10T10:30:36.765+02:00 INFO  [DeletionRetentionStrategy] Finished index retention strategy [delete] for index <graylog_72> in 506ms.
2017-10-10T10:34:12.513+02:00 INFO  [SystemJobManager] SystemJob <3ff6ca16-ad95-11e7-b67e-005056806fd3> [org.graylog2.indexer.indices.jobs.OptimizeIndexJob] finished in 233216ms.
2017-10-10T11:22:58.475+02:00 INFO  [AbstractRotationStrategy] Deflector index <Default index set> (index set <graylog_202>) should be rotated, Pointing deflector to new index now!
2017-10-10T11:22:58.477+02:00 INFO  [MongoIndexSet] Cycling from <graylog_202> to <graylog_203>.
2017-10-10T11:22:58.478+02:00 INFO  [MongoIndexSet] Creating target index <graylog_203>.
2017-10-10T11:22:58.506+02:00 INFO  [Indices] Successfully created index template graylog-internal
2017-10-10T11:22:58.659+02:00 INFO  [MongoIndexSet] Waiting for allocation of index <graylog_203>.
2017-10-10T11:22:58.812+02:00 INFO  [MongoIndexSet] Index <graylog_203> has been successfully allocated.
2017-10-10T11:22:58.812+02:00 INFO  [MongoIndexSet] Pointing index alias <graylog_deflector> to new index <graylog_203>.
2017-10-10T11:22:58.842+02:00 INFO  [SystemJobManager] Submitted SystemJob <9b33189a-ad9c-11e7-b67e-005056806fd3> [org.graylog2.indexer.indices.jobs.SetIndexReadOnlyAndCalculateRangeJob]
2017-10-10T11:22:58.842+02:00 INFO  [MongoIndexSet] Successfully pointed index alias <graylog_deflector> to index <graylog_203>.
2017-10-10T11:23:28.856+02:00 INFO  [SetIndexReadOnlyJob] Flushing old index <graylog_202>.
2017-10-10T11:23:29.052+02:00 INFO  [SetIndexReadOnlyJob] Setting old index <graylog_202> to read-only.
2017-10-10T11:23:29.170+02:00 INFO  [SystemJobManager] Submitted SystemJob <ad4652e0-ad9c-11e7-b67e-005056806fd3> [org.graylog2.indexer.indices.jobs.OptimizeIndexJob]
2017-10-10T11:23:29.174+02:00 INFO  [CreateNewSingleIndexRangeJob] Calculating ranges for index graylog_202.
2017-10-10T11:23:29.179+02:00 INFO  [OptimizeIndexJob] Optimizing index <graylog_202>.
2017-10-10T11:23:30.465+02:00 INFO  [MongoIndexRangeService] Calculated range of [graylog_202] in [1289ms].
2017-10-10T11:23:30.471+02:00 INFO  [CreateNewSingleIndexRangeJob] Created ranges for index graylog_202.
2017-10-10T11:23:30.471+02:00 INFO  [SystemJobManager] SystemJob <9b33189a-ad9c-11e7-b67e-005056806fd3> [org.graylog2.indexer.indices.jobs.SetIndexReadOnlyAndCalculateRangeJob] finished in 1623ms.
2017-10-10T11:25:29.830+02:00 INFO  [AbstractIndexCountBasedRetentionStrategy] Number of indices (131) higher than limit (130). Running retention for 1 indices.
2017-10-10T11:25:30.002+02:00 INFO  [AbstractIndexCountBasedRetentionStrategy] Running retention strategy [org.graylog2.indexer.retention.strategies.DeletionRetentionStrategy] for index <graylog_73>
2017-10-10T11:25:30.255+02:00 INFO  [DeletionRetentionStrategy] Finished index retention strategy [delete] for index <graylog_73> in 252ms.
2017-10-10T11:27:49.497+02:00 INFO  [SystemJobManager] SystemJob <ad4652e0-ad9c-11e7-b67e-005056806fd3> [org.graylog2.indexer.indices.jobs.OptimizeIndexJob] finished in 260321ms.

ES cluster health:

{
   "initializing_shards" : 0,
   "unassigned_shards" : 0,
   "active_shards" : 1140,
   "relocating_shards" : 0,
   "status" : "green",
   "delayed_unassigned_shards" : 0,
   "number_of_in_flight_fetch" : 0,
   "number_of_data_nodes" : 3,
   "active_shards_percent_as_number" : 100,
   "number_of_pending_tasks" : 0,
   "task_max_waiting_in_queue_millis" : 0,
   "timed_out" : false,
   "cluster_name" : "graylog",
   "active_primary_shards" : 830,
   "number_of_nodes" : 3
}

Graylog server:

Hostname:

graylogserver

Node ID:

5f31a2e0-320b-4a06-83e3-ebe3ba806102

Version:

2.3.1+9f2c6ef, codename Tegernseer

JVM:

PID 9207, Oracle Corporation 1.8.0_131 on Linux 3.10.0-514.26.2.el7.x86_64

This is what I see after raising the log level on Graylog:

2017-10-10T12:09:03.428+02:00 DEBUG [BatchedElasticSearchOutputFlushThread] Checking for outputs to flush ...
2017-10-10T12:09:03.428+02:00 DEBUG [BatchedElasticSearchOutputFlushThread] Flushing output <org.graylog2.outputs.BlockingBatchedESOutput@428938af>

Did you check the health of your Elasticsearch servers and their stats?

I did check the ES status; it was green. No logs were produced, CPU consumption was very low, and there were no memory problems.
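For reference, a sketch of the kind of checks meant here (standard Elasticsearch 2.x APIs; elastic1 is one of the cluster nodes listed further down):

# Per-node JVM heap, GC and thread-pool statistics
curl -s 'http://elastic1:9200/_nodes/stats/jvm,thread_pool?pretty'

# Disk usage / shard counts per node, plus heap and load at a glance
curl -s 'http://elastic1:9200/_cat/allocation?v'
curl -s 'http://elastic1:9200/_cat/nodes?v&h=host,heap.percent,ram.percent,load,node.role,master'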

We are running:

Installed Packages
Name : elasticsearch
Arch : noarch
Version : 2.4.5
Release : 1
Size : 29 M
Repo : installed
Summary : Distribution: RPM
License : © 2009
Description : Elasticsearch is a distributed RESTful search engine built for the cloud. Reference documentation can be found at https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html and the
: ‘Elasticsearch: The Definitive Guide’ book can be found at https://www.elastic.co/guide/en/elasticsearch/guide/current/index.html

and:

Graylog 2.3.1+9f2c6ef on graylogserver (Oracle Corporation 1.8.0_131 on Linux 3.10.0-514.26.2.el7.x86_64)

Would ES 5.x work better?

Did you get a response here:

curl -XGET ELASTICSEARCH:9200/_cat/shards?h=index,shard,prirep,state,unassigned.reason| grep UNASSIGNED

If yes, did the following help to resolve it?

curl -XPUT 'ELASTICSEARCH:9200/_cluster/settings?pretty' -H 'Content-Type: application/json' -d'
{
  "persistent": {
    "cluster.routing.allocation.enable": "all"
  }
}
'
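(For completeness, the applied value can be verified afterwards with a plain GET against the same endpoint:)

curl -XGET 'ELASTICSEARCH:9200/_cluster/settings?pretty'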

Unfortunately the problem occurred again:

elastic1:~ # curl -XGET localhost:9200/_cat/shards?h=index,shard,prirep,state,unassigned.reason| grep UNASSIGNED
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 41040 100 41040 0 0 230k 0 --:--:-- --:--:-- --:--:-- 231k
elastic1:~ #

elastic2:~ # curl -XGET localhost:9200/_cat/shards?h=index,shard,prirep,state,unassigned.reason| grep UNASSIGNED
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 41040 100 41040 0 0 247k 0 --:--:-- --:--:-- --:--:-- 248k
elastic2:~ #

elastic3:/ # curl -XGET localhost:9200/_cat/shards?h=index,shard,prirep,state,unassigned.reason| grep UNASSIGNED
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 41040 100 41040 0 0 551k 0 --:--:-- --:--:-- --:--:-- 556k
elastic3:/ #

I ran your curl command anyway:

{
  "acknowledged" : true,
  "persistent" : {
    "cluster" : {
      "routing" : {
        "allocation" : {
          "enable" : "all"
        }
      }
    }
  },
  "transient" : { }
}

Nothing happened.

Node status:

elastic1:~ # curl -XGET localhost:9200/_cluster/health?pretty
{
  "cluster_name" : "graylog",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 3,
  "number_of_data_nodes" : 3,
  "active_primary_shards" : 830,
  "active_shards" : 1140,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
}

elastic3:/ # curl -XGET 'http://localhost:9200/_nodes'
{“cluster_name”:“graylog”,“nodes”:{“feqipU-sQhKvhEf2C6bRPA”:{“name”:“elastic3”,“transport_address”:“10.151.70.72:9300”,“host”:“elastic3”,“ip”:“10.151.70.72”,“version”:“2.4.5”,“build”:“c849dd1”,“http_address”:“10.151.70.72:9200”,“attributes”:{“master”:“true”},“settings”:{“pidfile”:“/var/run/elasticsearch/elasticsearch.pid”,“cluster”:{“name”:“graylog”},“node”:{“data”:“true”,“name”:“elastic3”,“master”:“true”},“path”:{“conf”:“/etc/elasticsearch”,“data”:“/data/elasticsearch/data”,“logs”:“/data/elasticsearch/logs”,“work”:“/data/elasticsearch/work”,“home”:“/usr/share/elasticsearch”},“discovery”:{“zen”:{“minimum_master_nodes”:“2”,“ping”:{“multicast”:{“enabled”:“false”},“unicast”:{“hosts”:[“elastic1”,“elastic2”,“elastic3”]}}}},“name”:“elastic3”,“client”:{“type”:“node”},“http”:{“port”:“9200”},“bootstrap”:{“mlockall”:“true”},“transport”:{“tcp”:{“port”:“9300”}},“config”:{“ignore_system_properties”:“true”},“network”:{“host”:[“elastic3”,“127.0.0.1”]}},“os”:{“refresh_interval_in_millis”:1000,“name”:“Linux”,“arch”:“amd64”,“version”:“3.10.0-514.26.2.el7.x86_64”,“available_processors”:8,“allocated_processors”:8},“process”:{“refresh_interval_in_millis”:1000,“id”:47929,“mlockall”:true},“jvm”:{“pid”:47929,“version”:“1.8.0_141”,“vm_name”:“OpenJDK 64-Bit Server VM”,“vm_version”:“25.141-b16”,“vm_vendor”:“Oracle Corporation”,“start_time_in_millis”:1507730386693,“mem”:{“heap_init_in_bytes”:5368709120,“heap_max_in_bytes”:5298978816,“non_heap_init_in_bytes”:2555904,“non_heap_max_in_bytes”:0,“direct_max_in_bytes”:5298978816},“gc_collectors”:[“ParNew”,“ConcurrentMarkSweep”],“memory_pools”:[“Code Cache”,“Metaspace”,“Compressed Class Space”,“Par Eden Space”,“Par Survivor Space”,“CMS Old Gen”],“using_compressed_ordinary_object_pointers”:“true”},“thread_pool”:{“force_merge”:{“type”:“fixed”,“min”:1,“max”:1,“queue_size”:-1},“percolate”:{“type”:“fixed”,“min”:8,“max”:8,“queue_size”:1000},“fetch_shard_started”:{“type”:“scaling”,“min”:1,“max”:16,“keep_alive”:“5m”,“queue_size”:-1},“listener”:{“type”:“fixed”,“min”:4,“max”:4,“queue_size”:-1},“index”:{“type”:“fixed”,“min”:8,“max”:8,“queue_size”:200},“refresh”:{“type”:“scaling”,“min”:1,“max”:4,“keep_alive”:“5m”,“queue_size”:-1},“suggest”:{“type”:“fixed”,“min”:8,“max”:8,“queue_size”:1000},“generic”:{“type”:“cached”,“keep_alive”:“30s”,“queue_size”:-1},“warmer”:{“type”:“scaling”,“min”:1,“max”:4,“keep_alive”:“5m”,“queue_size”:-1},“search”:{“type”:“fixed”,“min”:13,“max”:13,“queue_size”:1000},“flush”:{“type”:“scaling”,“min”:1,“max”:4,“keep_alive”:“5m”,“queue_size”:-1},“fetch_shard_store”:{“type”:“scaling”,“min”:1,“max”:16,“keep_alive”:“5m”,“queue_size”:-1},“management”:{“type”:“scaling”,“min”:1,“max”:5,“keep_alive”:“5m”,“queue_size”:-1},“get”:{“type”:“fixed”,“min”:8,“max”:8,“queue_size”:1000},“bulk”:{“type”:“fixed”,“min”:8,“max”:8,“queue_size”:50},“snapshot”:{“type”:“scaling”,“min”:1,“max”:4,“keep_alive”:“5m”,“queue_size”:-1}},“transport”:{“bound_address”:[“127.0.0.1:9300”,“10.151.70.72:9300”],“publish_address”:“10.151.70.72:9300”,“profiles”:{}},“http”:{“bound_address”:[“127.0.0.1:9200”,“10.151.70.72:9200”],“publish_address”:“10.151.70.72:9200”,“max_content_length_in_bytes”:104857600},“plugins”:,“modules”:[{“name”:“lang-expression”,“version”:“2.4.5”,“description”:“Lucene expressions integration for Elasticsearch”,“jvm”:true,“classname”:“org.elasticsearch.script.expression.ExpressionPlugin”,“isolated”:true,“site”:false},{“name”:“lang-groovy”,“version”:“2.4.5”,“description”:“Groovy scripting integration for 
Elasticsearch”,“jvm”:true,“classname”:“org.elasticsearch.script.groovy.GroovyPlugin”,“isolated”:true,“site”:false},{“name”:“reindex”,“version”:“2.4.5”,“description”:“_reindex and _update_by_query APIs”,“jvm”:true,“classname”:“org.elasticsearch.index.reindex.ReindexPlugin”,“isolated”:true,“site”:false}]},“Q1IlMWm6QFW9ID6zc__y6w”:{“name”:“elastic1”,“transport_address”:“10.151.70.10:9300”,“host”:“elastic1”,“ip”:“10.151.70.10”,“version”:“2.4.5”,“build”:“c849dd1”,“http_address”:“10.151.70.10:9200”,“attributes”:{“master”:“true”},“settings”:{“pidfile”:“/var/run/elasticsearch/elasticsearch.pid”,“cluster”:{“name”:“graylog”},“node”:{“data”:“true”,“name”:“elastic1”,“master”:“true”},“path”:{“conf”:“/etc/elasticsearch”,“data”:“/data/elasticsearch/data”,“logs”:“/data/elasticsearch/logs”,“work”:“/data/elasticsearch/work”,“home”:“/usr/share/elasticsearch”},“discovery”:{“zen”:{“minimum_master_nodes”:“2”,“ping”:{“multicast”:{“enabled”:“false”},“unicast”:{“hosts”:[“elastic1”,“elastic2”,“elastic3”]}}}},“name”:“elastic1”,“client”:{“type”:“node”},“http”:{“port”:“9200”},“bootstrap”:{“mlockall”:“true”},“transport”:{“tcp”:{“port”:“9300”}},“config”:{“ignore_system_properties”:“true”},“network”:{“host”:[“elastic1”,“127.0.0.1”]}},“os”:{“refresh_interval_in_millis”:1000,“name”:“Linux”,“arch”:“amd64”,“version”:“3.10.0-514.26.2.el7.x86_64”,“available_processors”:8,“allocated_processors”:8},“process”:{“refresh_interval_in_millis”:1000,“id”:59983,“mlockall”:true},“jvm”:{“pid”:59983,“version”:“1.8.0_141”,“vm_name”:“OpenJDK 64-Bit Server VM”,“vm_version”:“25.141-b16”,“vm_vendor”:“Oracle Corporation”,“start_time_in_millis”:1507730382142,“mem”:{“heap_init_in_bytes”:5368709120,“heap_max_in_bytes”:5298978816,“non_heap_init_in_bytes”:2555904,“non_heap_max_in_bytes”:0,“direct_max_in_bytes”:5298978816},“gc_collectors”:[“ParNew”,“ConcurrentMarkSweep”],“memory_pools”:[“Code Cache”,“Metaspace”,“Compressed Class Space”,“Par Eden Space”,“Par Survivor Space”,“CMS Old Gen”],“using_compressed_ordinary_object_pointers”:“true”},“thread_pool”:{“force_merge”:{“type”:“fixed”,“min”:1,“max”:1,“queue_size”:-1},“percolate”:{“type”:“fixed”,“min”:8,“max”:8,“queue_size”:1000},“fetch_shard_started”:{“type”:“scaling”,“min”:1,“max”:16,“keep_alive”:“5m”,“queue_size”:-1},“listener”:{“type”:“fixed”,“min”:4,“max”:4,“queue_size”:-1},“index”:{“type”:“fixed”,“min”:8,“max”:8,“queue_size”:200},“refresh”:{“type”:“scaling”,“min”:1,“max”:4,“keep_alive”:“5m”,“queue_size”:-1},“suggest”:{“type”:“fixed”,“min”:8,“max”:8,“queue_size”:1000},“generic”:{“type”:“cached”,“keep_alive”:“30s”,“queue_size”:-1},“warmer”:{“type”:“scaling”,“min”:1,“max”:4,“keep_alive”:“5m”,“queue_size”:-1},“search”:{“type”:“fixed”,“min”:13,“max”:13,“queue_size”:1000},“flush”:{“type”:“scaling”,“min”:1,“max”:4,“keep_alive”:“5m”,“queue_size”:-1},“fetch_shard_store”:{“type”:“scaling”,“min”:1,“max”:16,“keep_alive”:“5m”,“queue_size”:-1},“management”:{“type”:“scaling”,“min”:1,“max”:5,“keep_alive”:“5m”,“queue_size”:-1},“get”:{“type”:“fixed”,“min”:8,“max”:8,“queue_size”:1000},“bulk”:{“type”:“fixed”,“min”:8,“max”:8,“queue_size”:50},“snapshot”:{“type”:“scaling”,“min”:1,“max”:4,“keep_alive”:“5m”,“queue_size”:-1}},“transport”:{“bound_address”:[“127.0.0.1:9300”,“10.151.70.10:9300”],“publish_address”:“10.151.70.10:9300”,“profiles”:{}},“http”:{“bound_address”:[“127.0.0.1:9200”,“10.151.70.10:9200”],“publish_address”:“10.151.70.10:9200”,“max_content_length_in_bytes”:104857600},“plugins”:,“modules”:[{“name”:“lang-expression”,“version”:“2.4.5”,“description”:“Lucene expressions integration for 
Elasticsearch”,“jvm”:true,“classname”:“org.elasticsearch.script.expression.ExpressionPlugin”,“isolated”:true,“site”:false},{“name”:“lang-groovy”,“version”:“2.4.5”,“description”:“Groovy scripting integration for Elasticsearch”,“jvm”:true,“classname”:“org.elasticsearch.script.groovy.GroovyPlugin”,“isolated”:true,“site”:false},{“name”:“reindex”,“version”:“2.4.5”,“description”:“_reindex and _update_by_query APIs”,“jvm”:true,“classname”:“org.elasticsearch.index.reindex.ReindexPlugin”,“isolated”:true,“site”:false}]},“6lxrVLTwRKCcyva6uW5rSw”:{“name”:“elastic2”,“transport_address”:“10.151.70.22:9300”,“host”:“elastic2”,“ip”:“10.151.70.22”,“version”:“2.4.5”,“build”:“c849dd1”,“http_address”:“10.151.70.22:9200”,“attributes”:{“master”:“true”},“settings”:{“pidfile”:“/var/run/elasticsearch/elasticsearch.pid”,“cluster”:{“name”:“graylog”},“node”:{“data”:“true”,“name”:“elastic2”,“master”:“true”},“path”:{“conf”:“/etc/elasticsearch”,“data”:“/data/elasticsearch/data”,“logs”:“/data/elasticsearch/logs”,“work”:“/data/elasticsearch/work”,“home”:“/usr/share/elasticsearch”},“discovery”:{“zen”:{“minimum_master_nodes”:“2”,“ping”:{“multicast”:{“enabled”:“false”},“unicast”:{“hosts”:[“elastic1”,“elastic2”,“elastic3”]}}}},“name”:“elastic2”,“client”:{“type”:“node”},“http”:{“port”:“9200”},“bootstrap”:{“mlockall”:“true”},“transport”:{“tcp”:{“port”:“9300”}},“config”:{“ignore_system_properties”:“true”},“network”:{“host”:[“elastic2”,“127.0.0.1”]}},“os”:{“refresh_interval_in_millis”:1000,“name”:“Linux”,“arch”:“amd64”,“version”:“3.10.0-514.26.2.el7.x86_64”,“available_processors”:8,“allocated_processors”:8},“process”:{“refresh_interval_in_millis”:1000,“id”:26904,“mlockall”:true},“jvm”:{“pid”:26904,“version”:“1.8.0_141”,“vm_name”:“OpenJDK 64-Bit Server VM”,“vm_version”:“25.141-b16”,“vm_vendor”:“Oracle Corporation”,“start_time_in_millis”:1507730385375,“mem”:{“heap_init_in_bytes”:5368709120,“heap_max_in_bytes”:5298978816,“non_heap_init_in_bytes”:2555904,“non_heap_max_in_bytes”:0,“direct_max_in_bytes”:5298978816},“gc_collectors”:[“ParNew”,“ConcurrentMarkSweep”],“memory_pools”:[“Code Cache”,“Metaspace”,“Compressed Class Space”,“Par Eden Space”,“Par Survivor Space”,“CMS Old 
Gen”],“using_compressed_ordinary_object_pointers”:“true”},“thread_pool”:{“force_merge”:{“type”:“fixed”,“min”:1,“max”:1,“queue_size”:-1},“percolate”:{“type”:“fixed”,“min”:8,“max”:8,“queue_size”:1000},“fetch_shard_started”:{“type”:“scaling”,“min”:1,“max”:16,“keep_alive”:“5m”,“queue_size”:-1},“listener”:{“type”:“fixed”,“min”:4,“max”:4,“queue_size”:-1},“index”:{“type”:“fixed”,“min”:8,“max”:8,“queue_size”:200},“refresh”:{“type”:“scaling”,“min”:1,“max”:4,“keep_alive”:“5m”,“queue_size”:-1},“suggest”:{“type”:“fixed”,“min”:8,“max”:8,“queue_size”:1000},“generic”:{“type”:“cached”,“keep_alive”:“30s”,“queue_size”:-1},“warmer”:{“type”:“scaling”,“min”:1,“max”:4,“keep_alive”:“5m”,“queue_size”:-1},“search”:{“type”:“fixed”,“min”:13,“max”:13,“queue_size”:1000},“flush”:{“type”:“scaling”,“min”:1,“max”:4,“keep_alive”:“5m”,“queue_size”:-1},“fetch_shard_store”:{“type”:“scaling”,“min”:1,“max”:16,“keep_alive”:“5m”,“queue_size”:-1},“management”:{“type”:“scaling”,“min”:1,“max”:5,“keep_alive”:“5m”,“queue_size”:-1},“get”:{“type”:“fixed”,“min”:8,“max”:8,“queue_size”:1000},“bulk”:{“type”:“fixed”,“min”:8,“max”:8,“queue_size”:50},“snapshot”:{“type”:“scaling”,“min”:1,“max”:4,“keep_alive”:“5m”,“queue_size”:-1}},“transport”:{“bound_address”:[“10.151.70.22:9300”,“127.0.0.1:9300”],“publish_address”:“10.151.70.22:9300”,“profiles”:{}},“http”:{“bound_address”:[“10.151.70.22:9200”,“127.0.0.1:9200”],“publish_address”:“10.151.70.22:9200”,“max_content_length_in_bytes”:104857600},“plugins”:,“modules”:[{“name”:“lang-expression”,“version”:“2.4.5”,“description”:“Lucene expressions integration for Elasticsearch”,“jvm”:true,“classname”:“org.elasticsearch.script.expression.ExpressionPlugin”,“isolated”:true,“site”:false},{“name”:“lang-groovy”,“version”:“2.4.5”,“description”:“Groovy scripting integration for Elasticsearch”,“jvm”:true,“classname”:“org.elasticsearch.script.groovy.GroovyPlugin”,“isolated”:true,“site”:false},{“name”:“reindex”,“version”:“2.4.5”,“description”:“_reindex and _update_by_query APIs”,“jvm”:true,“classname”:“org.elasticsearch.index.reindex.ReindexPlugin”,“isolated”:true,“site”:false}]}}}

Graylog debug:

2017-10-18T14:49:40.377+02:00 DEBUG [KafkaJournal] Wrote 1 messages to journal: 813 bytes (payload 771 bytes), log position 3332640874 to 3332640874
2017-10-18T14:49:40.377+02:00 DEBUG [JournallingMessageHandler] Processed batch, wrote 771 bytes, last journal offset: 3332640874, signalling reader.
2017-10-18T14:49:40.381+02:00 DEBUG [JournallingMessageHandler] End of batch, journalling 2 messages
2017-10-18T14:49:40.381+02:00 DEBUG [KafkaJournal] Trying to write ByteBufferMessageSet with size of 1888 bytes to journal
2017-10-18T14:49:40.381+02:00 DEBUG [KafkaJournal] Wrote 2 messages to journal: 1888 bytes (payload 1804 bytes), log position 3332640875 to 3332640876
2017-10-18T14:49:40.381+02:00 DEBUG [JournallingMessageHandler] Processed batch, wrote 1804 bytes, last journal offset: 3332640876, signalling reader.
2017-10-18T14:49:40.382+02:00 DEBUG [JournallingMessageHandler] End of batch, journalling 1 messages
2017-10-18T14:49:40.382+02:00 DEBUG [KafkaJournal] Trying to write ByteBufferMessageSet with size of 924 bytes to journal
2017-10-18T14:49:40.382+02:00 DEBUG [KafkaJournal] Wrote 1 messages to journal: 924 bytes (payload 882 bytes), log position 3332640877 to 3332640877
2017-10-18T14:49:40.382+02:00 DEBUG [JournallingMessageHandler] Processed batch, wrote 882 bytes, last journal offset: 3332640877, signalling reader.
2017-10-18T14:49:40.383+02:00 DEBUG [JournallingMessageHandler] End of batch, journalling 1 messages
2017-10-18T14:49:40.383+02:00 DEBUG [KafkaJournal] Trying to write ByteBufferMessageSet with size of 834 bytes to journal
2017-10-18T14:49:40.383+02:00 DEBUG [KafkaJournal] Wrote 1 messages to journal: 834 bytes (payload 792 bytes), log position 3332640878 to 3332640878
2017-10-18T14:49:40.383+02:00 DEBUG [JournallingMessageHandler] Processed batch, wrote 792 bytes, last journal offset: 3332640878, signalling reader.
2017-10-18T14:49:40.389+02:00 DEBUG [JournallingMessageHandler] End of batch, journalling 1 messages
2017-10-18T14:49:40.389+02:00 DEBUG [KafkaJournal] Trying to write ByteBufferMessageSet with size of 337 bytes to journal
2017-10-18T14:49:40.389+02:00 DEBUG [KafkaJournal] Wrote 1 messages to journal: 337 bytes (payload 295 bytes), log position 3332640879 to 3332640879
2017-10-18T14:49:40.389+02:00 DEBUG [JournallingMessageHandler] Processed batch, wrote 295 bytes, last journal offset: 3332640879, signalling reader.
2017-10-18T14:49:40.390+02:00 DEBUG [JournallingMessageHandler] End of batch, journalling 1 messages
2017-10-18T14:49:40.390+02:00 DEBUG [KafkaJournal] Trying to write ByteBufferMessageSet with size of 429 bytes to journal
2017-10-18T14:49:40.390+02:00 DEBUG [KafkaJournal] Wrote 1 messages to journal: 429 bytes (payload 387 bytes), log position 3332640880 to 3332640880
2017-10-18T14:49:40.390+02:00 DEBUG [JournallingMessageHandler] Processed batch, wrote 387 bytes, last journal offset: 3332640880, signalling reader.
2017-10-18T14:49:40.390+02:00 DEBUG [JournallingMessageHandler] End of batch, journalling 3 messages
2017-10-18T14:49:40.390+02:00 DEBUG [KafkaJournal] Trying to write ByteBufferMessageSet with size of 1355 bytes to journal
2017-10-18T14:49:40.390+02:00 DEBUG [KafkaJournal] Wrote 3 messages to journal: 1355 bytes (payload 1229 bytes), log position 3332640881 to 3332640883
2017-10-18T14:49:40.390+02:00 DEBUG [JournallingMessageHandler] Processed batch, wrote 1229 bytes, last journal offset: 3332640883, signalling reader.
2017-10-18T14:49:40.392+02:00 DEBUG [JournallingMessageHandler] End of batch, journalling 1 messages
2017-10-18T14:49:40.392+02:00 DEBUG [KafkaJournal] Trying to write ByteBufferMessageSet with size of 825 bytes to journal

Restarting the Graylog server solved the problem for now.
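(For reference, on an RPM install like this the restart is typically just the systemd service; journalled messages stay on disk and are processed once the node is back up:)

sudo systemctl restart graylog-server
sudo systemctl status graylog-server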

Is your MongoDB healthy when processing stops?
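For example, something along these lines (a sketch using the standard mongo shell; the graylog database name matches the log lines below):

# Connection counts, current operations and database stats
mongo graylog --eval 'printjson(db.serverStatus().connections)'
mongo graylog --eval 'printjson(db.currentOp())'
mongo graylog --eval 'printjson(db.stats())'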

I don't do anything special with MongoDB. Logs from MongoDB:

2017-10-18T12:43:24.331+0200 I COMMAND [conn246] command graylog.$cmd command: createIndexes { createIndexes: “notifications”, indexes: [ { key: { type: 1 }, name: “type_1”, ns: “graylog.notifications” } ] } keyUpdates:0 writeConflicts:0 numYields:0 reslen:149 locks:{ Global: { acquireCount: { r: 1, w: 1 } }, Database: { acquireCount: { W: 1 }, acquireWaitCount: { W: 1 }, timeAcquiringMicros: { W: 141245 } } } protocol:op_query 267ms
2017-10-18T12:43:24.332+0200 I COMMAND [conn241] command graylog.$cmd command: createIndexes { createIndexes: “users”, indexes: [ { key: { roles: 1 }, name: “roles_1”, ns: “graylog.users” } ] } keyUpdates:0 writeConflicts:0 numYields:0 reslen:149 locks:{ Global: { acquireCount: { r: 1, w: 1 } }, Database: { acquireCount: { W: 1 }, acquireWaitCount: { W: 1 }, timeAcquiringMicros: { W: 276684 } } } protocol:op_query 278ms
2017-10-18T12:43:24.330+0200 I COMMAND [conn248] command graylog.$cmd command: createIndexes { createIndexes: “notifications”, indexes: [ { key: { type: 1 }, name: “type_1”, ns: “graylog.notifications” } ] } keyUpdates:0 writeConflicts:0 numYields:0 reslen:149 locks:{ Global: { acquireCount: { r: 1, w: 1 } }, Database: { acquireCount: { W: 1 }, acquireWaitCount: { W: 1 }, timeAcquiringMicros: { W: 290658 } } } protocol:op_query 304ms
2017-10-18T12:43:24.330+0200 I COMMAND [conn253] command graylog.$cmd command: createIndexes { createIndexes: “users”, indexes: [ { key: { roles: 1 }, name: “roles_1”, ns: “graylog.users” } ] } keyUpdates:0 writeConflicts:0 numYields:0 reslen:149 locks:{ Global: { acquireCount: { r: 1, w: 1 } }, Database: { acquireCount: { W: 1 }, acquireWaitCount: { W: 1 }, timeAcquiringMicros: { W: 278768 } } } protocol:op_query 281ms
2017-10-18T12:43:24.343+0200 I COMMAND [conn237] command graylog.$cmd command: createIndexes { createIndexes: “notifications”, indexes: [ { key: { type: 1 }, name: “type_1”, ns: “graylog.notifications” } ] } keyUpdates:0 writeConflicts:0 numYields:0 reslen:149 locks:{ Global: { acquireCount: { r: 1, w: 1 } }, Database: { acquireCount: { W: 1 }, acquireWaitCount: { W: 1 }, timeAcquiringMicros: { W: 150163 } } } protocol:op_query 274ms
2017-10-18T12:43:24.349+0200 I COMMAND [conn247] command graylog.roles command: find { find: “roles”, filter: {} } planSummary: COLLSCAN keysExamined:0 docsExamined:8 cursorExhausted:1 keyUpdates:0 writeConflicts:0 numYields:0 nreturned:8 reslen:4838 locks:{ Global: { acquireCount: { r: 2 } }, Database: { acquireCount: { r: 1 }, acquireWaitCount: { r: 1 }, timeAcquiringMicros: { r: 120051 } }, Collection: { acquireCount: { r: 1 } } } protocol:op_query 120ms
2017-10-18T12:43:24.360+0200 I WRITE [conn249] update graylog.nodes query: { _id: ObjectId(‘59d57202eac6dd2bae6b5300’) } update: { is_master: true, hostname: “graylogserver”, last_seen: 1508323404, transport_address: “http://10.151.70.23:12900/api/”, _id: ObjectId(‘59d57202eac6dd2bae6b5300’), type: “SERVER”, node_id: “5f31a2e0-320b-4a06-83e3-ebe3ba806102” } keysExamined:1 docsExamined:1 nMatched:1 nModified:1 keyUpdates:0 writeConflicts:0 numYields:0 locks:{ Global: { acquireCount: { r: 1, w: 1 } }, Database: { acquireCount: { w: 1 }, acquireWaitCount: { w: 1 }, timeAcquiringMicros: { w: 103949 } }, Collection: { acquireCount: { w: 1 } } } 108ms
2017-10-18T12:43:24.360+0200 I COMMAND [conn249] command graylog.$cmd command: update { update: “nodes”, ordered: true, updates: [ { q: { _id: ObjectId(‘59d57202eac6dd2bae6b5300’) }, u: { is_master: true, hostname: “graylogserver”, last_seen: 1508323404, transport_address: “http://10.151.70.23:12900/api/”, _id: ObjectId(‘59d57202eac6dd2bae6b5300’), type: “SERVER”, node_id: “5f31a2e0-320b-4a06-83e3-ebe3ba806102” }, upsert: true } ] } keyUpdates:0 writeConflicts:0 numYields:0 reslen:55 locks:{ Global: { acquireCount: { r: 1, w: 1 } }, Database: { acquireCount: { w: 1 }, acquireWaitCount: { w: 1 }, timeAcquiringMicros: { w: 103949 } }, Collection: { acquireCount: { w: 1 } } } protocol:op_query 110ms
2017-10-18T14:37:32.514+0200 I COMMAND [conn238] command graylog.index_ranges command: find { find: “index_ranges”, filter: { $or: [ { $and: [ { start: { $exists: false } }, { begin: { $lte: 1508330252374 } }, { end: { $gte: 1508329952374 } } ] }, { $and: [ { start: { $exists: false } }, { begin: { $lte: 0 } }, { end: { $gte: 0 } } ] } ] } } planSummary: IXSCAN { begin: 1, end: 1 }, IXSCAN { begin: 1, end: 1 } keysExamined:132 docsExamined:1 cursorExhausted:1 keyUpdates:0 writeConflicts:0 numYields:2 nreturned:1 reslen:243 locks:{ Global: { acquireCount: { r: 6 } }, Database: { acquireCount: { r: 3 }, acquireWaitCount: { r: 2 }, timeAcquiringMicros: { r: 77 } }, Collection: { acquireCount: { r: 3 } } } protocol:op_query 127ms
2017-10-18T15:37:09.933+0200 I NETWORK [conn249] end connection 127.0.0.1:45738 (17 connections now open)
2017-10-18T15:37:09.933+0200 I NETWORK [conn244] end connection 127.0.0.1:53692 (17 connections now open)
2017-10-18T15:37:09.933+0200 I NETWORK [conn246] end connection 127.0.0.1:45732 (17 connections now open)
2017-10-18T15:37:09.933+0200 I NETWORK [conn239] end connection 127.0.0.1:53682 (17 connections now open)
2017-10-18T15:37:09.933+0200 I NETWORK [conn247] end connection 127.0.0.1:45734 (17 connections now open)
2017-10-18T15:37:09.933+0200 I NETWORK [conn250] end connection 127.0.0.1:45740 (17 connections now open)
2017-10-18T15:37:09.933+0200 I NETWORK [conn240] end connection 127.0.0.1:53684 (17 connections now open)
2017-10-18T15:37:09.933+0200 I NETWORK [conn242] end connection 127.0.0.1:53688 (17 connections now open)
2017-10-18T15:37:09.933+0200 I NETWORK [conn238] end connection 127.0.0.1:53680 (17 connections now open)
2017-10-18T15:37:09.933+0200 I NETWORK [conn251] end connection 127.0.0.1:45742 (17 connections now open)
2017-10-18T15:37:09.933+0200 I NETWORK [conn248] end connection 127.0.0.1:45736 (17 connections now open)
2017-10-18T15:37:09.933+0200 I NETWORK [conn237] end connection 127.0.0.1:53678 (17 connections now open)
2017-10-18T15:37:09.933+0200 I NETWORK [conn253] end connection 127.0.0.1:45746 (17 connections now open)
2017-10-18T15:37:09.933+0200 I NETWORK [conn241] end connection 127.0.0.1:53686 (17 connections now open)
2017-10-18T15:37:09.933+0200 I NETWORK [conn252] end connection 127.0.0.1:45744 (17 connections now open)
2017-10-18T15:37:09.933+0200 I NETWORK [conn243] end connection 127.0.0.1:53690 (17 connections now open)
2017-10-18T15:37:09.938+0200 I NETWORK [conn235] end connection 127.0.0.1:53670 (17 connections now open)
2017-10-18T15:37:09.942+0200 I NETWORK [conn236] end connection 127.0.0.1:53672 (17 connections now open)
2017-10-18T15:37:17.885+0200 I NETWORK [initandlisten] connection accepted from 127.0.0.1:53160 #260 (1 connection now open)
2017-10-18T15:37:17.936+0200 I NETWORK [initandlisten] connection accepted from 127.0.0.1:53162 #261 (2 connections now open)
2017-10-18T15:37:20.492+0200 I NETWORK [initandlisten] connection accepted from 127.0.0.1:53166 #262 (3 connections now open)
2017-10-18T15:37:20.997+0200 I NETWORK [initandlisten] connection accepted from 127.0.0.1:53168 #263 (4 connections now open)
2017-10-18T15:37:20.998+0200 I NETWORK [initandlisten] connection accepted from 127.0.0.1:53170 #264 (5 connections now open)
2017-10-18T15:37:21.002+0200 I NETWORK [initandlisten] connection accepted from 127.0.0.1:53172 #265 (6 connections now open)
2017-10-18T15:37:21.002+0200 I NETWORK [initandlisten] connection accepted from 127.0.0.1:53173 #266 (7 connections now open)

Same problem: nothing in the Graylog logs.

Any ideas?

[2017-10-25 15:10:56,194][DEBUG][index.engine ] [elastic1] [graylog_337][1] merge segment [_2x3] done: took [2.8m], [835.1 MB], [566,251 docs], [0s stopped], [2.5m throttled], [815.7 MB written], [5.0 MB/sec throttle]
[2017-10-25 15:11:21,823][DEBUG][indices.memory ] [elastic1] recalculating shard indexing buffer, total is [505.3mb] with [1] active shards, each shard set to indexing=[505.3mb], translog=[64kb]
[2017-10-25 15:11:48,691][DEBUG][index.engine ] [elastic1] [graylog_337][1] merge segment [_32m] done: took [37.4s], [96.1 MB], [61,641 docs], [0s stopped], [35.2s throttled], [182.9 MB written], [5.0 MB/sec throttle]
[2017-10-25 15:11:51,823][DEBUG][indices.memory ] [elastic1] recalculating shard indexing buffer, total is [505.3mb] with [1] active shards, each shard set to indexing=[505.3mb], translog=[64kb]
[2017-10-25 15:12:21,824][DEBUG][indices.memory ] [elastic1] recalculating shard indexing buffer, total is [505.3mb] with [1] active shards, each shard set to indexing=[505.3mb], translog=[64kb]
[2017-10-25 15:12:51,824][DEBUG][indices.memory ] [elastic1] recalculating shard indexing buffer, total is [505.3mb] with [1] active shards, each shard set to indexing=[505.3mb], translog=[64kb]
[2017-10-25 15:13:14,449][DEBUG][index.engine ] [elastic1] [graylog_337][1] merge segment [_35e] done: took [35.5s], [91.3 MB], [57,666 docs], [0s stopped], [33.4s throttled], [173.9 MB written], [5.0 MB/sec throttle]
[2017-10-25 15:13:21,825][DEBUG][indices.memory ] [elastic1] recalculating shard indexing buffer, total is [505.3mb] with [1] active shards, each shard set to indexing=[505.3mb], translog=[64kb]
[2017-10-25 15:13:51,825][DEBUG][indices.memory ] [elastic1] recalculating shard indexing buffer, total is [505.3mb] with [1] active shards, each shard set to indexing=[505.3mb], translog=[64kb]
[2017-10-25 15:14:21,825][DEBUG][indices.memory ] [elastic1] recalculating shard indexing buffer, total is [505.3mb] with [1] active shards, each shard set to indexing=[505.3mb], translog=[64kb]
[2017-10-25 15:14:51,826][DEBUG][indices.memory ] [elastic1] recalculating shard indexing buffer, total is [505.3mb] with [1] active shards, each shard set to indexing=[505.3mb], translog=[64kb]
[2017-10-25 15:15:21,826][DEBUG][indices.memory ] [elastic1] recalculating shard indexing buffer, total is [505.3mb] with [1] active shards, each shard set to indexing=[505.3mb], translog=[64kb]
[2017-10-25 15:15:51,827][DEBUG][indices.memory ] [elastic1] recalculating shard indexing buffer, total is [505.3mb] with [1] active shards, each shard set to indexing=[505.3mb], translog=[64kb]
[2017-10-25 15:16:21,827][DEBUG][indices.memory ] [elastic1] recalculating shard indexing buffer, total is [505.3mb] with [1] active shards, each shard set to indexing=[505.3mb], translog=[64kb]
[2017-10-25 15:16:51,828][DEBUG][indices.memory ] [elastic1] recalculating shard indexing buffer, total is [505.3mb] with [1] active shards, each shard set to indexing=[505.3mb], translog=[64kb]
[2017-10-25 15:17:21,828][DEBUG][indices.memory ] [elastic1] recalculating shard indexing buffer, total is [505.3mb] with [1] active shards, each shard set to indexing=[505.3mb], translog=[64kb]
[2017-10-25 15:17:51,829][DEBUG][indices.memory ] [elastic1] recalculating shard indexing buffer, total is [505.3mb] with [1] active shards, each shard set to indexing=[505.3mb], translog=[64kb]
[2017-10-25 15:18:21,829][DEBUG][indices.memory ] [elastic1] recalculating shard indexing buffer, total is [505.3mb] with [1] active shards, each shard set to indexing=[505.3mb], translog=[64kb]
[2017-10-25 15:18:51,830][DEBUG][index.shard ] [elastic1] [graylog_337][1] updating index_buffer_size from [505.3mb] to [500kb]; IndexWriter now using [0] bytes
[2017-10-25 15:18:51,830][DEBUG][index.shard ] [elastic1] [graylog_337][1] marking shard as inactive (inactive_time=[300000000000]) indexing wise
[2017-10-25 15:18:52,106][DEBUG][index.engine ] [elastic1] [graylog_337][1] successfully sync committed. sync id [AV9Tr4LEkONQILMAHudJ].

I have a thread dump but can't upload it here.

OK, the problem occurred again.

What additional info is needed to analyze this?

I have X-Pack on my ES cluster; everything is green.

"processbufferprocessor-0" id=40 state=RUNNABLE
at java.util.regex.Pattern$CharProperty.match(Pattern.java:3776)
at java.util.regex.Pattern$Curly.match0(Pattern.java:4260)
at java.util.regex.Pattern$Curly.match(Pattern.java:4234)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4658)
at java.util.regex.Pattern$BmpCharProperty.match(Pattern.java:3798)
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4717)
at java.util.regex.Pattern$Curly.match0(Pattern.java:4272)
at java.util.regex.Pattern$Curly.match(Pattern.java:4234)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4658)
at java.util.regex.Pattern$BmpCharProperty.match(Pattern.java:3798)
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4717)
at java.util.regex.Pattern$Curly.match0(Pattern.java:4272)
at java.util.regex.Pattern$Curly.match(Pattern.java:4234)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4658)
at java.util.regex.Pattern$BmpCharProperty.match(Pattern.java:3798)
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4717)
at java.util.regex.Pattern$Curly.match0(Pattern.java:4272)
at java.util.regex.Pattern$Curly.match(Pattern.java:4234)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4658)
at java.util.regex.Pattern$BmpCharProperty.match(Pattern.java:3798)
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4717)
at java.util.regex.Pattern$Curly.match0(Pattern.java:4272)
at java.util.regex.Pattern$Curly.match(Pattern.java:4234)
at java.util.regex.Pattern$CharProperty.match(Pattern.java:3777)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4658)
at java.util.regex.Pattern$BmpCharProperty.match(Pattern.java:3798)
at java.util.regex.Pattern$Start.match(Pattern.java:3461)
at java.util.regex.Matcher.search(Matcher.java:1248)
at java.util.regex.Matcher.find(Matcher.java:637)
at org.graylog.plugins.pipelineprocessor.functions.strings.RegexMatch.evaluate(RegexMatch.java:63)
at org.graylog.plugins.pipelineprocessor.functions.strings.RegexMatch.evaluate(RegexMatch.java:37)
at org.graylog.plugins.pipelineprocessor.ast.expressions.FunctionExpression.evaluateUnsafe(FunctionExpression.java:63)
at org.graylog.plugins.pipelineprocessor.ast.expressions.Expression.evaluate(Expression.java:41)
at org.graylog.plugins.pipelineprocessor.ast.statements.VarAssignStatement.evaluate(VarAssignStatement.java:33)
at org.graylog.plugins.pipelineprocessor.ast.statements.VarAssignStatement.evaluate(VarAssignStatement.java:22)
at org.graylog.plugins.pipelineprocessor.processors.PipelineInterpreter.evaluateStatement(PipelineInterpreter.java:377)
at org.graylog.plugins.pipelineprocessor.processors.PipelineInterpreter.executeRuleActions(PipelineInterpreter.java:364)
at org.graylog.plugins.pipelineprocessor.processors.PipelineInterpreter.evaluateStage(PipelineInterpreter.java:305)
at org.graylog.plugins.pipelineprocessor.processors.PipelineInterpreter.processForResolvedPipelines(PipelineInterpreter.java:263)
at org.graylog.plugins.pipelineprocessor.processors.PipelineInterpreter.process(PipelineInterpreter.java:143)
at org.graylog.plugins.pipelineprocessor.processors.PipelineInterpreter.process(PipelineInterpreter.java:99)
at org.graylog2.shared.buffers.processors.ProcessBufferProcessor.handleMessage(ProcessBufferProcessor.java:114)
at org.graylog2.shared.buffers.processors.ProcessBufferProcessor.dispatchMessage(ProcessBufferProcessor.java:100)
at org.graylog2.shared.buffers.processors.ProcessBufferProcessor.onEvent(ProcessBufferProcessor.java:77)
at org.graylog2.shared.buffers.processors.ProcessBufferProcessor.onEvent(ProcessBufferProcessor.java:42)
at com.lmax.disruptor.WorkProcessor.run(WorkProcessor.java:143)
at com.codahale.metrics.InstrumentedThreadFactory$InstrumentedRunnable.run(InstrumentedThreadFactory.java:66)
at java.lang.Thread.run(Thread.java:748)

"processbufferprocessor-1" id=41 state=WAITING
- waiting on <0x5422bb12> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
- locked <0x5422bb12> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at sun.misc.Unsafe.park(Native Method)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
at com.lmax.disruptor.BlockingWaitStrategy.waitFor(BlockingWaitStrategy.java:45)
at com.lmax.disruptor.ProcessingSequenceBarrier.waitFor(ProcessingSequenceBarrier.java:56)
at com.lmax.disruptor.WorkProcessor.run(WorkProcessor.java:148)
at com.codahale.metrics.InstrumentedThreadFactory$InstrumentedRunnable.run(InstrumentedThreadFactory.java:66)
at java.lang.Thread.run(Thread.java:748)

"processbufferprocessor-2" id=42 state=WAITING
- waiting on <0x5422bb12> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
- locked <0x5422bb12> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at sun.misc.Unsafe.park(Native Method)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
at com.lmax.disruptor.BlockingWaitStrategy.waitFor(BlockingWaitStrategy.java:45)
at com.lmax.disruptor.ProcessingSequenceBarrier.waitFor(ProcessingSequenceBarrier.java:56)
at com.lmax.disruptor.WorkProcessor.run(WorkProcessor.java:148)
at com.codahale.metrics.InstrumentedThreadFactory$InstrumentedRunnable.run(InstrumentedThreadFactory.java:66)
at java.lang.Thread.run(Thread.java:748)

"processbufferprocessor-3" id=43 state=WAITING
- waiting on <0x5422bb12> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
- locked <0x5422bb12> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at sun.misc.Unsafe.park(Native Method)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
at com.lmax.disruptor.BlockingWaitStrategy.waitFor(BlockingWaitStrategy.java:45)
at com.lmax.disruptor.ProcessingSequenceBarrier.waitFor(ProcessingSequenceBarrier.java:56)
at com.lmax.disruptor.WorkProcessor.run(WorkProcessor.java:148)
at com.codahale.metrics.InstrumentedThreadFactory$InstrumentedRunnable.run(InstrumentedThreadFactory.java:66)
at java.lang.Thread.run(Thread.java:748)

"processbufferprocessor-4" id=44 state=WAITING
- waiting on <0x5422bb12> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
- locked <0x5422bb12> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at sun.misc.Unsafe.park(Native Method)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
at com.lmax.disruptor.BlockingWaitStrategy.waitFor(BlockingWaitStrategy.java:45)
at com.lmax.disruptor.ProcessingSequenceBarrier.waitFor(ProcessingSequenceBarrier.java:56)
at com.lmax.disruptor.WorkProcessor.run(WorkProcessor.java:148)
at com.codahale.metrics.InstrumentedThreadFactory$InstrumentedRunnable.run(InstrumentedThreadFactory.java:66)
at java.lang.Thread.run(Thread.java:748)

Looks like all the output threads are waiting:

It looks like they are hung for some reason.

How can I debug this?
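One way is to take a few JVM thread dumps while the node is stalled and compare them. The dump above already gives a strong hint: processbufferprocessor-0 is RUNNABLE deep inside java.util.regex.Pattern, called from the pipeline processor's regex() function, which is the typical signature of a very expensive, heavily backtracking regex in a pipeline rule. A sketch using standard JDK tooling (the PID and paths are assumptions):

# Take a thread dump of the running graylog-server JVM (PID from the node info
# above; adjust for your installation), ideally several dumps a few seconds apart.
GRAYLOG_PID=9207
jstack -l "$GRAYLOG_PID" > /tmp/graylog-threaddump-$(date +%s).txt

# A processbufferprocessor thread that stays RUNNABLE inside java.util.regex
# across dumps points at the regex of a specific extractor or pipeline rule.
grep -A 5 'processbufferprocessor' /tmp/graylog-threaddump-*.txt | grep 'java.util.regex'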

Restarting the server makes it output messages again.

For reference:
https://github.com/Graylog2/graylog2-server/issues/4326

Hi all, I have the same problem right now; a restart of one node helps.
The number of unprocessed messages grew, the cluster was green and everything looked OK, but messages didn't get to ES.

Graylog version 2.2.3
