Graylog stops processing/writing messages

After running fine for some time, everything stops. Messages keep being written to the journal but are no longer sent to Elasticsearch, and there is nothing in the ES logs. What should I do to gather additional information? Most of the relevant data is below:
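A rough sketch of the kind of information that can be collected while the journal fills up (the API port/path, credentials and hostnames below are assumptions, adjust them to the setup; this node publishes its API at :12900/api):

# Graylog node overview and journal state (uncommitted entries, read/append rates)
curl -s -u admin:PASSWORD 'http://graylogserver:12900/api/system'
curl -s -u admin:PASSWORD 'http://graylogserver:12900/api/system/journal'

# Raise the Graylog log level at runtime (the logger name here is just an example)
curl -s -u admin:PASSWORD -X PUT 'http://graylogserver:12900/api/system/loggers/org.graylog2/level/debug'

# Elasticsearch side: cluster health and bulk thread-pool rejections (ES 2.x column names)
curl -s 'http://elastic1:9200/_cluster/health?pretty'
curl -s 'http://elastic1:9200/_cat/thread_pool?v&h=host,bulk.active,bulk.queue,bulk.rejected'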

Graylog logs:

2017-10-10T09:35:18.470+02:00 INFO  [Indices] Successfully created index template graylog-internal
2017-10-10T09:35:18.605+02:00 INFO  [MongoIndexSet] Waiting for allocation of index <graylog_201>.
2017-10-10T09:35:18.778+02:00 INFO  [MongoIndexSet] Index <graylog_201> has been successfully allocated.
2017-10-10T09:35:18.778+02:00 INFO  [MongoIndexSet] Pointing index alias <graylog_deflector> to new index <graylog_201>.
2017-10-10T09:35:19.225+02:00 INFO  [SystemJobManager] Submitted SystemJob <90f7ea90-ad8d-11e7-b67e-005056806fd3> [org.graylog2.indexer.indices.jobs.SetIndexReadOnlyAndCalculateRangeJob]
2017-10-10T09:35:19.225+02:00 INFO  [MongoIndexSet] Successfully pointed index alias <graylog_deflector> to index <graylog_201>.
2017-10-10T09:35:28.558+02:00 INFO  [AbstractIndexCountBasedRetentionStrategy] Number of indices (131) higher than limit (130). Running retention for 1 indices.
2017-10-10T09:35:28.681+02:00 INFO  [AbstractIndexCountBasedRetentionStrategy] Running retention strategy [org.graylog2.indexer.retention.strategies.DeletionRetentionStrategy] for index <graylog_71>
2017-10-10T09:35:28.871+02:00 INFO  [DeletionRetentionStrategy] Finished index retention strategy [delete] for index <graylog_71> in 189ms.
2017-10-10T09:35:49.235+02:00 INFO  [SetIndexReadOnlyJob] Flushing old index <graylog_200>.
2017-10-10T09:35:49.651+02:00 INFO  [SetIndexReadOnlyJob] Setting old index <graylog_200> to read-only.
2017-10-10T09:35:49.773+02:00 INFO  [SystemJobManager] Submitted SystemJob <a32d2bd0-ad8d-11e7-b67e-005056806fd3> [org.graylog2.indexer.indices.jobs.OptimizeIndexJob]
2017-10-10T09:35:49.778+02:00 INFO  [CreateNewSingleIndexRangeJob] Calculating ranges for index graylog_200.
2017-10-10T09:35:50.025+02:00 INFO  [OptimizeIndexJob] Optimizing index <graylog_200>.
2017-10-10T09:35:52.631+02:00 INFO  [MongoIndexRangeService] Calculated range of [graylog_200] in [2851ms].
2017-10-10T09:35:52.635+02:00 INFO  [CreateNewSingleIndexRangeJob] Created ranges for index graylog_200.
2017-10-10T09:35:52.635+02:00 INFO  [SystemJobManager] SystemJob <90f7ea90-ad8d-11e7-b67e-005056806fd3> [org.graylog2.indexer.indices.jobs.SetIndexReadOnlyAndCalculateRangeJob] finished in 3409ms.
2017-10-10T09:39:55.556+02:00 INFO  [SystemJobManager] SystemJob <a32d2bd0-ad8d-11e7-b67e-005056806fd3> [org.graylog2.indexer.indices.jobs.OptimizeIndexJob] finished in 245723ms.
2017-10-10T10:29:48.445+02:00 INFO  [AbstractRotationStrategy] Deflector index <Default index set> (index set <graylog_201>) should be rotated, Pointing deflector to new index now!
2017-10-10T10:29:48.449+02:00 INFO  [MongoIndexSet] Cycling from <graylog_201> to <graylog_202>.
2017-10-10T10:29:48.449+02:00 INFO  [MongoIndexSet] Creating target index <graylog_202>.
2017-10-10T10:29:48.481+02:00 INFO  [Indices] Successfully created index template graylog-internal
2017-10-10T10:29:48.619+02:00 INFO  [MongoIndexSet] Waiting for allocation of index <graylog_202>.
2017-10-10T10:29:48.781+02:00 INFO  [MongoIndexSet] Index <graylog_202> has been successfully allocated.
2017-10-10T10:29:48.781+02:00 INFO  [MongoIndexSet] Pointing index alias <graylog_deflector> to new index <graylog_202>.
2017-10-10T10:29:48.813+02:00 INFO  [SystemJobManager] Submitted SystemJob <2dcb73d0-ad95-11e7-b67e-005056806fd3> [org.graylog2.indexer.indices.jobs.SetIndexReadOnlyAndCalculateRangeJob]
2017-10-10T10:29:48.813+02:00 INFO  [MongoIndexSet] Successfully pointed index alias <graylog_deflector> to index <graylog_202>.
2017-10-10T10:30:18.821+02:00 INFO  [SetIndexReadOnlyJob] Flushing old index <graylog_201>.
2017-10-10T10:30:19.186+02:00 INFO  [SetIndexReadOnlyJob] Setting old index <graylog_201> to read-only.
2017-10-10T10:30:19.296+02:00 INFO  [SystemJobManager] Submitted SystemJob <3ff6ca16-ad95-11e7-b67e-005056806fd3> [org.graylog2.indexer.indices.jobs.OptimizeIndexJob]
2017-10-10T10:30:19.302+02:00 INFO  [CreateNewSingleIndexRangeJob] Calculating ranges for index graylog_201.
2017-10-10T10:30:19.303+02:00 INFO  [OptimizeIndexJob] Optimizing index <graylog_201>.
2017-10-10T10:30:20.935+02:00 INFO  [MongoIndexRangeService] Calculated range of [graylog_201] in [1623ms].
2017-10-10T10:30:20.953+02:00 INFO  [CreateNewSingleIndexRangeJob] Created ranges for index graylog_201.
2017-10-10T10:30:20.954+02:00 INFO  [SystemJobManager] SystemJob <2dcb73d0-ad95-11e7-b67e-005056806fd3> [org.graylog2.indexer.indices.jobs.SetIndexReadOnlyAndCalculateRangeJob] finished in 2140ms.
2017-10-10T10:30:31.404+02:00 INFO  [AbstractIndexCountBasedRetentionStrategy] Number of indices (131) higher than limit (130). Running retention for 1 indices.
2017-10-10T10:30:36.257+02:00 INFO  [AbstractIndexCountBasedRetentionStrategy] Running retention strategy [org.graylog2.indexer.retention.strategies.DeletionRetentionStrategy] for index <graylog_72>
2017-10-10T10:30:36.765+02:00 INFO  [DeletionRetentionStrategy] Finished index retention strategy [delete] for index <graylog_72> in 506ms.
2017-10-10T10:34:12.513+02:00 INFO  [SystemJobManager] SystemJob <3ff6ca16-ad95-11e7-b67e-005056806fd3> [org.graylog2.indexer.indices.jobs.OptimizeIndexJob] finished in 233216ms.
2017-10-10T11:22:58.475+02:00 INFO  [AbstractRotationStrategy] Deflector index <Default index set> (index set <graylog_202>) should be rotated, Pointing deflector to new index now!
2017-10-10T11:22:58.477+02:00 INFO  [MongoIndexSet] Cycling from <graylog_202> to <graylog_203>.
2017-10-10T11:22:58.478+02:00 INFO  [MongoIndexSet] Creating target index <graylog_203>.
2017-10-10T11:22:58.506+02:00 INFO  [Indices] Successfully created index template graylog-internal
2017-10-10T11:22:58.659+02:00 INFO  [MongoIndexSet] Waiting for allocation of index <graylog_203>.
2017-10-10T11:22:58.812+02:00 INFO  [MongoIndexSet] Index <graylog_203> has been successfully allocated.
2017-10-10T11:22:58.812+02:00 INFO  [MongoIndexSet] Pointing index alias <graylog_deflector> to new index <graylog_203>.
2017-10-10T11:22:58.842+02:00 INFO  [SystemJobManager] Submitted SystemJob <9b33189a-ad9c-11e7-b67e-005056806fd3> [org.graylog2.indexer.indices.jobs.SetIndexReadOnlyAndCalculateRangeJob]
2017-10-10T11:22:58.842+02:00 INFO  [MongoIndexSet] Successfully pointed index alias <graylog_deflector> to index <graylog_203>.
2017-10-10T11:23:28.856+02:00 INFO  [SetIndexReadOnlyJob] Flushing old index <graylog_202>.
2017-10-10T11:23:29.052+02:00 INFO  [SetIndexReadOnlyJob] Setting old index <graylog_202> to read-only.
2017-10-10T11:23:29.170+02:00 INFO  [SystemJobManager] Submitted SystemJob <ad4652e0-ad9c-11e7-b67e-005056806fd3> [org.graylog2.indexer.indices.jobs.OptimizeIndexJob]
2017-10-10T11:23:29.174+02:00 INFO  [CreateNewSingleIndexRangeJob] Calculating ranges for index graylog_202.
2017-10-10T11:23:29.179+02:00 INFO  [OptimizeIndexJob] Optimizing index <graylog_202>.
2017-10-10T11:23:30.465+02:00 INFO  [MongoIndexRangeService] Calculated range of [graylog_202] in [1289ms].
2017-10-10T11:23:30.471+02:00 INFO  [CreateNewSingleIndexRangeJob] Created ranges for index graylog_202.
2017-10-10T11:23:30.471+02:00 INFO  [SystemJobManager] SystemJob <9b33189a-ad9c-11e7-b67e-005056806fd3> [org.graylog2.indexer.indices.jobs.SetIndexReadOnlyAndCalculateRangeJob] finished in 1623ms.
2017-10-10T11:25:29.830+02:00 INFO  [AbstractIndexCountBasedRetentionStrategy] Number of indices (131) higher than limit (130). Running retention for 1 indices.
2017-10-10T11:25:30.002+02:00 INFO  [AbstractIndexCountBasedRetentionStrategy] Running retention strategy [org.graylog2.indexer.retention.strategies.DeletionRetentionStrategy] for index <graylog_73>
2017-10-10T11:25:30.255+02:00 INFO  [DeletionRetentionStrategy] Finished index retention strategy [delete] for index <graylog_73> in 252ms.
2017-10-10T11:27:49.497+02:00 INFO  [SystemJobManager] SystemJob <ad4652e0-ad9c-11e7-b67e-005056806fd3> [org.graylog2.indexer.indices.jobs.OptimizeIndexJob] finished in 260321ms.

ES cluster health:

{
   "initializing_shards" : 0,
   "unassigned_shards" : 0,
   "active_shards" : 1140,
   "relocating_shards" : 0,
   "status" : "green",
   "delayed_unassigned_shards" : 0,
   "number_of_in_flight_fetch" : 0,
   "number_of_data_nodes" : 3,
   "active_shards_percent_as_number" : 100,
   "number_of_pending_tasks" : 0,
   "task_max_waiting_in_queue_millis" : 0,
   "timed_out" : false,
   "cluster_name" : "graylog",
   "active_primary_shards" : 830,
   "number_of_nodes" : 3
}

Graylog server:

Hostname:

graylogserver

Node ID:

5f31a2e0-320b-4a06-83e3-ebe3ba806102

Version:

2.3.1+9f2c6ef, codename Tegernseer

JVM:

PID 9207, Oracle Corporation 1.8.0_131 on Linux 3.10.0-514.26.2.el7.x86_64

This is what I see after raising the log level on Graylog:

2017-10-10T12:09:03.428+02:00 DEBUG [BatchedElasticSearchOutputFlushThread] Checking for outputs to flush ...
2017-10-10T12:09:03.428+02:00 DEBUG [BatchedElasticSearchOutputFlushThread] Flushing output <org.graylog2.outputs.BlockingBatchedESOutput@428938af>

Did you check the health of your Elasticsearch servers and their stats?

I did check the ES status; it was green. No logs were produced, CPU consumption was very low, and there were no memory problems.
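For reference, a sketch of the kind of checks meant here (standard Elasticsearch 2.x APIs; elastic1 is one of the cluster nodes listed further down):

# Per-node JVM heap, GC and thread-pool statistics
curl -s 'http://elastic1:9200/_nodes/stats/jvm,thread_pool?pretty'

# Disk usage / shard counts per node, plus heap and load at a glance
curl -s 'http://elastic1:9200/_cat/allocation?v'
curl -s 'http://elastic1:9200/_cat/nodes?v&h=host,heap.percent,ram.percent,load,node.role,master'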

We are running:

Installed Packages
Name : elasticsearch
Arch : noarch
Version : 2.4.5
Release : 1
Size : 29 M
Repo : installed
Summary : Distribution: RPM
License : © 2009
Description : Elasticsearch is a distributed RESTful search engine built for the cloud. Reference documentation can be found at https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html and the
: ‘Elasticsearch: The Definitive Guide’ book can be found at https://www.elastic.co/guide/en/elasticsearch/guide/current/index.html

and:

Graylog 2.3.1+9f2c6ef on graylogserver (Oracle Corporation 1.8.0_131 on Linux 3.10.0-514.26.2.el7.x86_64)

Would ES 5.x work better?

Did you get a response here:

curl -XGET ELASTICSEARCH:9200/_cat/shards?h=index,shard,prirep,state,unassigned.reason| grep UNASSIGNED

If yes, did the following help to resolve it?

curl -XPUT 'ELASTICSEARCH:9200/_cluster/settings?pretty' -H 'Content-Type: application/json' -d'
{
  "persistent": {
    "cluster.routing.allocation.enable": "all"
  }
}
'
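(For completeness, the applied value can be verified afterwards with a plain GET against the same endpoint:)

curl -XGET 'ELASTICSEARCH:9200/_cluster/settings?pretty'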

Unfortunately the problem occurred again:

elastic1:~ # curl -XGET localhost:9200/_cat/shards?h=index,shard,prirep,state,unassigned.reason| grep UNASSIGNED
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 41040 100 41040 0 0 230k 0 --:--:-- --:--:-- --:--:-- 231k
elastic1:~ #

elastic2:~ # curl -XGET localhost:9200/_cat/shards?h=index,shard,prirep,state,unassigned.reason| grep UNASSIGNED
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 41040 100 41040 0 0 247k 0 --:--:-- --:--:-- --:--:-- 248k
elastic2:~ #

elastic3:/ # curl -XGET localhost:9200/_cat/shards?h=index,shard,prirep,state,unassigned.reason| grep UNASSIGNED
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 41040 100 41040 0 0 551k 0 --:--:-- --:--:-- --:--:-- 556k
elastic3:/ #

I ran your curl command anyway:

{
  "acknowledged" : true,
  "persistent" : {
    "cluster" : {
      "routing" : {
        "allocation" : {
          "enable" : "all"
        }
      }
    }
  },
  "transient" : { }
}

Nothing happened.

Node status:

elastic1:~ # curl -XGET localhost:9200/_cluster/health?pretty
{
  "cluster_name" : "graylog",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 3,
  "number_of_data_nodes" : 3,
  "active_primary_shards" : 830,
  "active_shards" : 1140,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
}

elastic3:/ # curl -XGET 'http://localhost:9200/_nodes'
{“cluster_name”:“graylog”,“nodes”:{“feqipU-sQhKvhEf2C6bRPA”:{“name”:“elastic3”,“transport_address”:“10.151.70.72:9300”,“host”:“elastic3”,“ip”:“10.151.70.72”,“version”:“2.4.5”,“build”:“c849dd1”,“http_address”:“10.151.70.72:9200”,“attributes”:{“master”:“true”},“settings”:{“pidfile”:“/var/run/elasticsearch/elasticsearch.pid”,“cluster”:{“name”:“graylog”},“node”:{“data”:“true”,“name”:“elastic3”,“master”:“true”},“path”:{“conf”:“/etc/elasticsearch”,“data”:“/data/elasticsearch/data”,“logs”:“/data/elasticsearch/logs”,“work”:“/data/elasticsearch/work”,“home”:“/usr/share/elasticsearch”},“discovery”:{“zen”:{“minimum_master_nodes”:“2”,“ping”:{“multicast”:{“enabled”:“false”},“unicast”:{“hosts”:[“elastic1”,“elastic2”,“elastic3”]}}}},“name”:“elastic3”,“client”:{“type”:“node”},“http”:{“port”:“9200”},“bootstrap”:{“mlockall”:“true”},“transport”:{“tcp”:{“port”:“9300”}},“config”:{“ignore_system_properties”:“true”},“network”:{“host”:[“elastic3”,“127.0.0.1”]}},“os”:{“refresh_interval_in_millis”:1000,“name”:“Linux”,“arch”:“amd64”,“version”:“3.10.0-514.26.2.el7.x86_64”,“available_processors”:8,“allocated_processors”:8},“process”:{“refresh_interval_in_millis”:1000,“id”:47929,“mlockall”:true},“jvm”:{“pid”:47929,“version”:“1.8.0_141”,“vm_name”:“OpenJDK 64-Bit Server VM”,“vm_version”:“25.141-b16”,“vm_vendor”:“Oracle Corporation”,“start_time_in_millis”:1507730386693,“mem”:{“heap_init_in_bytes”:5368709120,“heap_max_in_bytes”:5298978816,“non_heap_init_in_bytes”:2555904,“non_heap_max_in_bytes”:0,“direct_max_in_bytes”:5298978816},“gc_collectors”:[“ParNew”,“ConcurrentMarkSweep”],“memory_pools”:[“Code Cache”,“Metaspace”,“Compressed Class Space”,“Par Eden Space”,“Par Survivor Space”,“CMS Old Gen”],“using_compressed_ordinary_object_pointers”:“true”},“thread_pool”:{“force_merge”:{“type”:“fixed”,“min”:1,“max”:1,“queue_size”:-1},“percolate”:{“type”:“fixed”,“min”:8,“max”:8,“queue_size”:1000},“fetch_shard_started”:{“type”:“scaling”,“min”:1,“max”:16,“keep_alive”:“5m”,“queue_size”:-1},“listener”:{“type”:“fixed”,“min”:4,“max”:4,“queue_size”:-1},“index”:{“type”:“fixed”,“min”:8,“max”:8,“queue_size”:200},“refresh”:{“type”:“scaling”,“min”:1,“max”:4,“keep_alive”:“5m”,“queue_size”:-1},“suggest”:{“type”:“fixed”,“min”:8,“max”:8,“queue_size”:1000},“generic”:{“type”:“cached”,“keep_alive”:“30s”,“queue_size”:-1},“warmer”:{“type”:“scaling”,“min”:1,“max”:4,“keep_alive”:“5m”,“queue_size”:-1},“search”:{“type”:“fixed”,“min”:13,“max”:13,“queue_size”:1000},“flush”:{“type”:“scaling”,“min”:1,“max”:4,“keep_alive”:“5m”,“queue_size”:-1},“fetch_shard_store”:{“type”:“scaling”,“min”:1,“max”:16,“keep_alive”:“5m”,“queue_size”:-1},“management”:{“type”:“scaling”,“min”:1,“max”:5,“keep_alive”:“5m”,“queue_size”:-1},“get”:{“type”:“fixed”,“min”:8,“max”:8,“queue_size”:1000},“bulk”:{“type”:“fixed”,“min”:8,“max”:8,“queue_size”:50},“snapshot”:{“type”:“scaling”,“min”:1,“max”:4,“keep_alive”:“5m”,“queue_size”:-1}},“transport”:{“bound_address”:[“127.0.0.1:9300”,“10.151.70.72:9300”],“publish_address”:“10.151.70.72:9300”,“profiles”:{}},“http”:{“bound_address”:[“127.0.0.1:9200”,“10.151.70.72:9200”],“publish_address”:“10.151.70.72:9200”,“max_content_length_in_bytes”:104857600},“plugins”:,“modules”:[{“name”:“lang-expression”,“version”:“2.4.5”,“description”:“Lucene expressions integration for Elasticsearch”,“jvm”:true,“classname”:“org.elasticsearch.script.expression.ExpressionPlugin”,“isolated”:true,“site”:false},{“name”:“lang-groovy”,“version”:“2.4.5”,“description”:“Groovy scripting integration for 
Elasticsearch”,“jvm”:true,“classname”:“org.elasticsearch.script.groovy.GroovyPlugin”,“isolated”:true,“site”:false},{“name”:“reindex”,“version”:“2.4.5”,“description”:“_reindex and _update_by_query APIs”,“jvm”:true,“classname”:“org.elasticsearch.index.reindex.ReindexPlugin”,“isolated”:true,“site”:false}]},“Q1IlMWm6QFW9ID6zc__y6w”:{“name”:“elastic1”,“transport_address”:“10.151.70.10:9300”,“host”:“elastic1”,“ip”:“10.151.70.10”,“version”:“2.4.5”,“build”:“c849dd1”,“http_address”:“10.151.70.10:9200”,“attributes”:{“master”:“true”},“settings”:{“pidfile”:“/var/run/elasticsearch/elasticsearch.pid”,“cluster”:{“name”:“graylog”},“node”:{“data”:“true”,“name”:“elastic1”,“master”:“true”},“path”:{“conf”:“/etc/elasticsearch”,“data”:“/data/elasticsearch/data”,“logs”:“/data/elasticsearch/logs”,“work”:“/data/elasticsearch/work”,“home”:“/usr/share/elasticsearch”},“discovery”:{“zen”:{“minimum_master_nodes”:“2”,“ping”:{“multicast”:{“enabled”:“false”},“unicast”:{“hosts”:[“elastic1”,“elastic2”,“elastic3”]}}}},“name”:“elastic1”,“client”:{“type”:“node”},“http”:{“port”:“9200”},“bootstrap”:{“mlockall”:“true”},“transport”:{“tcp”:{“port”:“9300”}},“config”:{“ignore_system_properties”:“true”},“network”:{“host”:[“elastic1”,“127.0.0.1”]}},“os”:{“refresh_interval_in_millis”:1000,“name”:“Linux”,“arch”:“amd64”,“version”:“3.10.0-514.26.2.el7.x86_64”,“available_processors”:8,“allocated_processors”:8},“process”:{“refresh_interval_in_millis”:1000,“id”:59983,“mlockall”:true},“jvm”:{“pid”:59983,“version”:“1.8.0_141”,“vm_name”:“OpenJDK 64-Bit Server VM”,“vm_version”:“25.141-b16”,“vm_vendor”:“Oracle Corporation”,“start_time_in_millis”:1507730382142,“mem”:{“heap_init_in_bytes”:5368709120,“heap_max_in_bytes”:5298978816,“non_heap_init_in_bytes”:2555904,“non_heap_max_in_bytes”:0,“direct_max_in_bytes”:5298978816},“gc_collectors”:[“ParNew”,“ConcurrentMarkSweep”],“memory_pools”:[“Code Cache”,“Metaspace”,“Compressed Class Space”,“Par Eden Space”,“Par Survivor Space”,“CMS Old Gen”],“using_compressed_ordinary_object_pointers”:“true”},“thread_pool”:{“force_merge”:{“type”:“fixed”,“min”:1,“max”:1,“queue_size”:-1},“percolate”:{“type”:“fixed”,“min”:8,“max”:8,“queue_size”:1000},“fetch_shard_started”:{“type”:“scaling”,“min”:1,“max”:16,“keep_alive”:“5m”,“queue_size”:-1},“listener”:{“type”:“fixed”,“min”:4,“max”:4,“queue_size”:-1},“index”:{“type”:“fixed”,“min”:8,“max”:8,“queue_size”:200},“refresh”:{“type”:“scaling”,“min”:1,“max”:4,“keep_alive”:“5m”,“queue_size”:-1},“suggest”:{“type”:“fixed”,“min”:8,“max”:8,“queue_size”:1000},“generic”:{“type”:“cached”,“keep_alive”:“30s”,“queue_size”:-1},“warmer”:{“type”:“scaling”,“min”:1,“max”:4,“keep_alive”:“5m”,“queue_size”:-1},“search”:{“type”:“fixed”,“min”:13,“max”:13,“queue_size”:1000},“flush”:{“type”:“scaling”,“min”:1,“max”:4,“keep_alive”:“5m”,“queue_size”:-1},“fetch_shard_store”:{“type”:“scaling”,“min”:1,“max”:16,“keep_alive”:“5m”,“queue_size”:-1},“management”:{“type”:“scaling”,“min”:1,“max”:5,“keep_alive”:“5m”,“queue_size”:-1},“get”:{“type”:“fixed”,“min”:8,“max”:8,“queue_size”:1000},“bulk”:{“type”:“fixed”,“min”:8,“max”:8,“queue_size”:50},“snapshot”:{“type”:“scaling”,“min”:1,“max”:4,“keep_alive”:“5m”,“queue_size”:-1}},“transport”:{“bound_address”:[“127.0.0.1:9300”,“10.151.70.10:9300”],“publish_address”:“10.151.70.10:9300”,“profiles”:{}},“http”:{“bound_address”:[“127.0.0.1:9200”,“10.151.70.10:9200”],“publish_address”:“10.151.70.10:9200”,“max_content_length_in_bytes”:104857600},“plugins”:,“modules”:[{“name”:“lang-expression”,“version”:“2.4.5”,“description”:“Lucene expressions integration for 
Elasticsearch”,“jvm”:true,“classname”:“org.elasticsearch.script.expression.ExpressionPlugin”,“isolated”:true,“site”:false},{“name”:“lang-groovy”,“version”:“2.4.5”,“description”:“Groovy scripting integration for Elasticsearch”,“jvm”:true,“classname”:“org.elasticsearch.script.groovy.GroovyPlugin”,“isolated”:true,“site”:false},{“name”:“reindex”,“version”:“2.4.5”,“description”:“_reindex and _update_by_query APIs”,“jvm”:true,“classname”:“org.elasticsearch.index.reindex.ReindexPlugin”,“isolated”:true,“site”:false}]},“6lxrVLTwRKCcyva6uW5rSw”:{“name”:“elastic2”,“transport_address”:“10.151.70.22:9300”,“host”:“elastic2”,“ip”:“10.151.70.22”,“version”:“2.4.5”,“build”:“c849dd1”,“http_address”:“10.151.70.22:9200”,“attributes”:{“master”:“true”},“settings”:{“pidfile”:“/var/run/elasticsearch/elasticsearch.pid”,“cluster”:{“name”:“graylog”},“node”:{“data”:“true”,“name”:“elastic2”,“master”:“true”},“path”:{“conf”:“/etc/elasticsearch”,“data”:“/data/elasticsearch/data”,“logs”:“/data/elasticsearch/logs”,“work”:“/data/elasticsearch/work”,“home”:“/usr/share/elasticsearch”},“discovery”:{“zen”:{“minimum_master_nodes”:“2”,“ping”:{“multicast”:{“enabled”:“false”},“unicast”:{“hosts”:[“elastic1”,“elastic2”,“elastic3”]}}}},“name”:“elastic2”,“client”:{“type”:“node”},“http”:{“port”:“9200”},“bootstrap”:{“mlockall”:“true”},“transport”:{“tcp”:{“port”:“9300”}},“config”:{“ignore_system_properties”:“true”},“network”:{“host”:[“elastic2”,“127.0.0.1”]}},“os”:{“refresh_interval_in_millis”:1000,“name”:“Linux”,“arch”:“amd64”,“version”:“3.10.0-514.26.2.el7.x86_64”,“available_processors”:8,“allocated_processors”:8},“process”:{“refresh_interval_in_millis”:1000,“id”:26904,“mlockall”:true},“jvm”:{“pid”:26904,“version”:“1.8.0_141”,“vm_name”:“OpenJDK 64-Bit Server VM”,“vm_version”:“25.141-b16”,“vm_vendor”:“Oracle Corporation”,“start_time_in_millis”:1507730385375,“mem”:{“heap_init_in_bytes”:5368709120,“heap_max_in_bytes”:5298978816,“non_heap_init_in_bytes”:2555904,“non_heap_max_in_bytes”:0,“direct_max_in_bytes”:5298978816},“gc_collectors”:[“ParNew”,“ConcurrentMarkSweep”],“memory_pools”:[“Code Cache”,“Metaspace”,“Compressed Class Space”,“Par Eden Space”,“Par Survivor Space”,“CMS Old 
Gen”],“using_compressed_ordinary_object_pointers”:“true”},“thread_pool”:{“force_merge”:{“type”:“fixed”,“min”:1,“max”:1,“queue_size”:-1},“percolate”:{“type”:“fixed”,“min”:8,“max”:8,“queue_size”:1000},“fetch_shard_started”:{“type”:“scaling”,“min”:1,“max”:16,“keep_alive”:“5m”,“queue_size”:-1},“listener”:{“type”:“fixed”,“min”:4,“max”:4,“queue_size”:-1},“index”:{“type”:“fixed”,“min”:8,“max”:8,“queue_size”:200},“refresh”:{“type”:“scaling”,“min”:1,“max”:4,“keep_alive”:“5m”,“queue_size”:-1},“suggest”:{“type”:“fixed”,“min”:8,“max”:8,“queue_size”:1000},“generic”:{“type”:“cached”,“keep_alive”:“30s”,“queue_size”:-1},“warmer”:{“type”:“scaling”,“min”:1,“max”:4,“keep_alive”:“5m”,“queue_size”:-1},“search”:{“type”:“fixed”,“min”:13,“max”:13,“queue_size”:1000},“flush”:{“type”:“scaling”,“min”:1,“max”:4,“keep_alive”:“5m”,“queue_size”:-1},“fetch_shard_store”:{“type”:“scaling”,“min”:1,“max”:16,“keep_alive”:“5m”,“queue_size”:-1},“management”:{“type”:“scaling”,“min”:1,“max”:5,“keep_alive”:“5m”,“queue_size”:-1},“get”:{“type”:“fixed”,“min”:8,“max”:8,“queue_size”:1000},“bulk”:{“type”:“fixed”,“min”:8,“max”:8,“queue_size”:50},“snapshot”:{“type”:“scaling”,“min”:1,“max”:4,“keep_alive”:“5m”,“queue_size”:-1}},“transport”:{“bound_address”:[“10.151.70.22:9300”,“127.0.0.1:9300”],“publish_address”:“10.151.70.22:9300”,“profiles”:{}},“http”:{“bound_address”:[“10.151.70.22:9200”,“127.0.0.1:9200”],“publish_address”:“10.151.70.22:9200”,“max_content_length_in_bytes”:104857600},“plugins”:,“modules”:[{“name”:“lang-expression”,“version”:“2.4.5”,“description”:“Lucene expressions integration for Elasticsearch”,“jvm”:true,“classname”:“org.elasticsearch.script.expression.ExpressionPlugin”,“isolated”:true,“site”:false},{“name”:“lang-groovy”,“version”:“2.4.5”,“description”:“Groovy scripting integration for Elasticsearch”,“jvm”:true,“classname”:“org.elasticsearch.script.groovy.GroovyPlugin”,“isolated”:true,“site”:false},{“name”:“reindex”,“version”:“2.4.5”,“description”:“_reindex and _update_by_query APIs”,“jvm”:true,“classname”:“org.elasticsearch.index.reindex.ReindexPlugin”,“isolated”:true,“site”:false}]}}}

Graylog debug:

2017-10-18T14:49:40.377+02:00 DEBUG [KafkaJournal] Wrote 1 messages to journal: 813 bytes (payload 771 bytes), log position 3332640874 to 3332640874
2017-10-18T14:49:40.377+02:00 DEBUG [JournallingMessageHandler] Processed batch, wrote 771 bytes, last journal offset: 3332640874, signalling reader.
2017-10-18T14:49:40.381+02:00 DEBUG [JournallingMessageHandler] End of batch, journalling 2 messages
2017-10-18T14:49:40.381+02:00 DEBUG [KafkaJournal] Trying to write ByteBufferMessageSet with size of 1888 bytes to journal
2017-10-18T14:49:40.381+02:00 DEBUG [KafkaJournal] Wrote 2 messages to journal: 1888 bytes (payload 1804 bytes), log position 3332640875 to 3332640876
2017-10-18T14:49:40.381+02:00 DEBUG [JournallingMessageHandler] Processed batch, wrote 1804 bytes, last journal offset: 3332640876, signalling reader.
2017-10-18T14:49:40.382+02:00 DEBUG [JournallingMessageHandler] End of batch, journalling 1 messages
2017-10-18T14:49:40.382+02:00 DEBUG [KafkaJournal] Trying to write ByteBufferMessageSet with size of 924 bytes to journal
2017-10-18T14:49:40.382+02:00 DEBUG [KafkaJournal] Wrote 1 messages to journal: 924 bytes (payload 882 bytes), log position 3332640877 to 3332640877
2017-10-18T14:49:40.382+02:00 DEBUG [JournallingMessageHandler] Processed batch, wrote 882 bytes, last journal offset: 3332640877, signalling reader.
2017-10-18T14:49:40.383+02:00 DEBUG [JournallingMessageHandler] End of batch, journalling 1 messages
2017-10-18T14:49:40.383+02:00 DEBUG [KafkaJournal] Trying to write ByteBufferMessageSet with size of 834 bytes to journal
2017-10-18T14:49:40.383+02:00 DEBUG [KafkaJournal] Wrote 1 messages to journal: 834 bytes (payload 792 bytes), log position 3332640878 to 3332640878
2017-10-18T14:49:40.383+02:00 DEBUG [JournallingMessageHandler] Processed batch, wrote 792 bytes, last journal offset: 3332640878, signalling reader.
2017-10-18T14:49:40.389+02:00 DEBUG [JournallingMessageHandler] End of batch, journalling 1 messages
2017-10-18T14:49:40.389+02:00 DEBUG [KafkaJournal] Trying to write ByteBufferMessageSet with size of 337 bytes to journal
2017-10-18T14:49:40.389+02:00 DEBUG [KafkaJournal] Wrote 1 messages to journal: 337 bytes (payload 295 bytes), log position 3332640879 to 3332640879
2017-10-18T14:49:40.389+02:00 DEBUG [JournallingMessageHandler] Processed batch, wrote 295 bytes, last journal offset: 3332640879, signalling reader.
2017-10-18T14:49:40.390+02:00 DEBUG [JournallingMessageHandler] End of batch, journalling 1 messages
2017-10-18T14:49:40.390+02:00 DEBUG [KafkaJournal] Trying to write ByteBufferMessageSet with size of 429 bytes to journal
2017-10-18T14:49:40.390+02:00 DEBUG [KafkaJournal] Wrote 1 messages to journal: 429 bytes (payload 387 bytes), log position 3332640880 to 3332640880
2017-10-18T14:49:40.390+02:00 DEBUG [JournallingMessageHandler] Processed batch, wrote 387 bytes, last journal offset: 3332640880, signalling reader.
2017-10-18T14:49:40.390+02:00 DEBUG [JournallingMessageHandler] End of batch, journalling 3 messages
2017-10-18T14:49:40.390+02:00 DEBUG [KafkaJournal] Trying to write ByteBufferMessageSet with size of 1355 bytes to journal
2017-10-18T14:49:40.390+02:00 DEBUG [KafkaJournal] Wrote 3 messages to journal: 1355 bytes (payload 1229 bytes), log position 3332640881 to 3332640883
2017-10-18T14:49:40.390+02:00 DEBUG [JournallingMessageHandler] Processed batch, wrote 1229 bytes, last journal offset: 3332640883, signalling reader.
2017-10-18T14:49:40.392+02:00 DEBUG [JournallingMessageHandler] End of batch, journalling 1 messages
2017-10-18T14:49:40.392+02:00 DEBUG [KafkaJournal] Trying to write ByteBufferMessageSet with size of 825 bytes to journal

Restarting the Graylog server solved the problem for now.
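(For reference, on an RPM install like this the restart is typically just the systemd service; journalled messages stay on disk and are processed once the node is back up:)

sudo systemctl restart graylog-server
sudo systemctl status graylog-server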

Is your MongoDB healthy when processing stops?
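For example, something along these lines (a sketch using the standard mongo shell; the graylog database name matches the log lines below):

# Connection counts, current operations and database stats
mongo graylog --eval 'printjson(db.serverStatus().connections)'
mongo graylog --eval 'printjson(db.currentOp())'
mongo graylog --eval 'printjson(db.stats())'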

I don't do anything special with MongoDB. Logs from MongoDB:

2017-10-18T12:43:24.331+0200 I COMMAND [conn246] command graylog.$cmd command: createIndexes { createIndexes: “notifications”, indexes: [ { key: { type: 1 }, name: “type_1”, ns: “graylog.notifications” } ] } keyUpdates:0 writeConflicts:0 numYields:0 reslen:149 locks:{ Global: { acquireCount: { r: 1, w: 1 } }, Database: { acquireCount: { W: 1 }, acquireWaitCount: { W: 1 }, timeAcquiringMicros: { W: 141245 } } } protocol:op_query 267ms
2017-10-18T12:43:24.332+0200 I COMMAND [conn241] command graylog.$cmd command: createIndexes { createIndexes: “users”, indexes: [ { key: { roles: 1 }, name: “roles_1”, ns: “graylog.users” } ] } keyUpdates:0 writeConflicts:0 numYields:0 reslen:149 locks:{ Global: { acquireCount: { r: 1, w: 1 } }, Database: { acquireCount: { W: 1 }, acquireWaitCount: { W: 1 }, timeAcquiringMicros: { W: 276684 } } } protocol:op_query 278ms
2017-10-18T12:43:24.330+0200 I COMMAND [conn248] command graylog.$cmd command: createIndexes { createIndexes: “notifications”, indexes: [ { key: { type: 1 }, name: “type_1”, ns: “graylog.notifications” } ] } keyUpdates:0 writeConflicts:0 numYields:0 reslen:149 locks:{ Global: { acquireCount: { r: 1, w: 1 } }, Database: { acquireCount: { W: 1 }, acquireWaitCount: { W: 1 }, timeAcquiringMicros: { W: 290658 } } } protocol:op_query 304ms
2017-10-18T12:43:24.330+0200 I COMMAND [conn253] command graylog.$cmd command: createIndexes { createIndexes: “users”, indexes: [ { key: { roles: 1 }, name: “roles_1”, ns: “graylog.users” } ] } keyUpdates:0 writeConflicts:0 numYields:0 reslen:149 locks:{ Global: { acquireCount: { r: 1, w: 1 } }, Database: { acquireCount: { W: 1 }, acquireWaitCount: { W: 1 }, timeAcquiringMicros: { W: 278768 } } } protocol:op_query 281ms
2017-10-18T12:43:24.343+0200 I COMMAND [conn237] command graylog.$cmd command: createIndexes { createIndexes: “notifications”, indexes: [ { key: { type: 1 }, name: “type_1”, ns: “graylog.notifications” } ] } keyUpdates:0 writeConflicts:0 numYields:0 reslen:149 locks:{ Global: { acquireCount: { r: 1, w: 1 } }, Database: { acquireCount: { W: 1 }, acquireWaitCount: { W: 1 }, timeAcquiringMicros: { W: 150163 } } } protocol:op_query 274ms
2017-10-18T12:43:24.349+0200 I COMMAND [conn247] command graylog.roles command: find { find: “roles”, filter: {} } planSummary: COLLSCAN keysExamined:0 docsExamined:8 cursorExhausted:1 keyUpdates:0 writeConflicts:0 numYields:0 nreturned:8 reslen:4838 locks:{ Global: { acquireCount: { r: 2 } }, Database: { acquireCount: { r: 1 }, acquireWaitCount: { r: 1 }, timeAcquiringMicros: { r: 120051 } }, Collection: { acquireCount: { r: 1 } } } protocol:op_query 120ms
2017-10-18T12:43:24.360+0200 I WRITE [conn249] update graylog.nodes query: { _id: ObjectId(‘59d57202eac6dd2bae6b5300’) } update: { is_master: true, hostname: “graylogserver”, last_seen: 1508323404, transport_address: “http://10.151.70.23:12900/api/”, _id: ObjectId(‘59d57202eac6dd2bae6b5300’), type: “SERVER”, node_id: “5f31a2e0-320b-4a06-83e3-ebe3ba806102” } keysExamined:1 docsExamined:1 nMatched:1 nModified:1 keyUpdates:0 writeConflicts:0 numYields:0 locks:{ Global: { acquireCount: { r: 1, w: 1 } }, Database: { acquireCount: { w: 1 }, acquireWaitCount: { w: 1 }, timeAcquiringMicros: { w: 103949 } }, Collection: { acquireCount: { w: 1 } } } 108ms
2017-10-18T12:43:24.360+0200 I COMMAND [conn249] command graylog.$cmd command: update { update: “nodes”, ordered: true, updates: [ { q: { _id: ObjectId(‘59d57202eac6dd2bae6b5300’) }, u: { is_master: true, hostname: “graylogserver”, last_seen: 1508323404, transport_address: “http://10.151.70.23:12900/api/”, _id: ObjectId(‘59d57202eac6dd2bae6b5300’), type: “SERVER”, node_id: “5f31a2e0-320b-4a06-83e3-ebe3ba806102” }, upsert: true } ] } keyUpdates:0 writeConflicts:0 numYields:0 reslen:55 locks:{ Global: { acquireCount: { r: 1, w: 1 } }, Database: { acquireCount: { w: 1 }, acquireWaitCount: { w: 1 }, timeAcquiringMicros: { w: 103949 } }, Collection: { acquireCount: { w: 1 } } } protocol:op_query 110ms
2017-10-18T14:37:32.514+0200 I COMMAND [conn238] command graylog.index_ranges command: find { find: “index_ranges”, filter: { $or: [ { $and: [ { start: { $exists: false } }, { begin: { $lte: 1508330252374 } }, { end: { $gte: 1508329952374 } } ] }, { $and: [ { start: { $exists: false } }, { begin: { $lte: 0 } }, { end: { $gte: 0 } } ] } ] } } planSummary: IXSCAN { begin: 1, end: 1 }, IXSCAN { begin: 1, end: 1 } keysExamined:132 docsExamined:1 cursorExhausted:1 keyUpdates:0 writeConflicts:0 numYields:2 nreturned:1 reslen:243 locks:{ Global: { acquireCount: { r: 6 } }, Database: { acquireCount: { r: 3 }, acquireWaitCount: { r: 2 }, timeAcquiringMicros: { r: 77 } }, Collection: { acquireCount: { r: 3 } } } protocol:op_query 127ms
2017-10-18T15:37:09.933+0200 I NETWORK [conn249] end connection 127.0.0.1:45738 (17 connections now open)
2017-10-18T15:37:09.933+0200 I NETWORK [conn244] end connection 127.0.0.1:53692 (17 connections now open)
2017-10-18T15:37:09.933+0200 I NETWORK [conn246] end connection 127.0.0.1:45732 (17 connections now open)
2017-10-18T15:37:09.933+0200 I NETWORK [conn239] end connection 127.0.0.1:53682 (17 connections now open)
2017-10-18T15:37:09.933+0200 I NETWORK [conn247] end connection 127.0.0.1:45734 (17 connections now open)
2017-10-18T15:37:09.933+0200 I NETWORK [conn250] end connection 127.0.0.1:45740 (17 connections now open)
2017-10-18T15:37:09.933+0200 I NETWORK [conn240] end connection 127.0.0.1:53684 (17 connections now open)
2017-10-18T15:37:09.933+0200 I NETWORK [conn242] end connection 127.0.0.1:53688 (17 connections now open)
2017-10-18T15:37:09.933+0200 I NETWORK [conn238] end connection 127.0.0.1:53680 (17 connections now open)
2017-10-18T15:37:09.933+0200 I NETWORK [conn251] end connection 127.0.0.1:45742 (17 connections now open)
2017-10-18T15:37:09.933+0200 I NETWORK [conn248] end connection 127.0.0.1:45736 (17 connections now open)
2017-10-18T15:37:09.933+0200 I NETWORK [conn237] end connection 127.0.0.1:53678 (17 connections now open)
2017-10-18T15:37:09.933+0200 I NETWORK [conn253] end connection 127.0.0.1:45746 (17 connections now open)
2017-10-18T15:37:09.933+0200 I NETWORK [conn241] end connection 127.0.0.1:53686 (17 connections now open)
2017-10-18T15:37:09.933+0200 I NETWORK [conn252] end connection 127.0.0.1:45744 (17 connections now open)
2017-10-18T15:37:09.933+0200 I NETWORK [conn243] end connection 127.0.0.1:53690 (17 connections now open)
2017-10-18T15:37:09.938+0200 I NETWORK [conn235] end connection 127.0.0.1:53670 (17 connections now open)
2017-10-18T15:37:09.942+0200 I NETWORK [conn236] end connection 127.0.0.1:53672 (17 connections now open)
2017-10-18T15:37:17.885+0200 I NETWORK [initandlisten] connection accepted from 127.0.0.1:53160 #260 (1 connection now open)
2017-10-18T15:37:17.936+0200 I NETWORK [initandlisten] connection accepted from 127.0.0.1:53162 #261 (2 connections now open)
2017-10-18T15:37:20.492+0200 I NETWORK [initandlisten] connection accepted from 127.0.0.1:53166 #262 (3 connections now open)
2017-10-18T15:37:20.997+0200 I NETWORK [initandlisten] connection accepted from 127.0.0.1:53168 #263 (4 connections now open)
2017-10-18T15:37:20.998+0200 I NETWORK [initandlisten] connection accepted from 127.0.0.1:53170 #264 (5 connections now open)
2017-10-18T15:37:21.002+0200 I NETWORK [initandlisten] connection accepted from 127.0.0.1:53172 #265 (6 connections now open)
2017-10-18T15:37:21.002+0200 I NETWORK [initandlisten] connection accepted from 127.0.0.1:53173 #266 (7 connections now open)

Same problem: nothing in the Graylog logs.

Any ideas?

[2017-10-25 15:10:56,194][DEBUG][index.engine ] [elastic1] [graylog_337][1] merge segment [_2x3] done: took [2.8m], [835.1 MB], [566,251 docs], [0s stopped], [2.5m throttled], [815.7 MB written], [5.0 MB/sec throttle]
[2017-10-25 15:11:21,823][DEBUG][indices.memory ] [elastic1] recalculating shard indexing buffer, total is [505.3mb] with [1] active shards, each shard set to indexing=[505.3mb], translog=[64kb]
[2017-10-25 15:11:48,691][DEBUG][index.engine ] [elastic1] [graylog_337][1] merge segment [_32m] done: took [37.4s], [96.1 MB], [61,641 docs], [0s stopped], [35.2s throttled], [182.9 MB written], [5.0 MB/sec throttle]
[2017-10-25 15:11:51,823][DEBUG][indices.memory ] [elastic1] recalculating shard indexing buffer, total is [505.3mb] with [1] active shards, each shard set to indexing=[505.3mb], translog=[64kb]
[2017-10-25 15:12:21,824][DEBUG][indices.memory ] [elastic1] recalculating shard indexing buffer, total is [505.3mb] with [1] active shards, each shard set to indexing=[505.3mb], translog=[64kb]
[2017-10-25 15:12:51,824][DEBUG][indices.memory ] [elastic1] recalculating shard indexing buffer, total is [505.3mb] with [1] active shards, each shard set to indexing=[505.3mb], translog=[64kb]
[2017-10-25 15:13:14,449][DEBUG][index.engine ] [elastic1] [graylog_337][1] merge segment [_35e] done: took [35.5s], [91.3 MB], [57,666 docs], [0s stopped], [33.4s throttled], [173.9 MB written], [5.0 MB/sec throttle]
[2017-10-25 15:13:21,825][DEBUG][indices.memory ] [elastic1] recalculating shard indexing buffer, total is [505.3mb] with [1] active shards, each shard set to indexing=[505.3mb], translog=[64kb]
[2017-10-25 15:13:51,825][DEBUG][indices.memory ] [elastic1] recalculating shard indexing buffer, total is [505.3mb] with [1] active shards, each shard set to indexing=[505.3mb], translog=[64kb]
[2017-10-25 15:14:21,825][DEBUG][indices.memory ] [elastic1] recalculating shard indexing buffer, total is [505.3mb] with [1] active shards, each shard set to indexing=[505.3mb], translog=[64kb]
[2017-10-25 15:14:51,826][DEBUG][indices.memory ] [elastic1] recalculating shard indexing buffer, total is [505.3mb] with [1] active shards, each shard set to indexing=[505.3mb], translog=[64kb]
[2017-10-25 15:15:21,826][DEBUG][indices.memory ] [elastic1] recalculating shard indexing buffer, total is [505.3mb] with [1] active shards, each shard set to indexing=[505.3mb], translog=[64kb]
[2017-10-25 15:15:51,827][DEBUG][indices.memory ] [elastic1] recalculating shard indexing buffer, total is [505.3mb] with [1] active shards, each shard set to indexing=[505.3mb], translog=[64kb]
[2017-10-25 15:16:21,827][DEBUG][indices.memory ] [elastic1] recalculating shard indexing buffer, total is [505.3mb] with [1] active shards, each shard set to indexing=[505.3mb], translog=[64kb]
[2017-10-25 15:16:51,828][DEBUG][indices.memory ] [elastic1] recalculating shard indexing buffer, total is [505.3mb] with [1] active shards, each shard set to indexing=[505.3mb], translog=[64kb]
[2017-10-25 15:17:21,828][DEBUG][indices.memory ] [elastic1] recalculating shard indexing buffer, total is [505.3mb] with [1] active shards, each shard set to indexing=[505.3mb], translog=[64kb]
[2017-10-25 15:17:51,829][DEBUG][indices.memory ] [elastic1] recalculating shard indexing buffer, total is [505.3mb] with [1] active shards, each shard set to indexing=[505.3mb], translog=[64kb]
[2017-10-25 15:18:21,829][DEBUG][indices.memory ] [elastic1] recalculating shard indexing buffer, total is [505.3mb] with [1] active shards, each shard set to indexing=[505.3mb], translog=[64kb]
[2017-10-25 15:18:51,830][DEBUG][index.shard ] [elastic1] [graylog_337][1] updating index_buffer_size from [505.3mb] to [500kb]; IndexWriter now using [0] bytes
[2017-10-25 15:18:51,830][DEBUG][index.shard ] [elastic1] [graylog_337][1] marking shard as inactive (inactive_time=[300000000000]) indexing wise
[2017-10-25 15:18:52,106][DEBUG][index.engine ] [elastic1] [graylog_337][1] successfully sync committed. sync id [AV9Tr4LEkONQILMAHudJ].

I have a thread dump but can't upload it here.

OK, the problem occurred again.

What additional info is needed to analyze this?

I have X-Pack on my ES cluster; everything is green.

"processbufferprocessor-0" id=40 state=RUNNABLE
at java.util.regex.Pattern$CharProperty.match(Pattern.java:3776)
at java.util.regex.Pattern$Curly.match0(Pattern.java:4260)
at java.util.regex.Pattern$Curly.match(Pattern.java:4234)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4658)
at java.util.regex.Pattern$BmpCharProperty.match(Pattern.java:3798)
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4717)
at java.util.regex.Pattern$Curly.match0(Pattern.java:4272)
at java.util.regex.Pattern$Curly.match(Pattern.java:4234)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4658)
at java.util.regex.Pattern$BmpCharProperty.match(Pattern.java:3798)
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4717)
at java.util.regex.Pattern$Curly.match0(Pattern.java:4272)
at java.util.regex.Pattern$Curly.match(Pattern.java:4234)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4658)
at java.util.regex.Pattern$BmpCharProperty.match(Pattern.java:3798)
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4717)
at java.util.regex.Pattern$Curly.match0(Pattern.java:4272)
at java.util.regex.Pattern$Curly.match(Pattern.java:4234)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4658)
at java.util.regex.Pattern$BmpCharProperty.match(Pattern.java:3798)
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4717)
at java.util.regex.Pattern$Curly.match0(Pattern.java:4272)
at java.util.regex.Pattern$Curly.match(Pattern.java:4234)
at java.util.regex.Pattern$CharProperty.match(Pattern.java:3777)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4658)
at java.util.regex.Pattern$BmpCharProperty.match(Pattern.java:3798)
at java.util.regex.Pattern$Start.match(Pattern.java:3461)
at java.util.regex.Matcher.search(Matcher.java:1248)
at java.util.regex.Matcher.find(Matcher.java:637)
at org.graylog.plugins.pipelineprocessor.functions.strings.RegexMatch.evaluate(RegexMatch.java:63)
at org.graylog.plugins.pipelineprocessor.functions.strings.RegexMatch.evaluate(RegexMatch.java:37)
at org.graylog.plugins.pipelineprocessor.ast.expressions.FunctionExpression.evaluateUnsafe(FunctionExpression.java:63)
at org.graylog.plugins.pipelineprocessor.ast.expressions.Expression.evaluate(Expression.java:41)
at org.graylog.plugins.pipelineprocessor.ast.statements.VarAssignStatement.evaluate(VarAssignStatement.java:33)
at org.graylog.plugins.pipelineprocessor.ast.statements.VarAssignStatement.evaluate(VarAssignStatement.java:22)
at org.graylog.plugins.pipelineprocessor.processors.PipelineInterpreter.evaluateStatement(PipelineInterpreter.java:377)
at org.graylog.plugins.pipelineprocessor.processors.PipelineInterpreter.executeRuleActions(PipelineInterpreter.java:364)
at org.graylog.plugins.pipelineprocessor.processors.PipelineInterpreter.evaluateStage(PipelineInterpreter.java:305)
at org.graylog.plugins.pipelineprocessor.processors.PipelineInterpreter.processForResolvedPipelines(PipelineInterpreter.java:263)
at org.graylog.plugins.pipelineprocessor.processors.PipelineInterpreter.process(PipelineInterpreter.java:143)
at org.graylog.plugins.pipelineprocessor.processors.PipelineInterpreter.process(PipelineInterpreter.java:99)
at org.graylog2.shared.buffers.processors.ProcessBufferProcessor.handleMessage(ProcessBufferProcessor.java:114)
at org.graylog2.shared.buffers.processors.ProcessBufferProcessor.dispatchMessage(ProcessBufferProcessor.java:100)
at org.graylog2.shared.buffers.processors.ProcessBufferProcessor.onEvent(ProcessBufferProcessor.java:77)
at org.graylog2.shared.buffers.processors.ProcessBufferProcessor.onEvent(ProcessBufferProcessor.java:42)
at com.lmax.disruptor.WorkProcessor.run(WorkProcessor.java:143)
at com.codahale.metrics.InstrumentedThreadFactory$InstrumentedRunnable.run(InstrumentedThreadFactory.java:66)
at java.lang.Thread.run(Thread.java:748)

"processbufferprocessor-1" id=41 state=WAITING
- waiting on <0x5422bb12> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
- locked <0x5422bb12> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at sun.misc.Unsafe.park(Native Method)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
at com.lmax.disruptor.BlockingWaitStrategy.waitFor(BlockingWaitStrategy.java:45)
at com.lmax.disruptor.ProcessingSequenceBarrier.waitFor(ProcessingSequenceBarrier.java:56)
at com.lmax.disruptor.WorkProcessor.run(WorkProcessor.java:148)
at com.codahale.metrics.InstrumentedThreadFactory$InstrumentedRunnable.run(InstrumentedThreadFactory.java:66)
at java.lang.Thread.run(Thread.java:748)

"processbufferprocessor-2" id=42 state=WAITING
- waiting on <0x5422bb12> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
- locked <0x5422bb12> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at sun.misc.Unsafe.park(Native Method)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
at com.lmax.disruptor.BlockingWaitStrategy.waitFor(BlockingWaitStrategy.java:45)
at com.lmax.disruptor.ProcessingSequenceBarrier.waitFor(ProcessingSequenceBarrier.java:56)
at com.lmax.disruptor.WorkProcessor.run(WorkProcessor.java:148)
at com.codahale.metrics.InstrumentedThreadFactory$InstrumentedRunnable.run(InstrumentedThreadFactory.java:66)
at java.lang.Thread.run(Thread.java:748)

"processbufferprocessor-3" id=43 state=WAITING
- waiting on <0x5422bb12> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
- locked <0x5422bb12> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at sun.misc.Unsafe.park(Native Method)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
at com.lmax.disruptor.BlockingWaitStrategy.waitFor(BlockingWaitStrategy.java:45)
at com.lmax.disruptor.ProcessingSequenceBarrier.waitFor(ProcessingSequenceBarrier.java:56)
at com.lmax.disruptor.WorkProcessor.run(WorkProcessor.java:148)
at com.codahale.metrics.InstrumentedThreadFactory$InstrumentedRunnable.run(InstrumentedThreadFactory.java:66)
at java.lang.Thread.run(Thread.java:748)

"processbufferprocessor-4" id=44 state=WAITING
- waiting on <0x5422bb12> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
- locked <0x5422bb12> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at sun.misc.Unsafe.park(Native Method)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
at com.lmax.disruptor.BlockingWaitStrategy.waitFor(BlockingWaitStrategy.java:45)
at com.lmax.disruptor.ProcessingSequenceBarrier.waitFor(ProcessingSequenceBarrier.java:56)
at com.lmax.disruptor.WorkProcessor.run(WorkProcessor.java:148)
at com.codahale.metrics.InstrumentedThreadFactory$InstrumentedRunnable.run(InstrumentedThreadFactory.java:66)
at java.lang.Thread.run(Thread.java:748)

Looks like all the output threads are waiting:

It looks like they are hung for some reason.

How can I debug this?
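One way is to take a few JVM thread dumps while the node is stalled and compare them. The dump above already gives a strong hint: processbufferprocessor-0 is RUNNABLE deep inside java.util.regex.Pattern, called from the pipeline processor's regex() function, which is the typical signature of a very expensive, heavily backtracking regex in a pipeline rule. A sketch using standard JDK tooling (the PID and paths are assumptions):

# Take a thread dump of the running graylog-server JVM (PID from the node info
# above; adjust for your installation), ideally several dumps a few seconds apart.
GRAYLOG_PID=9207
jstack -l "$GRAYLOG_PID" > /tmp/graylog-threaddump-$(date +%s).txt

# A processbufferprocessor thread that stays RUNNABLE inside java.util.regex
# across dumps points at the regex of a specific extractor or pipeline rule.
grep -A 5 'processbufferprocessor' /tmp/graylog-threaddump-*.txt | grep 'java.util.regex'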

Restarting the server makes it output messages again.

For reference:
https://github.com/Graylog2/graylog2-server/issues/4326

Hi all, I have the same problem right now; a restart of one node helps.
The number of unprocessed messages grew, the cluster was green and everything looked OK, but messages didn't get to ES.

Graylog version 2.2.3
