Fatal error in thread

Hi there,

I have been dealing with a number of odd issues with ES 5 and Graylog 2.3.2.

The latest error is a fatal one which causes the node to stop serving requests and exit. Any help would be greatly appreciated.
[2017-12-14T01:28:34,770][ERROR][o.e.b.ElasticsearchUncaughtExceptionHandler] [NODE05] fatal error in thread [elasticsearch[NODE05][search][T#10]], exiting

Config:

node.name: NODE05
node.master: true

cluster.name: graylog2
network.host: _site_
path.data: /mnt/data/elasticsearch/

bootstrap.memory_lock: true

indices.store.throttle.max_bytes_per_sec: 150mb

# Recover only after the given number of nodes have joined the cluster. Can be seen as "minimum number of nodes to attempt recovery at all".
gateway.recover_after_nodes: 1

# Time to wait for additional nodes after recover_after_nodes is met.
gateway.recover_after_time: 1m

# Inform ElasticSearch how many nodes form a full cluster. If this number is met, start up immediately.
gateway.expected_nodes: 2

node.max_local_storage_nodes: 4

thread_pool.bulk.queue_size: 5000
indices.memory.index_buffer_size: 512mb
indices.fielddata.cache.size: 512mb

xpack.security.enabled: false
xpack.monitoring.enabled: true
xpack.ml.enabled: false
xpack.graph.enabled: false
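
For what it's worth, a quick way to confirm the node actually picked up these settings and that the memory lock is in effect (assuming the REST API is reachable on localhost:9200):

# Effective settings as the node sees them
curl -s 'localhost:9200/_nodes/settings?pretty'

# Verify that bootstrap.memory_lock actually took effect (should report mlockall: true)
curl -s 'localhost:9200/_nodes?filter_path=**.mlockall&pretty'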

Please provide the complete logs of that Elasticsearch node.
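
If the full file is too large to post, a context grep around the fatal error should capture the part that matters; the stack trace that follows that line usually names the real cause (the path assumes a default package install, where the log file is named after the cluster):

# Fatal error line plus the exception behind it
grep -B 5 -A 60 'fatal error in thread' /var/log/elasticsearch/graylog2.log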

root@NODE5:/var/log/elasticsearch# grep ERROR graylog2.log
[2017-12-14T01:28:34,770][ERROR][o.e.b.ElasticsearchUncaughtExceptionHandler] [NODE5] fatal error in thread [elasticsearch[NODE5][search][T#10]], exiting
[2017-12-14T09:18:48,144][ERROR][i.n.u.c.D.rejectedExecution] Failed to submit a listener notification task. Event loop shut down?
[2017-12-14T09:18:48,150][ERROR][i.n.u.c.D.rejectedExecution] Failed to submit a listener notification task. Event loop shut down?
[2017-12-14T09:18:48,152][ERROR][i.n.u.c.D.rejectedExecution] Failed to submit a listener notification task. Event loop shut down?
[2017-12-14T09:18:48,160][ERROR][i.n.u.c.D.rejectedExecution] Failed to submit a listener notification task. Event loop shut down?
[2017-12-14T09:18:48,167][ERROR][i.n.u.c.D.rejectedExecution] Failed to submit a listener notification task. Event loop shut down?
[2017-12-14T09:24:13,754][ERROR][i.n.u.c.D.rejectedExecution] Failed to submit a listener notification task. Event loop shut down?
[2017-12-14T09:24:13,768][ERROR][i.n.u.c.D.rejectedExecution] Failed to submit a listener notification task. Event loop shut down?
[2017-12-14T09:24:13,773][ERROR][i.n.u.c.D.rejectedExecution] Failed to submit a listener notification task. Event loop shut down?
[2017-12-14T09:24:13,775][ERROR][i.n.u.c.D.rejectedExecution] Failed to submit a listener notification task. Event loop shut down?
[2017-12-14T09:24:13,781][ERROR][i.n.u.c.D.rejectedExecution] Failed to submit a listener notification task. Event loop shut down?
[2017-12-14T09:24:13,785][ERROR][i.n.u.c.D.rejectedExecution] Failed to submit a listener notification task. Event loop shut down?
[2017-12-14T09:45:42,260][ERROR][i.n.u.c.D.rejectedExecution] Failed to submit a listener notification task. Event loop shut down?
[2017-12-14T09:45:42,265][ERROR][i.n.u.c.D.rejectedExecution] Failed to submit a listener notification task. Event loop shut down?
[2017-12-14T09:45:42,266][ERROR][i.n.u.c.D.rejectedExecution] Failed to submit a listener notification task. Event loop shut down?
[2017-12-14T09:45:42,271][ERROR][i.n.u.c.D.rejectedExecution] Failed to submit a listener notification task. Event loop shut down?
grep WARN graylog2-2017-12-04.log | grep 01:28
[2017-12-04T00:01:28,923][WARN ][o.e.c.a.s.ShardStateAction] [NODE5] [ftp_6][1] received shard failed for shard id [[ftp_6][1]], allocation id [j7W3GkJ1THamam_UBKtMzg], primary term [2], message [mark copy as stale]
[2017-12-04T00:01:28,924][WARN ][o.e.c.a.s.ShardStateAction] [NODE5] [win_logs_3][0] received shard failed for shard id [[win_logs_3][0]], allocation id [3bmo6YbKSd6XJyraI1Ou3w], primary term [2], message [mark copy as stale]
[2017-12-04T03:01:28,388][WARN ][r.suppressed             ] path: /_cluster/health, params: {}
[2017-12-04T03:01:28,886][WARN ][r.suppressed             ] path: /.reporting-*/esqueue/_search, params: {index=.reporting-*, type=esqueue, version=true}
[2017-12-04T04:01:28,467][WARN ][r.suppressed             ] path: /_cluster/health, params: {}
[2017-12-04T05:01:28,420][WARN ][r.suppressed             ] path: /_cluster/health, params: {}
[2017-12-04T06:01:28,627][WARN ][r.suppressed             ] path: /_cluster/health, params: {}
[2017-12-04T07:01:28,750][WARN ][r.suppressed             ] path: /_cluster/health, params: {}
[2017-12-04T08:01:28,751][WARN ][r.suppressed             ] path: /_cluster/health, params: {}
[2017-12-04T09:01:28,797][WARN ][r.suppressed             ] path: /_cluster/health, params: {}
[2017-12-04T10:01:28,826][WARN ][r.suppressed             ] path: /_cluster/health, params: {}

Please provide the complete logs of that Elasticsearch node.

It won’t let me post the entire logs here, but I did find this:

Caused by: org.elasticsearch.xpack.monitoring.exporter.ExportException: failed to flush export bulks
	at org.elasticsearch.xpack.monitoring.exporter.ExportBulk$Compound.lambda$null$0(ExportBulk.java:167) ~[?:?]
	... 25 more
Caused by: org.elasticsearch.xpack.monitoring.exporter.ExportException: bulk [default_local] reports failures when exporting documents
	at org.elasticsearch.xpack.monitoring.exporter.local.LocalBulk.throwExportException(LocalBulk.java:126) ~[?:?]
	... 23 more
[2017-12-14T09:53:53,950][WARN ][o.e.x.m.e.l.LocalExporter] unexpected error while indexing monitoring document
org.elasticsearch.xpack.monitoring.exporter.ExportException: UnavailableShardsException[[.monitoring-es-6-2017.12.14][0] primary shard is not active Timeout: [1m], request: [BulkShardRequest [[.monitoring-es-6-2017.12.14][0]] containing [index {[.monitoring-es-6-2017.12.14][doc][AWBWKFy8wTz7DO0sSWSf], source[{"cluster_uuid":"bL0nasaMQgC9Cos-EP7d8A","timestamp":"2017-12-14T17:52:53.932Z","type":"node_stats","source_node":{"uuid":"xJZXKoioSq64_4zKCH4XGw","host":"10.50.5.91","transport_address":"10.50.5.91:9300","ip":"10.50.5.91","name":"TFGELSVMLXGES05","attributes":{}},"node_stats":{"node_id":"xJZXKoioSq64_4zKCH4XGw","node_master":false,"mlockall":true,"indices":{"docs":{"count":149128},"store":{"size_in_bytes":80306964,"throttle_time_in_millis":0},"indexing":{"index_total":19909,"index_time_in_millis":5354,"throttle_time_in_millis":0},"search":{"query_total":46,"query_time_in_millis":32},"query_cache":{"memory_size_in_bytes":0,"hit_count":0,"miss_count":0,"evictions":0},"fielddata":{"memory_size_in_bytes":0,"evictions":0},"segments":{"count":18,"memory_in_bytes":469847,"terms_memory_in_bytes":366863,"stored_fields_memory_in_bytes":33952,"term_vectors_memory_in_bytes":0,"norms_memory_in_bytes":3776,"points_memory_in_bytes":1328,"doc_values_memory_in_bytes":63928,"index_writer_memory_in_bytes":0,"version_map_memory_in_bytes":0,"fixed_bit_set_memory_in_bytes":0},"request_cache":{"memory_size_in_bytes":0,"evictions":0,"hit_count":0,"miss_count":0}},"os":{"cpu":{"load_average":{"1m":0.32,"5m":0.26,"15m":0.12}},"cgroup":{"cpuacct":{"control_group":"/system.slice/elasticsearch.service","usage_nanos":119053421008},"cpu":{"control_group":"/system.slice/elasticsearch.service","cfs_period_micros":100000,"cfs_quota_micros":-1,"stat":{"number_of_elapsed_periods":0,"number_of_times_throttled":0,"time_throttled_nanos":0}}}},"process":{"open_file_descriptors":355,"max_file_descriptors":65536,"cpu":{"percent":8}},"jvm":{"mem":{"heap_used_in_bytes":972984496,"heap_used_percent":3,"heap_max_in_bytes":25717506048},"gc":{"collectors":{"young":{"collection_count":25,"collection_time_in_millis":2494},"old":{"collection_count":1,"collection_time_in_millis":54}}}},"thread_pool":{"bulk":{"threads":6,"queue":0,"rejected":0},"generic":{"threads":9,"queue":0,"rejected":0},"get":{"threads":6,"queue":0,"rejected":0},"index":{"threads":0,"queue":0,"rejected":0},"management":{"threads":2,"queue":0,"rejected":0},"search":{"threads":10,"queue":0,"rejected":0},"watcher":{"threads":0,"queue":0,"rejected":0}},"fs":{"total":{"total_in_bytes":56009112354816,"free_in_bytes":24799473303552,"available_in_bytes":24799473303552},"data":[{"spins":"true"}]}}}]}]]]
	at org.elasticsearch.xpack.monitoring.exporter.local.LocalBulk.lambda$throwExportException$2(LocalBulk.java:130) ~[?:?]
	at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193) ~[?:1.8.0_151]
	at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175) ~[?:1.8.0_151]
	at java.util.Spliterators$ArraySpliterator.forEachRemaining(Spliterators.java:948) ~[?:1.8.0_151]
	at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481) ~[?:1.8.0_151]
	at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471) ~[?:1.8.0_151]
	at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151) ~[?:1.8.0_151]
	at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174) ~[?:1.8.0_151]
	at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) ~[?:1.8.0_151]
	at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418) ~[?:1.8.0_151]
	at org.elasticsearch.xpack.monitoring.exporter.local.LocalBulk.throwExportException(LocalBulk.java:131) ~[?:?]
	at org.elasticsearch.xpack.monitoring.exporter.local.LocalBulk.lambda$doFlush$0(LocalBulk.java:114) ~[?:?]
	at org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:59) ~[elasticsearch-5.6.5.jar:5.6.5]
	at org.elasticsearch.action.support.TransportAction$1.onResponse(TransportAction.java:88) ~[elasticsearch-5.6.5.jar:5.6.5]
	at org.elasticsearch.action.support.TransportAction$1.onResponse(TransportAction.java:84) ~[elasticsearch-5.6.5.jar:5.6.5]
	at org.elasticsearch.action.bulk.TransportBulkAction$BulkRequestModifier.lambda$wrapActionListenerIfNeeded$0(TransportBulkAction.java:583) ~[elasticsearch-5.6.5.jar:5.6.5]
	at org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:59) [elasticsearch-5.6.5.jar:5.6.5]
	at org.elasticsearch.action.bulk.TransportBulkAction$BulkOperation$1.finishHim(TransportBulkAction.java:389) [elasticsearch-5.6.5.jar:5.6.5]
	at org.elasticsearch.action.bulk.TransportBulkAction$BulkOperation$1.onFailure(TransportBulkAction.java:384) [elasticsearch-5.6.5.jar:5.6.5]
	at org.elasticsearch.action.support.TransportAction$1.onFailure(TransportAction.java:94) [elasticsearch-5.6.5.jar:5.6.5]
	at org.elasticsearch.action.support.replication.TransportReplicationAction$ReroutePhase.finishAsFailed(TransportReplicationAction.java:857) [elasticsearch-5.6.5.jar:5.6.5]
	at org.elasticsearch.action.support.replication.TransportReplicationAction$ReroutePhase.retry(TransportReplicationAction.java:826) [elasticsearch-5.6.5.jar:5.6.5]
	at org.elasticsearch.action.support.replication.TransportReplicationAction$ReroutePhase.retryBecauseUnavailable(TransportReplicationAction.java:892) [elasticsearch-5.6.5.jar:5.6.5]
	at org.elasticsearch.action.support.replication.TransportReplicationAction$ReroutePhase.retryIfUnavailable(TransportReplicationAction.java:728) [elasticsearch-5.6.5.jar:5.6.5]
	at org.elasticsearch.action.support.replication.TransportReplicationAction$ReroutePhase.doRun(TransportReplicationAction.java:681) [elasticsearch-5.6.5.jar:5.6.5]
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-5.6.5.jar:5.6.5]
	at org.elasticsearch.action.support.replication.TransportReplicationAction$ReroutePhase$2.onTimeout(TransportReplicationAction.java:846) [elasticsearch-5.6.5.jar:5.6.5]
	at org.elasticsearch.cluster.ClusterStateObserver$ContextPreservingListener.onTimeout(ClusterStateObserver.java:311) [elasticsearch-5.6.5.jar:5.6.5]
	at org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener.onTimeout(ClusterStateObserver.java:238) [elasticsearch-5.6.5.jar:5.6.5]
	at org.elasticsearch.cluster.service.ClusterService$NotifyTimeout.run(ClusterService.java:1056) [elasticsearch-5.6.5.jar:5.6.5]
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:569) [elasticsearch-5.6.5.jar:5.6.5]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_151]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_151]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_151]
Caused by: org.elasticsearch.action.UnavailableShardsException: [.monitoring-es-6-2017.12.14][0] primary shard is not active Timeout: [1m], request: [BulkShardRequest [[.monitoring-es-6-2017.12.14][0]] containing [index {[.monitoring-es-6-2017.12.14][doc][AWBWKFy8wTz7DO0sSWSf], source[{"cluster_uuid":"bL0nasaMQgC9Cos-EP7d8A","timestamp":"2017-12-14T17:52:53.932Z","type":"node_stats","source_node":{"uuid":"xJZXKoioSq64_4zKCH4XGw","host":"10.50.5.91","transport_address":"10.50.5.91:9300","ip":"10.50.5.91","name":"TFGELSVMLXGES05","attributes":{}},"node_stats":{"node_id":"xJZXKoioSq64_4zKCH4XGw","node_master":false,"mlockall":true,"indices":{"docs":{"count":149128},"store":{"size_in_bytes":80306964,"throttle_time_in_millis":0},"indexing":{"index_total":19909,"index_time_in_millis":5354,"throttle_time_in_millis":0},"search":{"query_total":46,"query_time_in_millis":32},"query_cache":{"memory_size_in_bytes":0,"hit_count":0,"miss_count":0,"evictions":0},"fielddata":{"memory_size_in_bytes":0,"evictions":0},"segments":{"count":18,"memory_in_bytes":469847,"terms_memory_in_bytes":366863,"stored_fields_memory_in_bytes":33952,"term_vectors_memory_in_bytes":0,"norms_memory_in_bytes":3776,"points_memory_in_bytes":1328,"doc_values_memory_in_bytes":63928,"index_writer_memory_in_bytes":0,"version_map_memory_in_bytes":0,"fixed_bit_set_memory_in_bytes":0},"request_cache":{"memory_size_in_bytes":0,"evictions":0,"hit_count":0,"miss_count":0}},"os":{"cpu":{"load_average":{"1m":0.32,"5m":0.26,"15m":0.12}},"cgroup":{"cpuacct":{"control_group":"/system.slice/elasticsearch.service","usage_nanos":119053421008},"cpu":{"control_group":"/system.slice/elasticsearch.service","cfs_period_micros":100000,"cfs_quota_micros":-1,"stat":{"number_of_elapsed_periods":0,"number_of_times_throttled":0,"time_throttled_nanos":0}}}},"process":{"open_file_descriptors":355,"max_file_descriptors":65536,"cpu":{"percent":8}},"jvm":{"mem":{"heap_used_in_bytes":972984496,"heap_used_percent":3,"heap_max_in_bytes":25717506048},"gc":{"collectors":{"young":{"collection_count":25,"collection_time_in_millis":2494},"old":{"collection_count":1,"collection_time_in_millis":54}}}},"thread_pool":{"bulk":{"threads":6,"queue":0,"rejected":0},"generic":{"threads":9,"queue":0,"rejected":0},"get":{"threads":6,"queue":0,"rejected":0},"index":{"threads":0,"queue":0,"rejected":0},"management":{"threads":2,"queue":0,"rejected":0},"search":{"threads":10,"queue":0,"rejected":0},"watcher":{"threads":0,"queue":0,"rejected":0}},"fs":{"total":{"total_in_bytes":56009112354816,"free_in_bytes":24799473303552,"available_in_bytes":24799473303552},"data":[{"spins":"true"}]}}}]}]]
	... 12 more
[2017-12-14T09:53:53,953][WARN ][o.e.x.m.MonitoringService] [TFGELSVMLXGES05] monitoring execution failed
org.elasticsearch.xpack.monitoring.exporter.ExportException: Exception when closing export bulk
	at org.elasticsearch.xpack.monitoring.exporter.ExportBulk$1$1.<init>(ExportBulk.java:106) ~[?:?]
	at org.elasticsearch.xpack.monitoring.exporter.ExportBulk$1.onFailure(ExportBulk.java:104) ~[?:?]
	at org.elasticsearch.xpack.monitoring.exporter.ExportBulk$Compound$1.onResponse(ExportBulk.java:217) ~[?:?]
	at org.elasticsearch.xpack.monitoring.exporter.ExportBulk$Compound$1.onResponse(ExportBulk.java:211) ~[?:?]
	at org.elasticsearch.xpack.common.IteratingActionListener.onResponse(IteratingActionListener.java:108) ~[?:?]
	at org.elasticsearch.xpack.monitoring.exporter.ExportBulk$Compound.lambda$null$0(ExportBulk.java:175) ~[?:?]
	at org.elasticsearch.action.ActionListener$1.onFailure(ActionListener.java:67) ~[elasticsearch-5.6.5.jar:5.6.5]
	at org.elasticsearch.xpack.monitoring.exporter.local.LocalBulk.throwExportException(LocalBulk.java:137) ~[?:?]
	at org.elasticsearch.xpack.monitoring.exporter.local.LocalBulk.lambda$doFlush$0(LocalBulk.java:114) ~[?:?]
	at org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:59) ~[elasticsearch-5.6.5.jar:5.6.5]
	at org.elasticsearch.action.support.TransportAction$1.onResponse(TransportAction.java:88) ~[elasticsearch-5.6.5.jar:5.6.5]
	at org.elasticsearch.action.support.TransportAction$1.onResponse(TransportAction.java:84) ~[elasticsearch-5.6.5.jar:5.6.5]
	at

Do you think it's the monitoring from X-Pack that's causing the node to fail and exit?

At least that’s a symptom of the actual error.
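
To see why that monitoring primary is not active, and whether other shards are stuck as well, something along these lines should help (assuming the REST API is reachable on localhost:9200):

# Shards that are not STARTED, with the reason they are unassigned
curl -s 'localhost:9200/_cat/shards?h=index,shard,prirep,state,unassigned.reason' | grep -v STARTED

# Ask the cluster directly why the first unassigned shard cannot be allocated
curl -s 'localhost:9200/_cluster/allocation/explain?pretty'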

I am going to try removing x-pack and see if the issue goes away.

It's better now, but I now have one node with failed indices.

I wonder if my merge settings need to be updated? The max number of segments for both is 1 (a quick check is sketched after the error output below).

Error:
root@TFGELSVMLXGES06:/var/log/elasticsearch# grep ERROR graylog2.log
[2017-12-21T00:16:51,583][ERROR][o.e.i.e.InternalEngine$EngineMergeScheduler] [TFGELSVMLXGES06] [fw_ecom_12][1] failed to merge
[2017-12-21T00:36:45,036][ERROR][o.e.i.e.InternalEngine$EngineMergeScheduler] [TFGELSVMLXGES06] [ftp_log_4][0] failed to merge
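
To see what is actually breaking the merges, the full exception after the "failed to merge" line matters more than the merge settings themselves; both can be checked like this (index name and log path taken from the output above):

# Full stack trace behind the merge failure (often an I/O problem on the underlying storage)
grep -A 40 'failed to merge' /var/log/elasticsearch/graylog2.log

# Index-level settings for one of the affected indices (any merge overrides show up here)
curl -s 'localhost:9200/fw_ecom_12/_settings?pretty'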

Hi cisco1115,
How do you remove x-pack?

/usr/share/elasticsearch/bin/elasticsearch-plugin remove x-pack
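
One thing to watch out for: with the plugin removed, Elasticsearch 5.x will normally refuse to start while any xpack.* lines are still present in elasticsearch.yml, so comment them out before restarting (paths assume a package install):

# Comment out the leftover x-pack settings, then restart the node
sed -i 's/^xpack\./# xpack./' /etc/elasticsearch/elasticsearch.yml
systemctl restart elasticsearch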

Yeah, removing X-Pack did not help. I'm getting the same errors; I hope someone can point me in the right direction to fix this. It happens after about 24-48 hours of getting the cluster back up and healthy.


    graylog2-2017-12-26.log:[2017-12-26T00:23:49,069][ERROR][o.e.c.a.s.ShardStateAction] [TFGELSVMLXGES04] [fw_ecom_17][0] unexpected failure while failing shard [shard id [[fw_ecom_17][0]], allocation id [CMRhiFi2SWWJkjuSK4pzug], primary term [3], message [failed to perform indices:data/write/bulk[s] on replica [fw_ecom_17][0], node[G0YodUEgTXir_WdAwKVoGg], [R], s[STARTED], a[id=CMRhiFi2SWWJkjuSK4pzug]], failure [RemoteTransportException[[TFGELSVMLXGES04][10.50.5.93:9300][indices:data/write/bulk[s][r]]]; nested: IllegalStateException[active primary shard cannot be a replication target before  relocation hand off [fw_ecom_17][0], node[G0YodUEgTXir_WdAwKVoGg], [P], s[STARTED], a[id=CMRhiFi2SWWJkjuSK4pzug], state is [STARTED]]; ]]
graylog2-2017-12-26.log:[2017-12-26T00:36:50,413][ERROR][o.e.c.a.s.ShardStateAction] [TFGELSVMLXGES04] [f5_log_3][0] unexpected failure while failing shard [shard id [[f5_log_3][0]], allocation id [VmmMhJkcRQqm7UmpEUNCxg], primary term [7], message [failed to perform indices:data/write/bulk[s] on replica [f5_log_3][0], node[G0YodUEgTXir_WdAwKVoGg], [R], s[STARTED], a[id=VmmMhJkcRQqm7UmpEUNCxg]], failure [RemoteTransportException[[TFGELSVMLXGES04][10.50.5.93:9300][indices:data/write/bulk[s][r]]]; nested: IllegalStateException[active primary shard cannot be a replication target before  relocation hand off [f5_log_3][0], node[G0YodUEgTXir_WdAwKVoGg], [P], s[STARTED], a[id=VmmMhJkcRQqm7UmpEUNCxg], state is [STARTED]]; ]]
graylog2-2017-12-26.log:[2017-12-26T00:36:50,435][ERROR][o.e.c.a.s.ShardStateAction] [TFGELSVMLXGES04] [f5_log_3][0] unexpected failure while failing shard [shard id [[f5_log_3][0]], allocation id [VmmMhJkcRQqm7UmpEUNCxg], primary term [7], message [failed to perform indices:data/write/bulk[s] on replica [f5_log_3][0], node[G0YodUEgTXir_WdAwKVoGg], [R], s[STARTED], a[id=VmmMhJkcRQqm7UmpEUNCxg]], failure [RemoteTransportException[[TFGELSVMLXGES04][10.50.5.93:9300][indices:data/write/bulk[s][r]]]; nested: IllegalStateException[active primary shard cannot be a replication target before  relocation hand off [f5_log_3][0], node[G0YodUEgTXir_WdAwKVoGg], [P], s[STARTED], a[id=VmmMhJkcRQqm7UmpEUNCxg], state is [STARTED]]; ]]
graylog2-2017-12-26.log:[2017-12-26T00:36:50,436][ERROR][o.e.c.a.s.ShardStateAction] [TFGELSVMLXGES04] [f5_log_3][0] unexpected failure while failing shard [shard id [[f5_log_3][0]], allocation id [VmmMhJkcRQqm7UmpEUNCxg], primary term [7], message [failed to perform indices:data/write/bulk[s] on replica [f5_log_3][0], node[G0YodUEgTXir_WdAwKVoGg], [R], s[STARTED], a[id=VmmMhJkcRQqm7UmpEUNCxg]], failure [RemoteTransportException[[TFGELSVMLXGES04][10.50.5.93:9300][indices:data/write/bulk[s][r]]]; nested: IllegalStateException[active primary shard cannot be a replication target before  relocation hand off [f5_log_3][0], node[G0YodUEgTXir_WdAwKVoGg], [P], s[STARTED], a[id=VmmMhJkcRQqm7UmpEUNCxg], state is [STARTED]]; ]]
graylog2-2017-12-26.log:[2017-12-26T00:36:50,465][ERROR][o.e.c.a.s.ShardStateAction] [TFGELSVMLXGES04] [f5_log_3][0] unexpected failure while failing shard [shard id [[f5_log_3][0]], allocation id [VmmMhJkcRQqm7UmpEUNCxg], primary term [7], message [failed to perform indices:data/write/bulk[s] on replica [f5_log_3][0], node[G0YodUEgTXir_WdAwKVoGg], [R], s[STARTED], a[id=VmmMhJkcRQqm7UmpEUNCxg]], failure [RemoteTransportException[[TFGELSVMLXGES04][10.50.5.93:9300][indices:data/write/bulk[s][r]]]; nested: IllegalStateException[active primary shard cannot be a replication target before  relocation hand off [f5_log_3][0], node[G0YodUEgTXir_WdAwKVoGg], [P], s[STARTED], a[id=VmmMhJkcRQqm7UmpEUNCxg], state is [STARTED]]; ]]
graylog2-2017-12-26.log:[2017-12-26T00:52:34,321][ERROR][o.e.b.ElasticsearchUncaughtExceptionHandler] [TFGELSVMLXGES04] fatal error in thread [elasticsearch[TFGELSVMLXGES04][search][T#10]], exiting

I see this kind of error only when Elasticsearch is not able to write its data:

  • the storage is slow
  • no disk space is available
  • there are network communication problems

You should check the three points above.
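
A quick first pass at those checks, run on the data node itself (data path taken from the config earlier in the thread; iostat needs the sysstat package, and 10.50.5.93:9300 is just one of the transport addresses from the logs):

# Free space on the Elasticsearch data path
df -h /mnt/data/elasticsearch

# Disk latency and utilisation while the node is indexing (watch await and %util)
iostat -x 5

# Basic reachability of another node's transport port
nc -zv 10.50.5.93 9300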

Yup, I was using a NAS drive to store the ES data. Switched it to local storage and it hasn’t failed since. Thanks for the help.
