Fatal error in thread


(Francisco Gomez) #1

Hi there,

I have been dealing with a number of odd issues with ES 5 and Graylog 2.3.2.

The latest error is a fatal one which causes the node to stop serving requests and exit. Any help would be greatly appreciated.

[2017-12-14T01:28:34,770][ERROR][o.e.b.ElasticsearchUncaughtExceptionHandler] [NODE05] fatal error in thread [elasticsearch[NODE05][search][T#10]], exiting

Config:

node.name: NODE05
node.master: true

cluster.name: graylog2
network.host: _site_
path.data: /mnt/data/elasticsearch/

bootstrap.memory_lock: true

indices.store.throttle.max_bytes_per_sec: 150mb

# Recover only after the given number of nodes have joined the cluster. Can be seen as "minimum number of nodes to attempt recovery at all".
gateway.recover_after_nodes: 1

# Time to wait for additional nodes after recover_after_nodes is met.
gateway.recover_after_time: 1m

# Inform ElasticSearch how many nodes form a full cluster. If this number is met, start up immediately.
gateway.expected_nodes: 2

node.max_local_storage_nodes: 4

thread_pool.bulk.queue_size: 5000
indices.memory.index_buffer_size: 512mb
indices.fielddata.cache.size: 512mb

xpack.security.enabled: false
xpack.monitoring.enabled: true
xpack.ml.enabled: false
xpack.graph.enabled: false
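
For reference, a quick way to confirm the node actually picked these settings up (a minimal sketch, assuming the default HTTP port 9200 and that the API is reachable from the node itself):

# Effective settings of the node named NODE05
curl -s 'http://localhost:9200/_nodes/NODE05/settings?pretty'

# Overall cluster state at a glance
curl -s 'http://localhost:9200/_cluster/health?pretty'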

(Jochen) #2

Please provide the complete logs of that Elasticsearch node.
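
If the file is too large to attach, the part that matters most is usually the uncaught exception and stack trace right after the fatal error line. A rough sketch, assuming the log lives at /var/log/elasticsearch/graylog2.log:

# Grab the fatal error plus the stack trace that follows it
grep -A 50 'fatal error in thread' /var/log/elasticsearch/graylog2.log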


(Francisco Gomez) #3
root@NODE5:/var/log/elasticsearch# grep ERROR graylog2.log
[2017-12-14T01:28:34,770][ERROR][o.e.b.ElasticsearchUncaughtExceptionHandler] [NODE5] fatal error in thread [elasticsearch[NODE5][search][T#10]], exiting
[2017-12-14T09:18:48,144][ERROR][i.n.u.c.D.rejectedExecution] Failed to submit a listener notification task. Event loop shut down?
[2017-12-14T09:18:48,150][ERROR][i.n.u.c.D.rejectedExecution] Failed to submit a listener notification task. Event loop shut down?
[2017-12-14T09:18:48,152][ERROR][i.n.u.c.D.rejectedExecution] Failed to submit a listener notification task. Event loop shut down?
[2017-12-14T09:18:48,160][ERROR][i.n.u.c.D.rejectedExecution] Failed to submit a listener notification task. Event loop shut down?
[2017-12-14T09:18:48,167][ERROR][i.n.u.c.D.rejectedExecution] Failed to submit a listener notification task. Event loop shut down?
[2017-12-14T09:24:13,754][ERROR][i.n.u.c.D.rejectedExecution] Failed to submit a listener notification task. Event loop shut down?
[2017-12-14T09:24:13,768][ERROR][i.n.u.c.D.rejectedExecution] Failed to submit a listener notification task. Event loop shut down?
[2017-12-14T09:24:13,773][ERROR][i.n.u.c.D.rejectedExecution] Failed to submit a listener notification task. Event loop shut down?
[2017-12-14T09:24:13,775][ERROR][i.n.u.c.D.rejectedExecution] Failed to submit a listener notification task. Event loop shut down?
[2017-12-14T09:24:13,781][ERROR][i.n.u.c.D.rejectedExecution] Failed to submit a listener notification task. Event loop shut down?
[2017-12-14T09:24:13,785][ERROR][i.n.u.c.D.rejectedExecution] Failed to submit a listener notification task. Event loop shut down?
[2017-12-14T09:45:42,260][ERROR][i.n.u.c.D.rejectedExecution] Failed to submit a listener notification task. Event loop shut down?
[2017-12-14T09:45:42,265][ERROR][i.n.u.c.D.rejectedExecution] Failed to submit a listener notification task. Event loop shut down?
[2017-12-14T09:45:42,266][ERROR][i.n.u.c.D.rejectedExecution] Failed to submit a listener notification task. Event loop shut down?
[2017-12-14T09:45:42,271][ERROR][i.n.u.c.D.rejectedExecution] Failed to submit a listener notification task. Event loop shut down?
grep WARN graylog2-2017-12-04.log | grep 01:28
[2017-12-04T00:01:28,923][WARN ][o.e.c.a.s.ShardStateAction] [NODE5] [ftp_6][1] received shard failed for shard id [[ftp_6][1]], allocation id [j7W3GkJ1THamam_UBKtMzg], primary term [2], message [mark copy as stale]
[2017-12-04T00:01:28,924][WARN ][o.e.c.a.s.ShardStateAction] [NODE5] [win_logs_3][0] received shard failed for shard id [[win_logs_3][0]], allocation id [3bmo6YbKSd6XJyraI1Ou3w], primary term [2], message [mark copy as stale]
[2017-12-04T03:01:28,388][WARN ][r.suppressed             ] path: /_cluster/health, params: {}
[2017-12-04T03:01:28,886][WARN ][r.suppressed             ] path: /.reporting-*/esqueue/_search, params: {index=.reporting-*, type=esqueue, version=true}
[2017-12-04T04:01:28,467][WARN ][r.suppressed             ] path: /_cluster/health, params: {}
[2017-12-04T05:01:28,420][WARN ][r.suppressed             ] path: /_cluster/health, params: {}
[2017-12-04T06:01:28,627][WARN ][r.suppressed             ] path: /_cluster/health, params: {}
[2017-12-04T07:01:28,750][WARN ][r.suppressed             ] path: /_cluster/health, params: {}
[2017-12-04T08:01:28,751][WARN ][r.suppressed             ] path: /_cluster/health, params: {}
[2017-12-04T09:01:28,797][WARN ][r.suppressed             ] path: /_cluster/health, params: {}
[2017-12-04T10:01:28,826][WARN ][r.suppressed             ] path: /_cluster/health, params: {}

(Jochen) #4

Please provide the complete logs of that Elasticsearch node.


(Francisco Gomez) #5

It won’t let me post the entire logs here, but I did find this:

Caused by: org.elasticsearch.xpack.monitoring.exporter.ExportException: failed to flush export bulks
	at org.elasticsearch.xpack.monitoring.exporter.ExportBulk$Compound.lambda$null$0(ExportBulk.java:167) ~[?:?]
	... 25 more
Caused by: org.elasticsearch.xpack.monitoring.exporter.ExportException: bulk [default_local] reports failures when exporting documents
	at org.elasticsearch.xpack.monitoring.exporter.local.LocalBulk.throwExportException(LocalBulk.java:126) ~[?:?]
	... 23 more
[2017-12-14T09:53:53,950][WARN ][o.e.x.m.e.l.LocalExporter] unexpected error while indexing monitoring document
org.elasticsearch.xpack.monitoring.exporter.ExportException: UnavailableShardsException[[.monitoring-es-6-2017.12.14][0] primary shard is not active Timeout: [1m], request: [BulkShardRequest [[.monitoring-es-6-2017.12.14][0]] containing [index {[.monitoring-es-6-2017.12.14][doc][AWBWKFy8wTz7DO0sSWSf], source[{"cluster_uuid":"bL0nasaMQgC9Cos-EP7d8A","timestamp":"2017-12-14T17:52:53.932Z","type":"node_stats","source_node":{"uuid":"xJZXKoioSq64_4zKCH4XGw","host":"10.50.5.91","transport_address":"10.50.5.91:9300","ip":"10.50.5.91","name":"TFGELSVMLXGES05","attributes":{}},"node_stats":{"node_id":"xJZXKoioSq64_4zKCH4XGw","node_master":false,"mlockall":true,"indices":{"docs":{"count":149128},"store":{"size_in_bytes":80306964,"throttle_time_in_millis":0},"indexing":{"index_total":19909,"index_time_in_millis":5354,"throttle_time_in_millis":0},"search":{"query_total":46,"query_time_in_millis":32},"query_cache":{"memory_size_in_bytes":0,"hit_count":0,"miss_count":0,"evictions":0},"fielddata":{"memory_size_in_bytes":0,"evictions":0},"segments":{"count":18,"memory_in_bytes":469847,"terms_memory_in_bytes":366863,"stored_fields_memory_in_bytes":33952,"term_vectors_memory_in_bytes":0,"norms_memory_in_bytes":3776,"points_memory_in_bytes":1328,"doc_values_memory_in_bytes":63928,"index_writer_memory_in_bytes":0,"version_map_memory_in_bytes":0,"fixed_bit_set_memory_in_bytes":0},"request_cache":{"memory_size_in_bytes":0,"evictions":0,"hit_count":0,"miss_count":0}},"os":{"cpu":{"load_average":{"1m":0.32,"5m":0.26,"15m":0.12}},"cgroup":{"cpuacct":{"control_group":"/system.slice/elasticsearch.service","usage_nanos":119053421008},"cpu":{"control_group":"/system.slice/elasticsearch.service","cfs_period_micros":100000,"cfs_quota_micros":-1,"stat":{"number_of_elapsed_periods":0,"number_of_times_throttled":0,"time_throttled_nanos":0}}}},"process":{"open_file_descriptors":355,"max_file_descriptors":65536,"cpu":{"percent":8}},"jvm":{"mem":{"heap_used_in_bytes":972984496,"heap_used_percent":3,"heap_max_in_bytes":25717506048},"gc":{"collectors":{"young":{"collection_count":25,"collection_time_in_millis":2494},"old":{"collection_count":1,"collection_time_in_millis":54}}}},"thread_pool":{"bulk":{"threads":6,"queue":0,"rejected":0},"generic":{"threads":9,"queue":0,"rejected":0},"get":{"threads":6,"queue":0,"rejected":0},"index":{"threads":0,"queue":0,"rejected":0},"management":{"threads":2,"queue":0,"rejected":0},"search":{"threads":10,"queue":0,"rejected":0},"watcher":{"threads":0,"queue":0,"rejected":0}},"fs":{"total":{"total_in_bytes":56009112354816,"free_in_bytes":24799473303552,"available_in_bytes":24799473303552},"data":[{"spins":"true"}]}}}]}]]]
	at org.elasticsearch.xpack.monitoring.exporter.local.LocalBulk.lambda$throwExportException$2(LocalBulk.java:130) ~[?:?]
	at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193) ~[?:1.8.0_151]
	at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175) ~[?:1.8.0_151]
	at java.util.Spliterators$ArraySpliterator.forEachRemaining(Spliterators.java:948) ~[?:1.8.0_151]
	at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481) ~[?:1.8.0_151]
	at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471) ~[?:1.8.0_151]
	at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151) ~[?:1.8.0_151]
	at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174) ~[?:1.8.0_151]
	at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) ~[?:1.8.0_151]
	at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418) ~[?:1.8.0_151]
	at org.elasticsearch.xpack.monitoring.exporter.local.LocalBulk.throwExportException(LocalBulk.java:131) ~[?:?]
	at org.elasticsearch.xpack.monitoring.exporter.local.LocalBulk.lambda$doFlush$0(LocalBulk.java:114) ~[?:?]
	at org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:59) ~[elasticsearch-5.6.5.jar:5.6.5]
	at org.elasticsearch.action.support.TransportAction$1.onResponse(TransportAction.java:88) ~[elasticsearch-5.6.5.jar:5.6.5]
	at org.elasticsearch.action.support.TransportAction$1.onResponse(TransportAction.java:84) ~[elasticsearch-5.6.5.jar:5.6.5]
	at org.elasticsearch.action.bulk.TransportBulkAction$BulkRequestModifier.lambda$wrapActionListenerIfNeeded$0(TransportBulkAction.java:583) ~[elasticsearch-5.6.5.jar:5.6.5]
	at org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:59) [elasticsearch-5.6.5.jar:5.6.5]
	at org.elasticsearch.action.bulk.TransportBulkAction$BulkOperation$1.finishHim(TransportBulkAction.java:389) [elasticsearch-5.6.5.jar:5.6.5]
	at org.elasticsearch.action.bulk.TransportBulkAction$BulkOperation$1.onFailure(TransportBulkAction.java:384) [elasticsearch-5.6.5.jar:5.6.5]
	at org.elasticsearch.action.support.TransportAction$1.onFailure(TransportAction.java:94) [elasticsearch-5.6.5.jar:5.6.5]
	at org.elasticsearch.action.support.replication.TransportReplicationAction$ReroutePhase.finishAsFailed(TransportReplicationAction.java:857) [elasticsearch-5.6.5.jar:5.6.5]
	at org.elasticsearch.action.support.replication.TransportReplicationAction$ReroutePhase.retry(TransportReplicationAction.java:826) [elasticsearch-5.6.5.jar:5.6.5]
	at org.elasticsearch.action.support.replication.TransportReplicationAction$ReroutePhase.retryBecauseUnavailable(TransportReplicationAction.java:892) [elasticsearch-5.6.5.jar:5.6.5]
	at org.elasticsearch.action.support.replication.TransportReplicationAction$ReroutePhase.retryIfUnavailable(TransportReplicationAction.java:728) [elasticsearch-5.6.5.jar:5.6.5]
	at org.elasticsearch.action.support.replication.TransportReplicationAction$ReroutePhase.doRun(TransportReplicationAction.java:681) [elasticsearch-5.6.5.jar:5.6.5]
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-5.6.5.jar:5.6.5]
	at org.elasticsearch.action.support.replication.TransportReplicationAction$ReroutePhase$2.onTimeout(TransportReplicationAction.java:846) [elasticsearch-5.6.5.jar:5.6.5]
	at org.elasticsearch.cluster.ClusterStateObserver$ContextPreservingListener.onTimeout(ClusterStateObserver.java:311) [elasticsearch-5.6.5.jar:5.6.5]
	at org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener.onTimeout(ClusterStateObserver.java:238) [elasticsearch-5.6.5.jar:5.6.5]
	at org.elasticsearch.cluster.service.ClusterService$NotifyTimeout.run(ClusterService.java:1056) [elasticsearch-5.6.5.jar:5.6.5]
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:569) [elasticsearch-5.6.5.jar:5.6.5]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_151]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_151]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_151]
Caused by: org.elasticsearch.action.UnavailableShardsException: [.monitoring-es-6-2017.12.14][0] primary shard is not active Timeout: [1m], request: [BulkShardRequest [[.monitoring-es-6-2017.12.14][0]] containing [index {[.monitoring-es-6-2017.12.14][doc][AWBWKFy8wTz7DO0sSWSf], source[{"cluster_uuid":"bL0nasaMQgC9Cos-EP7d8A","timestamp":"2017-12-14T17:52:53.932Z","type":"node_stats","source_node":{"uuid":"xJZXKoioSq64_4zKCH4XGw","host":"10.50.5.91","transport_address":"10.50.5.91:9300","ip":"10.50.5.91","name":"TFGELSVMLXGES05","attributes":{}},"node_stats":{"node_id":"xJZXKoioSq64_4zKCH4XGw","node_master":false,"mlockall":true,"indices":{"docs":{"count":149128},"store":{"size_in_bytes":80306964,"throttle_time_in_millis":0},"indexing":{"index_total":19909,"index_time_in_millis":5354,"throttle_time_in_millis":0},"search":{"query_total":46,"query_time_in_millis":32},"query_cache":{"memory_size_in_bytes":0,"hit_count":0,"miss_count":0,"evictions":0},"fielddata":{"memory_size_in_bytes":0,"evictions":0},"segments":{"count":18,"memory_in_bytes":469847,"terms_memory_in_bytes":366863,"stored_fields_memory_in_bytes":33952,"term_vectors_memory_in_bytes":0,"norms_memory_in_bytes":3776,"points_memory_in_bytes":1328,"doc_values_memory_in_bytes":63928,"index_writer_memory_in_bytes":0,"version_map_memory_in_bytes":0,"fixed_bit_set_memory_in_bytes":0},"request_cache":{"memory_size_in_bytes":0,"evictions":0,"hit_count":0,"miss_count":0}},"os":{"cpu":{"load_average":{"1m":0.32,"5m":0.26,"15m":0.12}},"cgroup":{"cpuacct":{"control_group":"/system.slice/elasticsearch.service","usage_nanos":119053421008},"cpu":{"control_group":"/system.slice/elasticsearch.service","cfs_period_micros":100000,"cfs_quota_micros":-1,"stat":{"number_of_elapsed_periods":0,"number_of_times_throttled":0,"time_throttled_nanos":0}}}},"process":{"open_file_descriptors":355,"max_file_descriptors":65536,"cpu":{"percent":8}},"jvm":{"mem":{"heap_used_in_bytes":972984496,"heap_used_percent":3,"heap_max_in_bytes":25717506048},"gc":{"collectors":{"young":{"collection_count":25,"collection_time_in_millis":2494},"old":{"collection_count":1,"collection_time_in_millis":54}}}},"thread_pool":{"bulk":{"threads":6,"queue":0,"rejected":0},"generic":{"threads":9,"queue":0,"rejected":0},"get":{"threads":6,"queue":0,"rejected":0},"index":{"threads":0,"queue":0,"rejected":0},"management":{"threads":2,"queue":0,"rejected":0},"search":{"threads":10,"queue":0,"rejected":0},"watcher":{"threads":0,"queue":0,"rejected":0}},"fs":{"total":{"total_in_bytes":56009112354816,"free_in_bytes":24799473303552,"available_in_bytes":24799473303552},"data":[{"spins":"true"}]}}}]}]]
	... 12 more
[2017-12-14T09:53:53,953][WARN ][o.e.x.m.MonitoringService] [TFGELSVMLXGES05] monitoring execution failed
org.elasticsearch.xpack.monitoring.exporter.ExportException: Exception when closing export bulk
	at org.elasticsearch.xpack.monitoring.exporter.ExportBulk$1$1.<init>(ExportBulk.java:106) ~[?:?]
	at org.elasticsearch.xpack.monitoring.exporter.ExportBulk$1.onFailure(ExportBulk.java:104) ~[?:?]
	at org.elasticsearch.xpack.monitoring.exporter.ExportBulk$Compound$1.onResponse(ExportBulk.java:217) ~[?:?]
	at org.elasticsearch.xpack.monitoring.exporter.ExportBulk$Compound$1.onResponse(ExportBulk.java:211) ~[?:?]
	at org.elasticsearch.xpack.common.IteratingActionListener.onResponse(IteratingActionListener.java:108) ~[?:?]
	at org.elasticsearch.xpack.monitoring.exporter.ExportBulk$Compound.lambda$null$0(ExportBulk.java:175) ~[?:?]
	at org.elasticsearch.action.ActionListener$1.onFailure(ActionListener.java:67) ~[elasticsearch-5.6.5.jar:5.6.5]
	at org.elasticsearch.xpack.monitoring.exporter.local.LocalBulk.throwExportException(LocalBulk.java:137) ~[?:?]
	at org.elasticsearch.xpack.monitoring.exporter.local.LocalBulk.lambda$doFlush$0(LocalBulk.java:114) ~[?:?]
	at org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:59) ~[elasticsearch-5.6.5.jar:5.6.5]
	at org.elasticsearch.action.support.TransportAction$1.onResponse(TransportAction.java:88) ~[elasticsearch-5.6.5.jar:5.6.5]
	at org.elasticsearch.action.support.TransportAction$1.onResponse(TransportAction.java:84) ~[elasticsearch-5.6.5.jar:5.6.5]
	at
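
The interesting part is the UnavailableShardsException: the primary shard of the daily .monitoring-es-6-* index is not active, so the monitoring exporter cannot index into it. A quick way to see which shards are unassigned and why (a sketch, assuming the default HTTP port 9200):

# Shards that are not STARTED, with the reason they are unassigned
curl -s 'http://localhost:9200/_cat/shards?v&h=index,shard,prirep,state,unassigned.reason' | grep -v STARTED

# Ask Elasticsearch why an unassigned shard cannot be allocated
curl -s 'http://localhost:9200/_cluster/allocation/explain?pretty'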

(Francisco Gomez) #6

Do you think it's the X-Pack monitoring that is causing the node to fail and exit?


(Jochen) #7

At least that’s a symptom of the actual error.


(Francisco Gomez) #8

I am going to try removing x-pack and see if the issue goes away.


(Francisco Gomez) #9

It's better now, but I now have one node with failed indices.

I wonder if my merge settings need to be updated? The max number of segments for both is 1.

Error:
root@TFGELSVMLXGES06:/var/log/elasticsearch# grep ERROR graylog2.log
[2017-12-21T00:16:51,583][ERROR][o.e.i.e.InternalEngine$EngineMergeScheduler] [TFGELSVMLXGES06] [fw_ecom_12][1] failed to merge
[2017-12-21T00:36:45,036][ERROR][o.e.i.e.InternalEngine$EngineMergeScheduler] [TFGELSVMLXGES06] [ftp_log_4][0] failed to merge
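
To check whether any non-default merge settings are set on one of the failing indices, and how much space is left on the data path, something like this should do (a sketch, assuming the default HTTP port 9200 and the path.data from the config earlier in the thread):

# Explicitly set settings on one of the failing indices
curl -s 'http://localhost:9200/fw_ecom_12/_settings?pretty'

# Free space on the Elasticsearch data path
df -h /mnt/data/elasticsearch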


(Kieulam141) #10

Hi cisco1115,
How do you remove x-pack?


(Francisco Gomez) #11

/usr/share/elasticsearch/bin/elasticsearch-plugin remove x-pack
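
Note that the plugin has to be removed on every Elasticsearch node, and each node restarted afterwards. With a systemd-managed install (which the cgroup path /system.slice/elasticsearch.service in the node stats above suggests), that would be roughly:

sudo systemctl restart elasticsearch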


(Francisco Gomez) #12

Unfortunately, removing X-Pack did not help. I am getting the same errors and hope someone can point me in the right direction to fix this. It happens about 24-48 hours after I get the cluster back up and healthy.


graylog2-2017-12-26.log:[2017-12-26T00:23:49,069][ERROR][o.e.c.a.s.ShardStateAction] [TFGELSVMLXGES04] [fw_ecom_17][0] unexpected failure while failing shard [shard id [[fw_ecom_17][0]], allocation id [CMRhiFi2SWWJkjuSK4pzug], primary term [3], message [failed to perform indices:data/write/bulk[s] on replica [fw_ecom_17][0], node[G0YodUEgTXir_WdAwKVoGg], [R], s[STARTED], a[id=CMRhiFi2SWWJkjuSK4pzug]], failure [RemoteTransportException[[TFGELSVMLXGES04][10.50.5.93:9300][indices:data/write/bulk[s][r]]]; nested: IllegalStateException[active primary shard cannot be a replication target before  relocation hand off [fw_ecom_17][0], node[G0YodUEgTXir_WdAwKVoGg], [P], s[STARTED], a[id=CMRhiFi2SWWJkjuSK4pzug], state is [STARTED]]; ]]
graylog2-2017-12-26.log:[2017-12-26T00:36:50,413][ERROR][o.e.c.a.s.ShardStateAction] [TFGELSVMLXGES04] [f5_log_3][0] unexpected failure while failing shard [shard id [[f5_log_3][0]], allocation id [VmmMhJkcRQqm7UmpEUNCxg], primary term [7], message [failed to perform indices:data/write/bulk[s] on replica [f5_log_3][0], node[G0YodUEgTXir_WdAwKVoGg], [R], s[STARTED], a[id=VmmMhJkcRQqm7UmpEUNCxg]], failure [RemoteTransportException[[TFGELSVMLXGES04][10.50.5.93:9300][indices:data/write/bulk[s][r]]]; nested: IllegalStateException[active primary shard cannot be a replication target before  relocation hand off [f5_log_3][0], node[G0YodUEgTXir_WdAwKVoGg], [P], s[STARTED], a[id=VmmMhJkcRQqm7UmpEUNCxg], state is [STARTED]]; ]]
graylog2-2017-12-26.log:[2017-12-26T00:36:50,435][ERROR][o.e.c.a.s.ShardStateAction] [TFGELSVMLXGES04] [f5_log_3][0] unexpected failure while failing shard [shard id [[f5_log_3][0]], allocation id [VmmMhJkcRQqm7UmpEUNCxg], primary term [7], message [failed to perform indices:data/write/bulk[s] on replica [f5_log_3][0], node[G0YodUEgTXir_WdAwKVoGg], [R], s[STARTED], a[id=VmmMhJkcRQqm7UmpEUNCxg]], failure [RemoteTransportException[[TFGELSVMLXGES04][10.50.5.93:9300][indices:data/write/bulk[s][r]]]; nested: IllegalStateException[active primary shard cannot be a replication target before  relocation hand off [f5_log_3][0], node[G0YodUEgTXir_WdAwKVoGg], [P], s[STARTED], a[id=VmmMhJkcRQqm7UmpEUNCxg], state is [STARTED]]; ]]
graylog2-2017-12-26.log:[2017-12-26T00:36:50,436][ERROR][o.e.c.a.s.ShardStateAction] [TFGELSVMLXGES04] [f5_log_3][0] unexpected failure while failing shard [shard id [[f5_log_3][0]], allocation id [VmmMhJkcRQqm7UmpEUNCxg], primary term [7], message [failed to perform indices:data/write/bulk[s] on replica [f5_log_3][0], node[G0YodUEgTXir_WdAwKVoGg], [R], s[STARTED], a[id=VmmMhJkcRQqm7UmpEUNCxg]], failure [RemoteTransportException[[TFGELSVMLXGES04][10.50.5.93:9300][indices:data/write/bulk[s][r]]]; nested: IllegalStateException[active primary shard cannot be a replication target before  relocation hand off [f5_log_3][0], node[G0YodUEgTXir_WdAwKVoGg], [P], s[STARTED], a[id=VmmMhJkcRQqm7UmpEUNCxg], state is [STARTED]]; ]]
graylog2-2017-12-26.log:[2017-12-26T00:36:50,465][ERROR][o.e.c.a.s.ShardStateAction] [TFGELSVMLXGES04] [f5_log_3][0] unexpected failure while failing shard [shard id [[f5_log_3][0]], allocation id [VmmMhJkcRQqm7UmpEUNCxg], primary term [7], message [failed to perform indices:data/write/bulk[s] on replica [f5_log_3][0], node[G0YodUEgTXir_WdAwKVoGg], [R], s[STARTED], a[id=VmmMhJkcRQqm7UmpEUNCxg]], failure [RemoteTransportException[[TFGELSVMLXGES04][10.50.5.93:9300][indices:data/write/bulk[s][r]]]; nested: IllegalStateException[active primary shard cannot be a replication target before  relocation hand off [f5_log_3][0], node[G0YodUEgTXir_WdAwKVoGg], [P], s[STARTED], a[id=VmmMhJkcRQqm7UmpEUNCxg], state is [STARTED]]; ]]
graylog2-2017-12-26.log:[2017-12-26T00:52:34,321][ERROR][o.e.b.ElasticsearchUncaughtExceptionHandler] [TFGELSVMLXGES04] fatal error in thread [elasticsearch[TFGELSVMLXGES04][search][T#10]], exiting

(Jan Doberstein) #13

I only see this kind of error when Elasticsearch is not able to write its data:

  • the storage is slow
  • no disk space is available
  • there are network communication problems

You should check those three points; a rough sketch of how follows below.
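
For example (hosts and paths taken from the config and logs above; adjust as needed):

# Free space on the Elasticsearch data path
df -h /mnt/data/elasticsearch

# Rough look at storage latency and utilisation (iostat is in the sysstat package)
iostat -x 5 3

# Can this node reach the other node's transport port?
nc -zv 10.50.5.93 9300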


(Francisco Gomez) #14

Yup, I was using a NAS drive to store the ES data. Switched it to local storage and it hasn’t failed since. Thanks for the help.


(system) #15

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.