High load on the elasticsearch data nodes

Hi people,

My setup:

4 Graylog servers doing the message processing: 24 vCores, 64 GB of RAM, with 30 GB dedicated to the Java heap.
1 Graylog master which is also the web server: 15 vCores, 32 GB of RAM, with 16 GB for the Java heap.
(All 5 Graylog servers run MongoDB, with one primary.)
3 Elasticsearch data nodes: 24 vCores, 64 GB of RAM, with 30 GB dedicated to the Java heap.
3 Elasticsearch master nodes: 10 vCores, 32 GB of RAM, with 16 GB for the Java heap.
Every index set has a replica count of 1.

I have a new issue with Elasticsearch/Graylog: every time I run a query in Graylog, the load on the Elasticsearch data nodes goes above 26 on the 2nd data node and above 35 on the 3rd data node.

Even when no query is running, the 3rd node usually sits at a load of around 30.

This is very weird in my opinion, because the heap size is fine, disk space too, and the CPU is barely utilized…

On top of that, every time there is high load on the 3rd node I get index failures on that node:

6 minutes ago	row_firewall_215	551cedb5-d467-11e9-90b7-005056867a00	{"type":"es_rejected_execution_exception","reason":"rejected execution of processing of [2184091][indices:data/write/bulk[s][p]]: request: BulkShardRequest [[row_firewall_215][2]] containing [637] requests, target allocation id: r9jkAo4HRcWrWdS_o2BKcg, primary term: 1 on EsThreadPoolExecutor[name = data-node-3/write, queue capacity = 200, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@eb770bf[Running, pool size = 24, active threads = 24, queued tasks = 214, completed tasks = 769470]]"}
6 minutes ago	firewall_782	551cc6cf-d467-11e9-90b7-005056867a00	{"type":"es_rejected_execution_exception","reason":"rejected execution of processing of [2184088][indices:data/write/bulk[s][p]]: request: BulkShardRequest [[firewall_782][1]] containing [1601] requests, target allocation id: y5jCsR5HSiORK2Kh2SRBXA, primary term: 1 on EsThreadPoolExecutor[name = data-node-3/write, queue capacity = 200, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@eb770bf[Running, pool size = 24, active threads = 24, queued tasks = 213, completed tasks = 769469]]"}
6 minutes ago	row_firewall_215	551cedb0-d467-11e9-90b7-005056867a00	{"type":"es_rejected_execution_exception","reason":"rejected execution of processing of [2184091][indices:data/write/bulk[s][p]]: request: BulkShardRequest [[row_firewall_215][2]] containing [637] requests, target allocation id: r9jkAo4HRcWrWdS_o2BKcg, primary term: 1 on EsThreadPoolExecutor[name = data-node-3/write, queue capacity = 200, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@eb770bf[Running, pool size = 24, active threads = 24, queued tasks = 214, completed tasks = 769470]]"}
6 minutes ago	firewall_782	551cc6cb-d467-11e9-90b7-005056867a00	{"type":"es_rejected_execution_exception","reason":"rejected execution of processing of [2184088][indices:data/write/bulk[s][p]]: request: BulkShardRequest [[firewall_782][1]] containing [1601] requests, target allocation id: y5jCsR5HSiORK2Kh2SRBXA, primary term: 1 on EsThreadPoolExecutor[name = data-node-3/write, queue capacity = 200, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@eb770bf[Running, pool size = 24, active threads = 24, queued tasks = 213, completed tasks = 769469]]"}
6 minutes ago	firewall_782	551cc6ca-d467-11e9-90b7-005056867a00	{"type":"es_rejected_execution_exception","reason":"rejected execution of processing of [2184088][indices:data/write/bulk[s][p]]: request: BulkShardRequest [[firewall_782][1]] containing [1601] requests, target allocation id: y5jCsR5HSiORK2Kh2SRBXA, primary term: 1 on EsThreadPoolExecutor[name = data-node-3/write, queue capacity = 200, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@eb770bf[Running, pool size = 24, active threads = 24, queued tasks = 213, completed tasks = 769469]]"}
6 minutes ago	firewall_782	551cc6c5-d467-11e9-90b7-005056867a00	{"type":"es_rejected_execution_exception","reason":"rejected execution of processing of [2184088][indices:data/write/bulk[s][p]]: request: BulkShardRequest [[firewall_782][1]] containing [1601] requests, target allocation id: y5jCsR5HSiORK2Kh2SRBXA, primary term: 1 on EsThreadPoolExecutor[name = data-node-3/write, queue capacity = 200, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@eb770bf[Running, pool size = 24, active threads = 24, queued tasks = 213, completed tasks = 769469]]"}
6 minutes ago	row_firewall_215	551cc6c8-d467-11e9-90b7-005056867a00	{"type":"es_rejected_execution_exception","reason":"rejected execution of processing of [2184061][indices:data/write/bulk[s][p]]: request: BulkShardRequest [[row_firewall_215][0]] containing [636] requests, target allocation id: ASEjD0PhT6-AGDai5iPx4w, primary term: 1 on EsThreadPoolExecutor[name = data-node-3/write, queue capacity = 200, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@eb770bf[Running, pool size = 24, active threads = 24, queued tasks = 203, completed tasks = 769466]]"}
6 minutes ago	row_firewall_215	551cc6c9-d467-11e9-90b7-005056867a00	{"type":"es_rejected_execution_exception","reason":"rejected execution of processing of [2184091][indices:data/write/bulk[s][p]]: request: BulkShardRequest [[row_firewall_215][2]] containing [637] requests, target allocation id: r9jkAo4HRcWrWdS_o2BKcg, primary term: 1 on EsThreadPoolExecutor[name = data-node-3/write, queue capacity = 200, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@eb770bf[Running, pool size = 24, active threads = 24, queued tasks = 214, completed tasks = 769470]]"}
6 minutes ago	row_firewall_215	551cc6c3-d467-11e9-90b7-005056867a00	{"type":"es_rejected_execution_exception","reason":"rejected execution of processing of [2184071][indices:data/write/bulk[s][p]]: request: BulkShardRequest [[row_firewall_215][3]] containing [619] requests, target allocation id: OKJn1c-CRw6emsR_VaoHaQ, primary term: 1 on EsThreadPoolExecutor[name = data-node-3/write, queue capacity = 200, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@eb770bf[Running, pool size = 24, active threads = 24, queued tasks = 206, completed tasks = 769466]]"}
6 minutes ago	row_firewall_215	551cc6c2-d467-11e9-90b7-005056867a00	{"type":"es_rejected_execution_exception","reason":"rejected execution of processing of [2184091][indices:data/write/bulk[s][p]]: request: BulkShardRequest [[row_firewall_215][2]] containing [637] requests, target allocation id: r9jkAo4HRcWrWdS_o2BKcg, primary term: 1 on EsThreadPoolExecutor[name = data-node-3/write, queue capacity = 200, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@eb770bf[Running, pool size = 24, active threads = 24, queued tasks = 214, completed tasks = 769470]]"}
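Side note on what those errors mean, as far as I understand them: the write thread pool on data-node-3 is saturated (all 24 threads active, queue capacity 200 exceeded), so bulk indexing requests get rejected. A quick way to watch per-node write rejections is the `_cat/thread_pool` API (sketch only; it assumes Elasticsearch is reachable on localhost:9200):

```shell
# Show the write thread pool per node: active threads, queued tasks,
# and the cumulative rejection counter. A rejected count that keeps
# growing on one node points at that node being the bottleneck.
curl -s 'http://localhost:9200/_cat/thread_pool/write?v&h=node_name,active,queue,rejected'
```

If the `rejected` column only grows on data-node-3, the indexing load is not evenly spread across the three data nodes.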

In the logs of the 3rd node I'm also getting Java messages like:

[2019-09-11T03:18:27,207][DEBUG][o.e.a.s.TransportSearchAction] [data-node-3] [graylog_3][3], node[bh-CSNy5SlmX7OJbmALsXg], [P], s[STARTED], a[id=bVQUsG6KRjeXcocZNq_GDg]: Failed to execute [SearchRequest{searchType=QUERY_THEN_FETCH, indices=[graylog_3], indicesOptions=IndicesOptions[ignore_unavailable=false, allow_no_indices=false, expand_wildcards_open=true, expand_wildcards_closed=false, allow_aliases_to_multiple_indices=true, forbid_closed_indices=true, ignore_aliases=false, ignore_throttled=true], types=[message], routing='null', preference='null', requestCache=null, scroll=null, maxConcurrentShardRequests=15, batchedReduceSize=512, preFilterShardSize=64, allowPartialSearchResults=true, localClusterAlias=null, getOrCreateAbsoluteStartMillis=-1, source={"from":0,"size":0,"query":{"bool":{"filter":[{"query_string":{"query":"EventID:4625 AND SubStatus:0xc000006a AND NOT LogonType:5 AND streams:000000000000000000000001","fields":[],"type":"best_fields","tie_breaker":0.0,"default_operator":"or","max_determinized_states":10000,"allow_leading_wildcard":true,"enable_position_increments":true,"fuzziness":"AUTO","fuzzy_prefix_length":0,"fuzzy_max_expansions":50,"phrase_slop":0,"escape":false,"auto_generate_synonyms_phrase_query":true,"fuzzy_transpositions":true,"boost":1.0}},{"range":{"timestamp":{"from":"2019-09-09 15:03:54.030","to":"2019-09-09 15:08:54.030","include_lower":true,"include_upper":true,"boost":1.0}}},{"bool":{"should":[{"term":{"streams":{"value":"000000000000000000000001","boost":1.0}}}],"adjust_pure_negative":true,"boost":1.0}}],"adjust_pure_negative":true,"boost":1.0}},"aggregations":{"pivot-1-series-max(TargetUserName)":{"max":{"field":"TargetUserName"}},"timestamp-min":{"min":{"field":"timestamp"}},"timestamp-max":{"max":{"field":"timestamp"}}}}}]
org.elasticsearch.transport.RemoteTransportException: [data-node-2][10.161.90.45:9300][indices:data/read/search[phase/query]]
Caused by: java.lang.IllegalArgumentException: Expected numeric type on field [TargetUserName], but got [keyword]
        at org.elasticsearch.search.aggregations.support.ValuesSourceConfig.numericField(ValuesSourceConfig.java:309) ~[elasticsearch-6.8.3.jar:6.8.3]
        at org.elasticsearch.search.aggregations.support.ValuesSourceConfig.originalValuesSource(ValuesSourceConfig.java:292) ~[elasticsearch-6.8.3.jar:6.8.3]
        at org.elasticsearch.search.aggregations.support.ValuesSourceConfig.toValuesSource(ValuesSourceConfig.java:249) ~[elasticsearch-6.8.3.jar:6.8.3]
        at org.elasticsearch.search.aggregations.support.ValuesSourceAggregatorFactory.createInternal(ValuesSourceAggregatorFactory.java:55) ~[elasticsearch-6.8.3.jar:6.8.3]
        at org.elasticsearch.search.aggregations.AggregatorFactory.create(AggregatorFactory.java:216) ~[elasticsearch-6.8.3.jar:6.8.3]
        at org.elasticsearch.search.aggregations.AggregatorFactories.createTopLevelAggregators(AggregatorFactories.java:217) ~[elasticsearch-6.8.3.jar:6.8.3]
        at org.elasticsearch.search.aggregations.AggregationPhase.preProcess(AggregationPhase.java:55) ~[elasticsearch-6.8.3.jar:6.8.3]
        at org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:112) ~[elasticsearch-6.8.3.jar:6.8.3]
        at org.elasticsearch.indices.IndicesService.lambda$loadIntoContext$17(IndicesService.java:1253) ~[elasticsearch-6.8.3.jar:6.8.3]
        at org.elasticsearch.indices.IndicesService.lambda$cacheShardLevelResult$18(IndicesService.java:1309) ~[elasticsearch-6.8.3.jar:6.8.3]
        at org.elasticsearch.indices.IndicesRequestCache$Loader.load(IndicesRequestCache.java:164) ~[elasticsearch-6.8.3.jar:6.8.3]
        at org.elasticsearch.indices.IndicesRequestCache$Loader.load(IndicesRequestCache.java:147) ~[elasticsearch-6.8.3.jar:6.8.3]
        at org.elasticsearch.common.cache.Cache.computeIfAbsent(Cache.java:433) ~[elasticsearch-6.8.3.jar:6.8.3]
        at org.elasticsearch.indices.IndicesRequestCache.getOrCompute(IndicesRequestCache.java:119) ~[elasticsearch-6.8.3.jar:6.8.3]
        at org.elasticsearch.indices.IndicesService.cacheShardLevelResult(IndicesService.java:1315) ~[elasticsearch-6.8.3.jar:6.8.3]
        at org.elasticsearch.indices.IndicesService.loadIntoContext(IndicesService.java:1251) ~[elasticsearch-6.8.3.jar:6.8.3]
        at org.elasticsearch.search.SearchService.loadOrExecuteQueryPhase(SearchService.java:348) ~[elasticsearch-6.8.3.jar:6.8.3]
        at org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:394) ~[elasticsearch-6.8.3.jar:6.8.3]
        at org.elasticsearch.search.SearchService.access$100(SearchService.java:126) ~[elasticsearch-6.8.3.jar:6.8.3]
        at org.elasticsearch.search.SearchService$2.onResponse(SearchService.java:359) ~[elasticsearch-6.8.3.jar:6.8.3]
        at org.elasticsearch.search.SearchService$2.onResponse(SearchService.java:355) ~[elasticsearch-6.8.3.jar:6.8.3]
        at org.elasticsearch.search.SearchService$4.doRun(SearchService.java:1107) ~[elasticsearch-6.8.3.jar:6.8.3]
        at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-6.8.3.jar:6.8.3]
        at org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:41) ~[elasticsearch-6.8.3.jar:6.8.3]
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:751) ~[elasticsearch-6.8.3.jar:6.8.3]
        at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-6.8.3.jar:6.8.3]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_212]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_212]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_212]
[2019-09-11T03:18:27,212][DEBUG][o.e.a.s.TransportSearchAction] [data-node-3] All shards failed for phase: [query]
org.elasticsearch.ElasticsearchException$1: Expected numeric type on field [TargetUserName], but got [keyword]
        at org.elasticsearch.ElasticsearchException.guessRootCauses(ElasticsearchException.java:657) ~[elasticsearch-6.8.3.jar:6.8.3]
        at org.elasticsearch.action.search.AbstractSearchAsyncAction.executeNextPhase(AbstractSearchAsyncAction.java:131) ~[elasticsearch-6.8.3.jar:6.8.3]
        at org.elasticsearch.action.search.AbstractSearchAsyncAction.onPhaseDone(AbstractSearchAsyncAction.java:259) ~[elasticsearch-6.8.3.jar:6.8.3]
        at org.elasticsearch.action.search.InitialSearchPhase.onShardFailure(InitialSearchPhase.java:100) ~[elasticsearch-6.8.3.jar:6.8.3]
        at org.elasticsearch.action.search.InitialSearchPhase.access$100(InitialSearchPhase.java:48) ~[elasticsearch-6.8.3.jar:6.8.3]
        at org.elasticsearch.action.search.InitialSearchPhase$2.lambda$onFailure$1(InitialSearchPhase.java:220) ~[elasticsearch-6.8.3.jar:6.8.3]
        at org.elasticsearch.action.search.InitialSearchPhase.maybeFork(InitialSearchPhase.java:174) ~[elasticsearch-6.8.3.jar:6.8.3]
        at org.elasticsearch.action.search.InitialSearchPhase.access$000(InitialSearchPhase.java:48) ~[elasticsearch-6.8.3.jar:6.8.3]
        at org.elasticsearch.action.search.InitialSearchPhase$2.onFailure(InitialSearchPhase.java:220) ~[elasticsearch-6.8.3.jar:6.8.3]
        at org.elasticsearch.action.search.SearchExecutionStatsCollector.onFailure(SearchExecutionStatsCollector.java:73) ~[elasticsearch-6.8.3.jar:6.8.3]
        at org.elasticsearch.action.ActionListenerResponseHandler.handleException(ActionListenerResponseHandler.java:59) ~[elasticsearch-6.8.3.jar:6.8.3]
        at org.elasticsearch.action.search.SearchTransportService$ConnectionCountingHandler.handleException(SearchTransportService.java:463) ~[elasticsearch-6.8.3.jar:6.8.3]
        at org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleException(TransportService.java:1114) ~[elasticsearch-6.8.3.jar:6.8.3]
        at org.elasticsearch.transport.TcpTransport.lambda$handleException$24(TcpTransport.java:1011) ~[elasticsearch-6.8.3.jar:6.8.3]
        at org.elasticsearch.common.util.concurrent.EsExecutors$DirectExecutorService.execute(EsExecutors.java:193) [elasticsearch-6.8.3.jar:6.8.3]
        at org.elasticsearch.transport.TcpTransport.handleException(TcpTransport.java:1009) [elasticsearch-6.8.3.jar:6.8.3]
        at org.elasticsearch.transport.TcpTransport.handlerResponseError(TcpTransport.java:1001) [elasticsearch-6.8.3.jar:6.8.3]
        at org.elasticsearch.transport.TcpTransport.messageReceived(TcpTransport.java:950) [elasticsearch-6.8.3.jar:6.8.3]
        at org.elasticsearch.transport.TcpTransport.inboundMessage(TcpTransport.java:763) [elasticsearch-6.8.3.jar:6.8.3]
        at org.elasticsearch.transport.netty4.Netty4MessageChannelHandler.channelRead(Netty4MessageChannelHandler.java:53) [transport-netty4-client-6.8.3.jar:6.8.3]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
        at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:323) [netty-codec-4.1.32.Final.jar:4.1.32.Final]
        at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:297) [netty-codec-4.1.32.Final.jar:4.1.32.Final]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
        at io.netty.handler.logging.LoggingHandler.channelRead(LoggingHandler.java:241) [netty-handler-4.1.32.Final.jar:4.1.32.Final]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
        at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1434) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
        at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:965) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
        at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:656) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
        at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:556) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
        at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:510) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:470) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
        at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:909) [netty-common-4.1.32.Final.jar:4.1.32.Final]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_212]
Caused by: java.lang.IllegalArgumentException: Expected numeric type on field [TargetUserName], but got [keyword]
        at org.elasticsearch.search.aggregations.support.ValuesSourceConfig.numericField(ValuesSourceConfig.java:309) ~[elasticsearch-6.8.3.jar:6.8.3]
        at org.elasticsearch.search.aggregations.support.ValuesSourceConfig.originalValuesSource(ValuesSourceConfig.java:292) ~[elasticsearch-6.8.3.jar:6.8.3]
        at org.elasticsearch.search.aggregations.support.ValuesSourceConfig.toValuesSource(ValuesSourceConfig.java:249) ~[elasticsearch-6.8.3.jar:6.8.3]
        at org.elasticsearch.search.aggregations.support.ValuesSourceAggregatorFactory.createInternal(ValuesSourceAggregatorFactory.java:55) ~[elasticsearch-6.8.3.jar:6.8.3]
        at org.elasticsearch.search.aggregations.AggregatorFactory.create(AggregatorFactory.java:216) ~[elasticsearch-6.8.3.jar:6.8.3]
        at org.elasticsearch.search.aggregations.AggregatorFactories.createTopLevelAggregators(AggregatorFactories.java:217) ~[elasticsearch-6.8.3.jar:6.8.3]
        at org.elasticsearch.search.aggregations.AggregationPhase.preProcess(AggregationPhase.java:55) ~[elasticsearch-6.8.3.jar:6.8.3]
        at org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:112) ~[elasticsearch-6.8.3.jar:6.8.3]
        at org.elasticsearch.indices.IndicesService.lambda$loadIntoContext$17(IndicesService.java:1253) ~[elasticsearch-6.8.3.jar:6.8.3]
        at org.elasticsearch.indices.IndicesService.lambda$cacheShardLevelResult$18(IndicesService.java:1309) ~[elasticsearch-6.8.3.jar:6.8.3]
        at org.elasticsearch.indices.IndicesRequestCache$Loader.load(IndicesRequestCache.java:164) ~[elasticsearch-6.8.3.jar:6.8.3]
        at org.elasticsearch.indices.IndicesRequestCache$Loader.load(IndicesRequestCache.java:147) ~[elasticsearch-6.8.3.jar:6.8.3]
        at org.elasticsearch.common.cache.Cache.computeIfAbsent(Cache.java:433) ~[elasticsearch-6.8.3.jar:6.8.3]
        at org.elasticsearch.indices.IndicesRequestCache.getOrCompute(IndicesRequestCache.java:119) ~[elasticsearch-6.8.3.jar:6.8.3]
        at org.elasticsearch.indices.IndicesService.cacheShardLevelResult(IndicesService.java:1315) ~[elasticsearch-6.8.3.jar:6.8.3]
        at org.elasticsearch.indices.IndicesService.loadIntoContext(IndicesService.java:1251) ~[elasticsearch-6.8.3.jar:6.8.3]
        at org.elasticsearch.search.SearchService.loadOrExecuteQueryPhase(SearchService.java:348) ~[elasticsearch-6.8.3.jar:6.8.3]
        at org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:394) ~[elasticsearch-6.8.3.jar:6.8.3]
        at org.elasticsearch.search.SearchService.access$100(SearchService.java:126) ~[elasticsearch-6.8.3.jar:6.8.3]
        at org.elasticsearch.search.SearchService$2.onResponse(SearchService.java:359) ~[elasticsearch-6.8.3.jar:6.8.3]
        at org.elasticsearch.search.SearchService$2.onResponse(SearchService.java:355) ~[elasticsearch-6.8.3.jar:6.8.3]
        at org.elasticsearch.search.SearchService$4.doRun(SearchService.java:1107) ~[elasticsearch-6.8.3.jar:6.8.3]
        at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-6.8.3.jar:6.8.3]
        at org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:41) ~[elasticsearch-6.8.3.jar:6.8.3]
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:751) ~[elasticsearch-6.8.3.jar:6.8.3]
        at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-6.8.3.jar:6.8.3]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_212]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_212]
        ... 1 more
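For what it's worth, that `IllegalArgumentException` is a separate problem from the load: a widget is running a `max` aggregation on `TargetUserName`, which is mapped as `keyword`, and `max` only works on numeric fields, so every shard of that search fails. A keyword-compatible way to look at that field would be a `terms` aggregation instead (a sketch only; the index name and endpoint are taken from the log above and assumed reachable):

```shell
# max(TargetUserName) fails because the field is a keyword, not a number.
# A terms aggregation returns the top values of a keyword field instead:
curl -s -H 'Content-Type: application/json' \
  'http://localhost:9200/graylog_3/_search?size=0' -d '{
  "aggs": {
    "top_target_users": {
      "terms": { "field": "TargetUserName", "size": 10 }
    }
  }
}'
```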

Previously I was running Elasticsearch 6.5.2; yesterday I upgraded to 6.8.3 and things became more usable…
I have also re-adjusted the indices to be in the range of 20-40 GB.
Right now every data node is managing something like 5.5 TB of data, and I was thinking of going to something like 4 TB per Elasticsearch data node… what do you think?
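Back-of-the-envelope math on those volumes (a sketch; all the numbers come from this thread, and the 30 GB is just the midpoint of my 20-40 GB target):

```python
# Rough sizing sketch: how much primary data the cluster holds and how
# many indices that implies. Numbers are assumptions from the thread.

data_per_node_tb = 5.5    # stored per data node right now
data_nodes = 3
replicas = 1              # every index set has 1 replica
target_index_gb = 30      # midpoint of the 20-40 GB index target

# With 1 replica, half of what is on disk is replica copies,
# so primaries are the stored total divided by (1 + replicas).
total_stored_gb = data_per_node_tb * data_nodes * 1024
total_primary_gb = total_stored_gb / (1 + replicas)

indices = total_primary_gb / target_index_gb
print(f"~{total_primary_gb:.0f} GB of primary data "
      f"=> ~{indices:.0f} indices of {target_index_gb} GB")
```

That is a lot of open indices for 3 data nodes, which may itself contribute to the load.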

Honestly, I'm lost. I have no idea what to do to improve the load on the 3rd node. Any ideas?

thanks,
Marius.

Guys, I'm dying here. Now I have a new record load value of 49 on the 3rd node :expressionless:

Anyone have any ideas?

I have re-adjusted the index sizes to be in the range of 20-30 GB, because someone said that an index should not be larger than half of the available memory, and since I have 64 GB I went for that. The situation improved a little, but the high load still happens when there are a lot of queries. Right now I have 3 users executing queries and the load looks like this:

[screenshot of node load]

With this huge load my output buffers fill up to the max, I can't write data to Elasticsearch, and naturally I also get index failures…
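In case it helps anyone debugging the same thing, these are the checks I'd run to see what the hot node is actually busy with and whether shards are spread evenly (host/port assumed, same caveat as above):

```shell
# Sample the busiest threads cluster-wide; look for which node and which
# activity (search, write, merges) dominates the output.
curl -s 'http://localhost:9200/_nodes/hot_threads?threads=5'

# Disk usage and shard counts per node - are the three data nodes balanced?
curl -s 'http://localhost:9200/_cat/allocation?v'

# Shard-by-shard listing sorted by node, to spot hot indices piling up
# on one node.
curl -s 'http://localhost:9200/_cat/shards?v&s=node'
```

If most primaries of the busiest indices sit on data-node-3, rebalancing (or more data nodes) would spread both the write and the query load.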

I guess I'm the only one who has run into this issue, since no one else is writing anything here :smiley:

It looks like you could use a little more CPU / additional nodes - or do better processing up front so that your users do not need such complex queries.

You should really dig into what your users are searching for and optimize the data for that.

Any idea how many TB per data node I should have for 100 users who run some basic queries?

What is a basic query for you? For me it would be _exists_ source

EventID:4625 AND SubStatus:0xc000006a