We keep seeing ‘trivial’ searches fail (i.e. not searches across months of data, etc.). Simply re-running the same search normally works, so it’s a transient issue.
A sniffer shows Graylog sending ES the query and ES returning an HTTP 500 error: “Unable to perform search query.”
Looking at the ES logs, I can see:
[2017-11-02 08:05:49,628][DEBUG][action.search ] [kiwi] [graylog_6485], node[LcJGzDCvThmffdACkHwcmw], [R], v, s[STARTED], a[id=C2u1X6aiTIi8xXFVlIH4NQ]: Failed to execute [org.elasticsearch.action.search.SearchRequest@6d01236f] lastShard [true] RemoteTransportException[[takahe][10.4.128.205:9300][indices:data/read/search[phase/query]]]; nested: EsRejectedExecutionException[rejected execution of org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler@f9240b5 on EsThreadPoolExecutor[search, queue capacity = 1000, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@777d1fd1[Running, pool size = 49, active threads = 49, queued tasks = 1000, completed tasks = 89696]]];
So it looks like the search thread pool was fully busy at that moment (active threads = pool size = 49) and its queue was full as well (queued tasks = queue capacity = 1000). But I don’t know what that means.
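To make that reading of the log line concrete, here is a minimal Python sketch that pulls the counters out of the rejection message and checks whether the pool and the queue were both full. The function names `parse_pool_stats` and `is_saturated` are made up for illustration; they are not part of Graylog or Elasticsearch, and the regex only assumes the "name = number" layout visible in the log line above.

```python
import re

# Excerpt of the rejection message, copied from the ES log line above.
LOG_EXCERPT = (
    "EsThreadPoolExecutor[search, queue capacity = 1000, "
    "org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@777d1fd1"
    "[Running, pool size = 49, active threads = 49, queued tasks = 1000, "
    "completed tasks = 89696]]"
)

# Counter names as they appear in the message (assumed stable for this format).
COUNTER_RE = re.compile(
    r"(queue capacity|pool size|active threads|queued tasks|completed tasks)"
    r" = (\d+)"
)


def parse_pool_stats(text):
    """Return the 'name = number' counters found in a rejection message."""
    return {name: int(value) for name, value in COUNTER_RE.findall(text)}


def is_saturated(stats):
    """True when every thread is busy and the queue has no free slot."""
    return (
        stats["active threads"] >= stats["pool size"]
        and stats["queued tasks"] >= stats["queue capacity"]
    )


stats = parse_pool_stats(LOG_EXCERPT)
print(stats)
print("saturated:", is_saturated(stats))
```

With both conditions true, a new search request has neither a free thread nor a free queue slot, which would match the EsRejectedExecutionException in the log.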
Is this an indicator of a problem, or can I just increase whatever needs to be increased?
Thanks. This is GL 2.3.2 and ES 2.4.6.