We are continually seeing ‘trivial’ searches fail (i.e. not searches spanning months of data or anything like that). Simply re-running the same search normally works, so it seems to be a transient issue.
A sniffer shows Graylog sending ES the query and ES returning an HTTP 500 error: “Unable to perform search query.”
Looking at the ES logs I can see:
[2017-11-02 08:05:49,628][DEBUG][action.search ] [kiwi] [graylog_6485][3], node[LcJGzDCvThmffdACkHwcmw], [R], v[11], s[STARTED], a[id=C2u1X6aiTIi8xXFVlIH4NQ]: Failed to execute [org.elasticsearch.action.search.SearchRequest@6d01236f] lastShard [true]
RemoteTransportException[[takahe][10.4.128.205:9300][indices:data/read/search[phase/query]]]; nested: EsRejectedExecutionException[rejected execution of org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler@f9240b5 on EsThreadPoolExecutor[search, queue capacity = 1000, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@777d1fd1[Running, pool size = 49, active threads = 49, queued tasks = 1000, completed tasks = 89696]]];
So it looks like the search thread pool was full at that moment (49 active threads on a pool of 49) and its queue was at capacity (1000 queued tasks against a capacity of 1000). But I don’t know what that actually means.
Is this an indicator of a problem, or can I just increase whatever needs to be increased?
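For reference, the same counters that appear in the rejection message can be watched live via the `_cat/thread_pool` API. A minimal sketch using Python’s `requests` below; the host/port is a placeholder and should point at any node in your cluster:

```python
# Minimal sketch: poll the counters that show up in the rejection above.
# Assumes the 'requests' package and an ES HTTP endpoint on localhost:9200
# (placeholder; point it at any node in your cluster).
# On ES 2.x look at the search.active / search.queue / search.rejected
# columns; newer versions also accept /_cat/thread_pool/search.
import time
import requests

ES_URL = "http://localhost:9200"  # placeholder host/port

while True:
    resp = requests.get(ES_URL + "/_cat/thread_pool", params={"v": "true"})
    resp.raise_for_status()
    print(resp.text)  # watch whether search.rejected keeps climbing
    time.sleep(10)
```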
Your Elasticsearch cluster is operating at full capacity. While you could increase the relevant thread pools (and their queues), this would only mitigate the issue for a short time and lead to an ever-increasing backlog.
You could try tuning Elasticsearch for your needs, but the only good solution for this is to add more or better hardware (e.g. SSDs instead of spinning rust) to your Elasticsearch cluster.
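If you do decide to bump the search queue as a stop-gap anyway, the relevant setting is the search queue size. A hedged sketch below, assuming an ES 2.x cluster where thread pool settings could, as far as I recall, still be changed dynamically; on 5.x and later this became a static node setting (`thread_pool.search.queue_size` in `elasticsearch.yml` plus a restart):

```python
# Stop-gap only, not a fix: raise the search queue size.
# Assumes ES 2.x, where thread pool settings were (as far as I recall)
# still dynamic cluster settings; on 5.x+ set thread_pool.search.queue_size
# in elasticsearch.yml instead and restart the node.
import requests

ES_URL = "http://localhost:9200"  # placeholder host/port

resp = requests.put(
    ES_URL + "/_cluster/settings",
    json={"transient": {"threadpool.search.queue_size": 2000}},
)
resp.raise_for_status()
print(resp.json())
```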
Well, that is weird. It’s a 4-node ES cluster of identical servers, each with 40 cores and 64 GB of RAM, and load averages are down in the 2-5 range. That appears grossly over-specced to me. Admittedly the disks are 15K spinning rust, but is this really an I/O problem? I’m seeing around 20 MB/s of writes via iotop, which doesn’t seem busy to me.
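(A rough way to check that beyond raw MB/s is to sample how busy the disks actually are, since a 15K disk can be saturated by random I/O at well under 20 MB/s. A minimal sketch with psutil below, assuming it is installed on the ES nodes; `iostat -x` shows the same utilisation numbers.)

```python
# Rough sketch: sample per-disk busy time instead of raw throughput.
# Assumes psutil is installed on the ES node; busy_time is Linux-only.
import time
import psutil

INTERVAL = 5  # seconds between samples

before = psutil.disk_io_counters(perdisk=True)
time.sleep(INTERVAL)
after = psutil.disk_io_counters(perdisk=True)

for disk, now in after.items():
    prev = before.get(disk)
    if prev is None:
        continue
    busy_ms = getattr(now, "busy_time", 0) - getattr(prev, "busy_time", 0)
    written_mb = (now.write_bytes - prev.write_bytes) / 1e6
    print(f"{disk}: ~{100 * busy_ms / (INTERVAL * 1000):.0f}% busy, "
          f"{written_mb / INTERVAL:.1f} MB/s written")
```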
I don’t know. That’s something you have to investigate on your machines.
Are the disks local, or are you using a SAN (which might either be too slow over the network or suffer from a noisy-neighbor problem)?
The hardware specs themselves don’t mean much. They might be fine for 200,000 messages/second but break down at 1,000,000 messages/second.
Maybe you’re also just using badly tuned Elasticsearch nodes.
But none of this is something I can help you with from here. If you’re a Graylog Enterprise customer, contact support to help you pinpoint and possibly solve the performance problems.
So you could have 20-40 G shards. Right now you have 2.5 G shards, so you should make them about 10 times bigger to be efficient.
Currently each of your indices uses (or tries to use) about 0.8 G of ES JVM heap: 4 shards plus 4 replica copies, each taking roughly 0.1 G. Multiply that number by the total number of indices in your index set and divide by the number of ES nodes, and you get the amount of JVM heap each node needs for ES to work efficiently.
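To make that arithmetic concrete, a small sketch below. The 4 shards + 4 replica copies layout and the rough 0.1 G per shard copy are the figures from this thread; the index count is only an example value to swap for your own retention settings:

```python
# Back-of-the-envelope version of the calculation above. The ~0.1 G of JVM
# heap per shard copy is the rough figure from this thread, not a measured
# value, and INDICES_IN_SET is just an example; use your own index set size.
HEAP_PER_SHARD_COPY_GB = 0.1
SHARDS_PER_INDEX = 4
REPLICA_COPIES = 4      # 4 primaries + 4 replica copies = 8 copies in total
INDICES_IN_SET = 30     # example; check your index set's rotation/retention
ES_NODES = 4

heap_per_index_gb = (SHARDS_PER_INDEX + REPLICA_COPIES) * HEAP_PER_SHARD_COPY_GB
heap_per_node_gb = heap_per_index_gb * INDICES_IN_SET / ES_NODES

print(f"~{heap_per_index_gb:.1f} G heap per index, "
      f"~{heap_per_node_gb:.1f} G heap needed on each ES node")
```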