Index error and journal filling up on Graylog OVA

Mic · April 12, 2017, 6:39am

Hi there,
After a flood of incoming messages, message processing stopped.

Index error shows:

RemoteTransportException[[Cloud 9][192.168.251.20:9300][indices:data/write/bulk[s]]]; nested: RemoteTransportException[[Cloud 9][192.168.251.20:9300][indices:data/write/bulk[s][p]]]; nested: EsRejectedExecutionException[rejected execution of org.elasticsearch.transport.TransportService$4@6d9f169a on EsThreadPoolExecutor[bulk, queue capacity = 50, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@12e2511e[Running, pool size = 8, active threads = 8, queued tasks = 50, completed tasks = 486408]]];

The Elasticsearch cluster is green.

Does anyone know how to get message processing going again?
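
For readers decoding a rejection like the one above: the message itself reports the state of the Elasticsearch bulk thread pool at the moment the task was refused. Below is a minimal Python sketch that pulls those figures out of the text. The helper name `parse_rejection` and the `saturated` flag are illustrative assumptions, not part of Graylog or Elasticsearch.

```python
import re

# Matches the thread pool details inside an EsRejectedExecutionException message.
POOL_RE = re.compile(
    r"EsThreadPoolExecutor\[(?P<pool>\w+), queue capacity = (?P<capacity>\d+),"
    r".*?pool size = (?P<pool_size>\d+), active threads = (?P<active>\d+),"
    r" queued tasks = (?P<queued>\d+), completed tasks = (?P<completed>\d+)"
)


def parse_rejection(message):
    """Return the thread pool figures from a rejection message, or None."""
    match = POOL_RE.search(message)
    if match is None:
        return None
    info = {k: (v if k == "pool" else int(v)) for k, v in match.groupdict().items()}
    # Hypothetical convenience flag: all workers busy and the queue at capacity.
    info["saturated"] = (
        info["active"] >= info["pool_size"] and info["queued"] >= info["capacity"]
    )
    return info


# Excerpt of the error text posted above.
sample = (
    "EsRejectedExecutionException[rejected execution of "
    "org.elasticsearch.transport.TransportService$4@6d9f169a on "
    "EsThreadPoolExecutor[bulk, queue capacity = 50, "
    "org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@12e2511e"
    "[Running, pool size = 8, active threads = 8, queued tasks = 50, "
    "completed tasks = 486408]]];"
)
print(parse_rejection(sample))
```

In this message all 8 bulk threads are active and the queue of 50 is full, which is why the next bulk task was rejected.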

jochen · April 12, 2017, 6:56am

Elasticsearch will start processing messages again as soon as it has completed the queued tasks and has capacity to process new tasks.

janc · July 27, 2017, 1:53am

Hello,

I get the same indexing failures in Graylog.
However, in my case they only happen when the indices are being cycled, and they seem to clear up eventually.

RemoteTransportException[[Taurus][10.36.20.151:9300][indices:data/write/bulk[s]]]; nested: RemoteTransportException[[Taurus][10.36.20.151:9300][indices:data/write/bulk[s][p]]]; nested: EsRejectedExecutionException[rejected execution of org.elasticsearch.transport.TransportService$4@2677a163 on EsThreadPoolExecutor[bulk, queue capacity = 50, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@239894fc[Running, pool size = 8, active threads = 8, queued tasks = 54, completed tasks = 568556]]];

I have a few concerns when these issues happen, however:

Are we actually losing data/logs when we receive these indexing failure messages, or will it retry until it’s successful?
I observed this issue after changing the following settings from their default values:
output_batch_size = 3000
processbuffer_processors = 8
outputbuffer_processors = 5
Would further increasing outputbuffer_processors to 8 perhaps help mitigate the failures on cycling events?

Thanks

jtkarvo · July 27, 2017, 5:14am

Check that the Elasticsearch cluster has enough nodes and that the nodes have enough memory.

jochen · July 27, 2017, 6:14am

You have to find out why tasks in Elasticsearch are piling up and fix this issue.

jan · July 28, 2017, 10:46am

Just to explain:

With these settings, each Graylog server connects to Elasticsearch with 5 processors every second, each pushing up to 3000 messages into the cluster. During the index cycle the Elasticsearch cluster is not able to keep up with that pace.

Raising the number of output buffer processors would make it even worse, because more workers would connect to Elasticsearch. Did you modify the resent_interval in Elasticsearch?
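
To put rough numbers on this, here is a small Python sketch of the arithmetic. It only computes an upper bound from the settings quoted in this thread; the function name `peak_bulk_load` is made up for illustration, and a single Graylog node is an assumption. The number of tasks that actually land in the Elasticsearch bulk thread pool may be higher, since a bulk request can be split into one task per shard it touches.

```python
def peak_bulk_load(graylog_nodes, outputbuffer_processors, output_batch_size):
    """Upper bound on concurrent bulk requests and on messages in flight."""
    concurrent_requests = graylog_nodes * outputbuffer_processors
    messages_in_flight = concurrent_requests * output_batch_size
    return concurrent_requests, messages_in_flight


# output_batch_size = 3000 is taken from the thread; one Graylog node is assumed.
for processors in (3, 5, 8):
    requests, messages = peak_bulk_load(1, processors, 3000)
    print(
        f"outputbuffer_processors={processors}: "
        f"{requests} concurrent bulk requests, "
        f"up to {messages} messages in flight"
    )
```

Going from 5 to 8 processors raises the ceiling from 15000 to 24000 messages in flight per Graylog node, which adds pressure on a bulk queue that is already full during the index cycle.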

janc · August 8, 2017, 10:55am

Hello Jan,

Sorry for getting back to you quite late.
Are you referring to index.refresh_interval in Elasticsearch? We have not modified it.

Do you have any recommendations on tuning this setting?

Thanks a lot!

janc · August 9, 2017, 1:42am

Update:

I have reduced outputbuffer_processors from 5 to 3.
Today's index cycle produced no data/write/bulk “Indexing Failures”.

I’ll observe for a couple more days.
