I get the same indexing failures in Graylog.
However, in my case it only happens while the indices are being cycled, and it seems to clear up eventually.
RemoteTransportException[[Taurus][10.36.20.151:9300][indices:data/write/bulk[s]]]; nested: RemoteTransportException[[Taurus][10.36.20.151:9300][indices:data/write/bulk[s][p]]]; nested: EsRejectedExecutionException[rejected execution of org.elasticsearch.transport.TransportService$4@2677a163 on EsThreadPoolExecutor[bulk, queue capacity = 50, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@239894fc[Running, pool size = 8, active threads = 8, queued tasks = 54, completed tasks = 568556]]];
I have a few concerns when these issues happen, however:
Are we actually losing data/logs when we receive these indexing failure messages? Or will it retry until it's successful?
I observed this issue after changing the following settings from their default values:
output_batch_size = 3000
processbuffer_processors = 8
outputbuffer_processors = 5
Would further increasing outputbuffer_processors to 8 perhaps help mitigate the failures on cycling events?
You connect to Elasticsearch with 5 processors per Graylog server every second, each pushing up to 3000 messages into the cluster. During the index cycle, the Elasticsearch cluster is not able to keep up with that pace.
Raising outputbuffer_processors would make it even worse, because more workers would connect to Elasticsearch. Did you modify the refresh_interval in Elasticsearch?
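
To put rough numbers on that, here is a back-of-the-envelope sketch in Python. The pool size (8) and queue capacity (50) are taken from the error message above; the number of Graylog servers and the number of shards hit on one Elasticsearch node are assumptions made only for illustration, and the function names are hypothetical. It also assumes, as a simplification, that a node can hold at most pool size + queue capacity bulk tasks before it starts rejecting.

```python
# Rough model of bulk pressure on a single Elasticsearch node.
# Simplification: every output processor flushes at the same moment and
# each bulk request is split into one task per shard on the node.

def messages_in_flight(graylog_nodes, outputbuffer_processors, output_batch_size):
    """Upper bound on messages sent to Elasticsearch in one flush round."""
    return graylog_nodes * outputbuffer_processors * output_batch_size


def shard_bulk_tasks(graylog_nodes, outputbuffer_processors, shards_on_node):
    """Worst-case shard-level bulk tasks arriving at one node at once."""
    return graylog_nodes * outputbuffer_processors * shards_on_node


def would_reject(tasks, pool_size=8, queue_capacity=50):
    """True if the tasks exceed what the bulk pool can run plus queue."""
    return tasks > pool_size + queue_capacity


if __name__ == "__main__":
    # Settings from the post; 1 Graylog server and 4 shards are assumed.
    print(messages_in_flight(1, 5, 3000))
    for processors in (5, 8):
        for nodes in (1, 3):
            tasks = shard_bulk_tasks(nodes, processors, shards_on_node=4)
            print(nodes, processors, tasks, would_reject(tasks))
```

In this model the headroom shrinks with every extra output processor and every extra Graylog server, which is why raising outputbuffer_processors pushes in the wrong direction when the cluster is already rejecting bulk requests during index cycling.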