Indexing of new messages stops occasionally

I have a Graylog cluster with 2 Graylog servers and 3 Elasticsearch servers.
Sometimes Graylog just stops indexing new messages into Elasticsearch, even though ES is online.
To fix it, I usually have to restart the Elasticsearch processes.
Sometimes it works fine for a week or two, and sometimes only for 3 days. I haven’t found a way to reproduce it.

Where should I look to find the cause of this?

Any warnings or errors in the logs of your Graylog and Elasticsearch nodes (see http://docs.graylog.org/en/2.2/pages/configuration/file_location.html)?
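If the logs are large, a small filter helps to pull out only the warnings and errors. This is a minimal sketch, assuming the level appears as a plain `WARN`, `WARNING` or `ERROR` token in each line; the function names are made up for illustration, and the actual log paths are listed on the docs page above.

```python
import re

# Matches a WARN, WARNING or ERROR level token as a whole word,
# e.g. "[WARN ]" or " ERROR ". The exact log layout is an assumption.
LEVEL_RE = re.compile(r"\b(WARN|WARNING|ERROR)\b")


def filter_problem_lines(lines):
    """Return only the lines that carry a WARN/WARNING/ERROR token."""
    return [line.rstrip("\n") for line in lines if LEVEL_RE.search(line)]


def scan_log(path):
    """Read one log file and return its warning and error lines."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return filter_problem_lines(handle)
```

Call `scan_log()` with the path of each Graylog and Elasticsearch log file on each node and compare the timestamps around the moment indexing stopped.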

I’ll have to wait until the error happens again…
It’s not really reproducible, so it may take a while.

Does the stall occur on both GL servers at the same time?

Also, do your servers have enough memory?

Yes, both servers stop processing.

The Graylog servers have 4 vCPU/8 GB; the 3 Elasticsearch servers have 8 vCPU/16 GB each.

And how much did you allocate to the Graylog and Elasticsearch JVMs? My guess would be max 8G for Elasticsearch and max 2G or 3G for Graylog.

Graylog 4GB each
ES 8GB each

Just had that error again. For whatever reason, no Elasticsearch log files had been written for a few days.
I had to kill the ES processes to restart them.
But there are files like this one on all 3 ES servers:
-rw------- 1 elasticsearch elasticsearch 5171027968 May 14 18:24 java_pid23573.hprof

Try analyzing that heap dump (.hprof) with the Eclipse Memory Analyzer: http://www.eclipse.org/mat/

Btw, are you using the recommended Java version (Oracle Java 1.8) on the Elasticsearch nodes?

○ → java -version
java version "1.8.0_45"
Java(TM) SE Runtime Environment (build 1.8.0_45-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.45-b02, mixed mode)

Update 45 sounds pretty old. The current update is 131. I don’t know if a relevant bug has been fixed in between, but at least updating should not make things worse.

To me this sounds like your Elasticsearch crashes and dumps its heap, so it does not seem to be a Graylog problem. Btw, I guess you are using an Elasticsearch 2.4 series version (compatible with Graylog)? Have you installed any Elasticsearch plugins? Removing them might also help.

Only elasticsearch-head.

I will try updating the JRE and see if it helps.

The problem occurred again. There are several messages like the following, on all 3 ES servers:

New used memory 5368893176 [5gb] for data of [source] would be larger than configured breaker: 5112122572 [4.7gb], breaking
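To see how far over the limit the request was, the numbers in that message can be pulled apart with a few lines of Python. This is only a sketch: the regular expression is written against the exact wording quoted above, and the function name and dictionary keys are made up for illustration.

```python
import re

MESSAGE = (
    "New used memory 5368893176 [5gb] for data of [source] would be larger "
    "than configured breaker: 5112122572 [4.7gb], breaking"
)

# Pattern built from the wording of the message quoted in this thread.
BREAKER_RE = re.compile(
    r"New used memory (?P<used>\d+) \[[^\]]+\] for data of \[(?P<label>[^\]]+)\] "
    r"would be larger than configured breaker: (?P<limit>\d+) \[[^\]]+\]"
)


def parse_breaker_message(message):
    """Extract the bracketed label, requested bytes, limit and overshoot."""
    match = BREAKER_RE.search(message)
    if match is None:
        raise ValueError("not a circuit breaker message")
    used = int(match.group("used"))
    limit = int(match.group("limit"))
    return {
        "label": match.group("label"),  # the bracketed name, here "source"
        "used": used,                   # bytes that would be in use
        "limit": limit,                 # configured breaker limit in bytes
        "over_by": used - limit,        # how far the request overshoots
    }


info = parse_breaker_message(MESSAGE)
print(info["label"], info["used"], info["limit"], info["over_by"])
```

For the message above, the request would have exceeded the configured limit by 256770604 bytes, roughly 245 MiB.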

You are running out of memory on the Elasticsearch nodes. Have a look at this article:

https://www.elastic.co/guide/en/elasticsearch/guide/current/_limiting_memory_usage.html
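Before changing anything, it may help to check how close each node is to its fielddata breaker limit. The sketch below assumes the node stats API is reachable at `http://localhost:9200/_nodes/stats/breaker` and that the response contains `limit_size_in_bytes`, `estimated_size_in_bytes` and `tripped` for each breaker; please verify both against your Elasticsearch version. The function names are hypothetical.

```python
import json
from urllib.request import urlopen


def fetch_breaker_stats(base_url="http://localhost:9200"):
    """Fetch breaker statistics for all nodes (base_url is an assumption)."""
    with urlopen(base_url + "/_nodes/stats/breaker") as response:
        return json.load(response)


def summarize_fielddata_breakers(stats):
    """Return one row per node: name, fraction of the limit in use, trip count."""
    rows = []
    for node_id, node in stats.get("nodes", {}).items():
        breaker = node.get("breakers", {}).get("fielddata")
        if breaker is None:
            continue  # node did not report a fielddata breaker
        limit = breaker["limit_size_in_bytes"]
        estimated = breaker["estimated_size_in_bytes"]
        rows.append({
            "node": node.get("name", node_id),
            "used_fraction": estimated / limit if limit else 0.0,
            "tripped": breaker.get("tripped", 0),
        })
    return rows
```

Running `summarize_fielddata_breakers(fetch_breaker_stats())` against one of the ES nodes should show whether fielddata usage creeps towards the limit over the days before indexing stops. The settings discussed in the article, `indices.fielddata.cache.size` and `indices.breaker.fielddata.limit`, are the ones to look at if it does.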

Got it. I’ll see if tweaking these settings helps with the issue.
Thanks for the info.