I have a Graylog cluster with 2 Graylog servers and 3 Elasticsearch servers.
The problem is that Graylog sometimes just stops indexing new messages into Elasticsearch, even though ES is online.
To fix it I usually have to restart the Elasticsearch processes.
Sometimes it works fine for a week or two and sometimes it only works for 3 days. Reproducing this doesn’t seem possible.
Where should I look to find the cause?
Just had that error again. For whatever reason, Elasticsearch had not written any log files for a few days.
But there are files like these on all 3 ES servers.
I had to kill the ES processes to restart them.
-rw------- 1 elasticsearch elasticsearch 5171027968 May 14 18:24 java_pid23573.hprof
45 sounds pretty old. The current version is 131. I don’t know if a relevant bug has been fixed in between, but at least it should not make things worse.
To me this sounds like your Elasticsearch crashes and writes a dump, so it does not seem to be a Graylog problem. By the way, I guess you are using an Elasticsearch 2.4 series version (compatible with Graylog)? Have you installed any Elasticsearch plugins? Removing them might also help.
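Files named java_pid<pid>.hprof are JVM heap dumps, which are typically written when the JVM runs out of memory, so it may be worth watching heap usage on the ES nodes before the next stall. Here is a rough sketch of how that could be checked through the node stats API. The URL, the 85% threshold, and the function names are my own assumptions for illustration, not anything from your setup, and I have not run this against your ES version.

```python
import json
from urllib.request import urlopen

# Assumption: one ES node is reachable at this address; adjust for your cluster.
ES_URL = "http://localhost:9200"


def fetch_jvm_stats(base_url=ES_URL, timeout=10):
    """Fetch per-node JVM statistics from the Elasticsearch node stats API."""
    with urlopen(base_url + "/_nodes/stats/jvm", timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def nodes_over_heap_threshold(stats, threshold_percent=85):
    """Return (node name, heap used %) pairs at or above the threshold.

    The threshold is an arbitrary illustrative value, not a recommendation.
    """
    flagged = []
    for node_id, node in stats.get("nodes", {}).items():
        heap = node.get("jvm", {}).get("mem", {}).get("heap_used_percent")
        if heap is not None and heap >= threshold_percent:
            # Fall back to the node id if the node reports no name.
            flagged.append((node.get("name", node_id), heap))
    # Highest heap usage first.
    return sorted(flagged, key=lambda item: item[1], reverse=True)


def main():
    """Print every node whose heap usage is at or above the threshold."""
    for name, heap in nodes_over_heap_threshold(fetch_jvm_stats()):
        print("{}: heap {}% used".format(name, heap))
```

Calling main() from cron every few minutes and keeping the output would show whether heap usage climbs steadily before indexing stops. If it does, the heap size or the number of open indices and shards would be the next things to look at.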