I am running Graylog 3.1.2 and an Elasticsearch 6.8.3 cluster on Kubernetes in AWS. The data directories are mounted on SC1 (Cold HDD) EBS volumes.
The specs are as follows:
Master only (3x): 2 GB RAM, 1 CPU, 1 GB heap
Data and ingest (2x): 24 GB RAM, 6 CPU, 12 GB heap
Graylog (1x): 12 GB RAM, 6 CPU, 6 GB heap
On a good day, Graylog pushes 7-12k messages/s. After a few days of this, however, throughput drops to ~1k/s for stretches lasting anywhere from a few seconds to a minute, and my messages start to queue up.
On the Elasticsearch side (using Cerebro for stats), the cluster is green, but load on one of the nodes is really high even though heap is not maxed out and CPU and RAM usage are barely past 5%. Restarting the pod doesn't do anything. However, if I shut the node down and a brand new EC2 instance comes up in its place, so the data node starts fresh, I hit 12k/s again.
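For what it's worth, here is my back-of-envelope check on the disk side. SC1 volumes are throttled by a burst-credit bucket: baseline throughput is 12 MiB/s per TiB, with bursts up to 80 MiB/s per TiB, and a brand-new volume starts with a full bucket, which would at least match the "fresh node is fast again" symptom. The message size and volume size below are my own assumptions, not measurements:

```python
# Back-of-envelope: sustained ingest write rate vs. the SC1 baseline.
# Assumptions (mine, not measured): ~500-byte average message, a 1 TiB
# volume per data node, and writes spread evenly across the 2 data nodes.
# Note this ignores write amplification from replicas, translog, refreshes,
# and segment merges, which can multiply the raw figure several times.

SC1_BASELINE_MIBS_PER_TIB = 12  # AWS-documented baseline for Cold HDD (sc1)

def sustained_write_mibs(msgs_per_sec: int, avg_msg_bytes: int, data_nodes: int) -> float:
    """Estimated raw sustained write rate per data node, in MiB/s."""
    return msgs_per_sec * avg_msg_bytes / data_nodes / (1024 * 1024)

per_node = sustained_write_mibs(msgs_per_sec=10_000, avg_msg_bytes=500, data_nodes=2)
baseline = SC1_BASELINE_MIBS_PER_TIB * 1.0  # 1 TiB volume
print(f"~{per_node:.1f} MiB/s per node vs {baseline} MiB/s baseline")
# -> ~2.4 MiB/s per node vs 12.0 MiB/s baseline
```

The raw figure sits under the baseline, but once merges and replication are factored in I'm not sure it stays there, so I can't rule the burst bucket in or out from this alone.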
Any ideas?