I’m at a loss as to how to investigate my issue further; there are similar posts, but none describe the issue I have.
I have a single Graylog 3.0.3 instance backed by a single Elasticsearch 6.8.5 node, both hosted on a single VM with 4 vCPUs and 12GB of RAM.
Elasticsearch has 27 indices, 770,879,751 documents, and 229.4GB of data, and is assigned 4GB of heap.
Graylog also has 4GB of heap.
For the last two weeks, at least once a day the process buffer fills up and very few (but still some) logs make it through to Elasticsearch. I only have ~100 messages/minute inbound on average and can flush up to 4,500 msg/sec to Elasticsearch.
There are no log entries showing that any errors have occurred on either Graylog or Elasticsearch, even when I turn up the debug logging level.
I have a single input with seven Grok extractors, and at no time have I seen the maximum process time grow enough to indicate a timeout. I also have a pipeline that corrects the timezone, but again, no errors. If there is a specific metric I can monitor to identify whether a Grok extractor or the pipeline is my issue, please let me know.
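For anyone suggesting metrics: Graylog does expose per-extractor timers through its metrics REST API (e.g. `GET /api/system/metrics`), though the exact metric names and JSON shape in the sketch below are assumptions on my part, not something I've confirmed against 3.0.3. A minimal sketch that flags slow extractor timers in a fetched metrics payload:

```python
# Sketch: flag slow extractor timers in a Graylog metrics payload.
# The metric names and JSON shape below are hypothetical sample data;
# in practice the payload would come from the Graylog REST API
# (e.g. GET /api/system/metrics, with authentication).

THRESHOLD_US = 10_000  # flag timers over 10ms at the 95th pct (arbitrary)

def slow_extractors(metrics: dict, threshold_us: float = THRESHOLD_US) -> list:
    """Return (name, 95th-percentile-microseconds) pairs over the threshold,
    slowest first."""
    slow = []
    for name, metric in metrics.get("timers", {}).items():
        p95 = metric.get("time", {}).get("95th_percentile", 0.0)
        if p95 > threshold_us:
            slow.append((name, p95))
    return sorted(slow, key=lambda item: item[1], reverse=True)

# Hypothetical sample payload, loosely modelled on Graylog timer output:
sample = {
    "timers": {
        "org.graylog2.plugin.inputs.Extractor.abc123.executionTime":
            {"time": {"95th_percentile": 250000.0}},
        "org.graylog2.plugin.inputs.Extractor.def456.executionTime":
            {"time": {"95th_percentile": 120.0}},
    }
}

for name, p95 in slow_extractors(sample):
    print(f"{name}: {p95:.0f}us (95th percentile)")
```

If a specific metric name is the right one to watch here, a pointer to it would be appreciated.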
If you need any more info, please do ask.
Edit: fixed the memory figures and added Elasticsearch usage.