My setup is much simpler: I have a single Ubuntu Hyper-V Graylog server that sends to a separate Ubuntu Hyper-V Elasticsearch server. Both are located on the same physical server. We have backups of the virtual environment via the physical server backup and no direct business/compliance reliance on the data - it is used for internal IT security/reporting only.
The only place we match is that Graylog will randomly stop shipping to Elasticsearch, and a restart of Graylog (the app, not the server) clears out the queue. We are at such a low volume that I highly doubt we are resource constrained. There is nothing in the Graylog logs… the only thing I can think of is that there is a GROK pattern that gets out of control. I have eliminated all extractors for that reason and do all my work in the pipeline, but that doesn’t seem to have made a difference. I haven’t put my mind to reworking the GROK in my pipeline… I would rather wait until I see SOMETHING that would lead me to think it’s worth the time.
I have mentioned it in the forums, but it is a hard issue to describe as there is no error to show for it.
So I watch the forums… and I occasionally have to restart Graylog.
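Since there is no error to point at, one way to at least catch the stall described above is to sample the journal periodically and flag the case where the backlog keeps growing while nothing is being read out of it. What follows is only a rough sketch of that check, not something taken from Graylog itself: `JournalSample`, `looks_stalled`, and the `min_backlog` threshold are hypothetical names, and how the samples get filled in is left open (I believe the REST API exposes journal status under `system/journal`, but check the API browser for your version before relying on that or on any field names).

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class JournalSample:
    """One periodic reading of the journal (hypothetical structure)."""
    uncommitted_entries: int      # messages written but not yet processed
    read_events_per_second: int   # rate at which messages leave the journal


def looks_stalled(samples: Sequence[JournalSample], min_backlog: int = 1000) -> bool:
    """Return True when the samples look like a stalled output.

    The heuristic (an assumption, not a Graylog rule): across all samples
    nothing was read, the backlog never shrank, and the latest backlog is
    at least `min_backlog`.
    """
    if len(samples) < 2:
        # A single reading cannot show a trend.
        return False

    no_reads = all(s.read_events_per_second == 0 for s in samples)
    never_shrinks = all(
        later.uncommitted_entries >= earlier.uncommitted_entries
        for earlier, later in zip(samples, samples[1:])
    )
    big_enough = samples[-1].uncommitted_entries >= min_backlog
    return no_reads and never_shrinks and big_enough
```

With a few samples taken a minute or so apart, a `True` result would be the cue to grab thread dumps or process-buffer contents before restarting Graylog, which might finally give SOMETHING to show in a forum post.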