1. Describe your incident:
Our Graylog cluster crashed (due to a pipeline modification) and was offline for an extended period.
This in turn meant that the Graylog Sidecar on the clients could not reach the cluster.
It seems that, probably because of this, the clients kept the files they monitor open and prevented our application from rotating its log file.
2. Describe your environment:
RHEL 7
Graylog version 4.3.9
ES Version 6.8.1
3. What steps have you already taken to try and solve the problem?
We have now stopped all agents. Once we understand the root cause, we may start the Sidecar agents again.
4. How can the community help?
Have you had the same problem (client log files locked because the Graylog cluster was unavailable)? How did you solve it?
What version of Graylog Sidecar are you using? To be specific, Sidecar only manages collector agents such as Beats and NXLog, so the answer most likely lies with the collector rather than with Sidecar itself.
Have a look at the documentation of the close_timeout option for Filebeat. It specifically mentions your situation. The description text for close_renamed and close_removed might also hold some information that is of interest to you.
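As a rough sketch of where those options would go in the Filebeat collector configuration: this assumes the older `log` input type (newer Filebeat versions use a `filestream` input with differently named options), and both the path and the durations are made-up placeholders, not values from your setup.

```yaml
# Sketch only: the path and timeout below are illustrative assumptions.
filebeat.inputs:
  - type: log
    paths:
      - /var/log/myapp/*.log   # hypothetical application log path
    # Give each harvester a fixed lifetime so the file handle is released
    # even while the output (here: Graylog) is unreachable.
    close_timeout: 5m
    # Close the file handle when the file is renamed, e.g. by log rotation.
    close_renamed: true
    # Close the file handle when the file is removed.
    close_removed: true
```

Please check the warnings in the Filebeat documentation before enabling these: closing a file early can mean that lines not yet read or sent are lost if the file is rotated away in the meantime.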
With the Filebeat versions we have in use, we have never had a situation like yours occur.