Hello, I’m dealing with a malfunctioning Graylog setup where we receive more input logs than Graylog / Elastic can process in time. Because of how the system is set up, our servers can run out of disk space, which then requires manual resolution of the corrupted state.
I was under the impression that I could just “pause processing” in the Graylog/Nodes UI and it would drop all incoming messages after filling up the journal. This is indeed what happens at first, while the journal is filling (I can see that Process and Output Buffer utilization drops to 0 and the current index stops growing). But once the journal is 100% full, Process and Output Buffer utilization goes back up to 100%, and it looks like Graylog just continues to process input and produce output for Elastic, which results in continuous index growth.
I’ve also tried stopping the Inputs, but this seems to cause backlogging on the Client/Filebeat side. Which is expected, I guess.
What I’m trying to do is preserve the Client->Server logs data flow, but drop all new incoming logs on the Graylog side.
That way I can make changes to the configuration, resume processing and see whether the issue was resolved, without constantly needing to drop indexes to avoid running out of disk.
One gotcha: we are running the obsolete v3.1 of Graylog.
Seems like you need to adjust the way logs are ingested and the retention strategy. I assume you have one input where all the logs come in, or do you have multiple inputs?
A simple way is to create a firewall policy that only lets the “preserve Client->Server logs data flow” logs through and drops the rest. Just an idea.
We’re still running version 2.4 in a closed environment. Also try to reduce the number of fields getting generated; this will help keep the volume from filling up.
Thanks for the reply @gsmith. In the environment that I’m dealing with, we have one input via beats. Our main issue is not the processing done in Graylog but rather the configuration of filebeat and our custom multiline logs, which are currently sent/processed as individual lines.
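To illustrate the multiline problem, here is a rough Python sketch of the grouping that should happen before the logs reach Graylog. It is not our actual configuration: the timestamp pattern, the sample lines and the `group_multiline` helper are all made up for the example. Filebeat has its own multiline settings (`multiline.pattern`, `multiline.negate`, `multiline.match`) that are meant to do this kind of joining on the client side.

```python
import re

# Assumption: every new log event starts with a date such as "2020-01-31 ".
# The real pattern depends on the custom log format.
EVENT_START = re.compile(r"^\d{4}-\d{2}-\d{2} ")


def group_multiline(lines):
    """Join continuation lines onto the event line that precedes them."""
    events = []
    for line in lines:
        if EVENT_START.match(line) or not events:
            # A line that looks like an event start (or the very first line)
            # opens a new event.
            events.append(line)
        else:
            # Anything else is treated as a continuation of the last event.
            events[-1] += "\n" + line
    return events


# Hypothetical sample: one multiline error followed by a single-line message.
sample = [
    "2020-01-31 12:00:00 ERROR something failed",
    "  at module.function (line 10)",
    "  at module.caller (line 42)",
    "2020-01-31 12:00:01 INFO recovered",
]

events = group_multiline(sample)
print(len(sample), "lines ->", len(events), "events")
```

Sent line by line, the sample above would become four messages in Graylog; grouped, it becomes two, which is the kind of reduction we are hoping for.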
Btw, have you considered upgrading, or have you seen any good material that summarizes the differences between versions? This might be the next thing I’ll be dealing with. I was hoping to avoid reading all the release notes and compiling a report myself.