After adding a GELF TCP input and a collector for my first Windows host, I started having trouble with “journal utilization” messages (a few million messages queued and unprocessed).
For some reason, after a successful Graylog service restart, new messages are indexed for a couple of hours and then indexing stops again, as illustrated in the following image:
Below are the actions I’ve taken while trying to solve this problem:
Increasing the number of CPU cores on both the Graylog and the ES nodes
Manually rotating journal files
Manually cycling the deflector
Restarting services (and servers)
Removing all of my collectors
Removing the GELF TCP input
My setup:
1 Graylog + MongoDB node (16GB, 16 cores)
1 Elasticsearch node (16GB, 16 cores)
~100 msg/sec incoming
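For a sense of scale, here is a rough back-of-the-envelope sketch of how long a backlog like this would take to drain once processing resumes. Only the ~100 msg/sec ingest rate comes from my setup; the backlog size (3 million) and the processing rate (500 msg/sec) are assumed values for illustration, not measurements, and `drain_time_seconds` is just a hypothetical helper name.

```python
def drain_time_seconds(backlog, ingest_rate, process_rate):
    """Return the seconds needed to clear the backlog, or None if it never clears."""
    # Net number of messages leaving the journal each second.
    net_rate = process_rate - ingest_rate
    if net_rate <= 0:
        # Processing is no faster than ingestion, so the journal never shrinks.
        return None
    return backlog / net_rate


# Assumed numbers: 3 million queued, 100 msg/sec in, 500 msg/sec processed.
seconds = drain_time_seconds(3_000_000, 100, 500)
print(f"{seconds / 3600:.1f} hours")  # prints "2.1 hours"
```

If the processing rate drops to zero, as it seems to in my case, the function returns None and the journal simply keeps growing at the ingest rate.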
I can’t get new messages to process, even though there are no errors in the logs.
Has anyone had a similar problem? I’m not sure what the root cause is.