There are Graylog nodes on which the garbage collector runs too long. Garbage collection runs should be as short as possible. Please check whether those nodes are healthy. (Node: 89a7651b-46ca-4879-9543-5cbeb84804e6, GC duration: 1369 ms, GC threshold: 1000 ms)
What is the best way to resolve this? I have a single node that is generally doing under 1000 logs/s. The system it sits on has maxed out the RAM that can be allocated to Elasticsearch (32GB), another 32GB allocated to Graylog, and the rest of the RAM left over for the system.
My best guess is that there was an influx of messages coming in from our Filebeat monitor. This isn't going to go away, and will likely happen again in the future. Is there a way to change the alert to a greater number?
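For anyone looking for the threshold itself: it appears to be the gc_warning_threshold setting in graylog.conf (the 1000 ms value the notification compares against). Raising it would look roughly like this - a hedged sketch, since the config path and default can vary by install:

```
# /etc/graylog/server/server.conf  (path may vary by install)
# Raise the GC warning threshold from the default 1s so brief
# collection pauses during message bursts stop triggering the notification.
gc_warning_threshold = 2s
```

Graylog only reads this at startup, so the graylog-server service needs a restart afterwards.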
Your RAM distribution is not ideal: if you increase the heap for a single Java process to 32 GB, the JVM does its own "magic" (it loses compressed object pointers, among other things) and becomes almost unusable because of internal constraints.
On a 64GB RAM machine I would go for 20GB for Graylog, 20GB for Elastic/OpenSearch, and the rest for the OS and the OS's caches. Those are important!
In my experience Graylog works much better if you have multiple machines with up to 32GB of RAM each and 16GB for the application. Graylog and Elastic can easily be separated onto different machines and scaled horizontally.
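As a rough sketch of where those heap sizes are set (assuming a standard DEB/RPM install; your paths and existing flags may differ):

```
# /etc/default/graylog-server  (or /etc/sysconfig/graylog-server on RPM systems)
# 20GB heap for the Graylog JVM; keep whatever other flags were already present
GRAYLOG_SERVER_JAVA_OPTS="-Xms20g -Xmx20g <existing flags>"

# /etc/elasticsearch/jvm.options.d/heap.options  (or edit jvm.options directly)
# 20GB heap for Elasticsearch/OpenSearch; Xms and Xmx should match
-Xms20g
-Xmx20g
```

Both JVMs only pick these up at startup, so restart both services after changing them.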
The machine has 128GB of RAM. Would you still make the same distribution of RAM?
While I'd love to have multiple machines right now, there are a few things prohibiting that. The best I can do is to get this machine working and then expand from there.
This was very helpful to me - I was very occasionally getting the garbage collector error and bumped up the RAM a bit for the JVM. Hopefully it helps, but I really didn't see any performance issues before.
So you're saying to bring it up above 30GB, say to 40GB? Would this possibly cause any issues with Elasticsearch, since Elasticsearch will still be stuck at 32?
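If it helps, one way to check whether a given heap size still gets compressed object pointers (the ~32GB "magic" boundary mentioned above) is to ask the JVM directly - just a diagnostic sketch:

```
# Prints whether compressed oops would be enabled at a 40GB heap.
# Expect "UseCompressedOops = false" here, versus "true" at e.g. -Xmx30g.
java -Xmx40g -XX:+PrintFlagsFinal -version | grep UseCompressedOops
```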