Nodes with too long GC pauses with maxed ES RAM

There are Graylog nodes on which the garbage collector runs too long. Garbage collection runs should be as short as possible. Please check whether those nodes are healthy. (Node: 89a7651b-46ca-4879-9543-5cbeb84804e6, GC duration: 1369 ms, GC threshold: 1000 ms)

What is the best way to resolve this? I have a single node that is doing, generally, under 1,000 logs/s. The system it sits on has maxed out the RAM that can be allocated to Elasticsearch (32 GB), another 32 GB is allocated to Graylog, and the rest of the RAM was left over for the system.

My best guess is that there was an influx of messages from our Filebeat monitor. This isn't going to go away and will likely happen again in the future. Is there a way to change the alert to a greater number?
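For reference, the threshold behind this notification is configurable: Graylog's server.conf has a gc_warning_threshold setting (default 1s). Assuming a standard package install with the config at /etc/graylog/server/server.conf, raising it would look like this (3s is only an illustrative value):

# The threshold of the garbage collection runs. If GC takes longer
# than this threshold, a system notification will be created.
gc_warning_threshold = 3s

graylog-server has to be restarted for the change to take effect.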

Thanks,

Chase

Hey @Chase

I found this


Your RAM distribution is not ideal: if you increase the heap for a single Java process to 32 GB, it will do its own "magic" and be almost unusable because of internal constraints (above roughly 32 GB the JVM loses compressed object pointers).
On a 64 GB RAM machine I would go for 20 GB Graylog, 20 GB Elasticsearch/OpenSearch, and the rest for the OS and its caches. Those are important!
From my experience Graylog works much better if you have multiple machines with up to 32 GB of RAM each, 16 GB of it for the application. Graylog and Elasticsearch can very well be separated onto different machines and scaled horizontally, as sketched below.
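Assuming a standard package install (file paths vary by distribution; these are the usual Debian/Ubuntu locations), that 20/20 split would be set in two places:

# Graylog heap (in /etc/default/graylog-server):
GRAYLOG_SERVER_JAVA_OPTS="-Xms20g -Xmx20g -server -XX:+UseG1GC -XX:-OmitStackTraceInFastThrow"

# Elasticsearch heap (in /etc/elasticsearch/jvm.options):
-Xms20g
-Xmx20g

Keeping -Xms equal to -Xmx avoids heap resizing pauses, and both services need a restart to pick up the change.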

I currently have 30g allocated to it. I could do more, but we currently have it matching Elasticsearch.

graylog-admin@graylog:~$ cat /etc/default/graylog-server

# Path to a custom java executable. By default the java executable of the
# bundled JVM is used.
#JAVA=/usr/bin/java

# Default Java options for heap and garbage collection.
GRAYLOG_SERVER_JAVA_OPTS="-Xms30g -Xmx30g -server -XX:+UseG1GC -XX:-OmitStackTraceInFastThrow"

Thanks,

Chase

The machine has 128 GB of RAM. Would you still make the same distribution of RAM?

While I'd love to have multiple machines right now, there are a few things prohibiting that. The best I can do is to get this machine working and then expand from there.

Thanks,

Chase

This was very helpful to me - I was very occasionally getting the garbage collector error and bumped up the RAM a bit for the JVM. Hopefully that helps, but I really didn't see any performance issues before.

So you're saying to bring it up above 30 GB, say to 40 GB? Would this possibly cause any issues with Elasticsearch, since Elasticsearch will still be stuck at 32?
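For what it's worth, a heap just above 32 GB can hold fewer objects than one just below it, because the JVM stops using compressed object pointers past roughly that boundary. You can check where your JVM flips the flag; the 31g and 40g values below are only illustrative:

# Prints whether compressed oops are enabled for a given heap size
java -Xmx31g -XX:+PrintFlagsFinal -version | grep UseCompressedOops   # typically true
java -Xmx40g -XX:+PrintFlagsFinal -version | grep UseCompressedOops   # typically false

Elasticsearch itself does not care what heap size Graylog runs with; the thing to protect is enough free RAM for the OS page cache.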

Thanks,

Chase

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.