Whenever the Graylog server runs a garbage collection, we see UDP RcvbufErrors (visible in the output of netstat -suna). This graph shows how the RcvbufErrors spike during a GC cycle:
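For anyone who wants to line the counter up against GC logs without eyeballing netstat: on Linux, netstat -su takes these numbers from /proc/net/snmp, so the counter can be read directly. Below is a minimal sketch of that. The function names and the sample values are made up for illustration and are not taken from our boxes.

```python
from pathlib import Path


def parse_udp_counters(snmp_text: str) -> dict:
    """Parse the 'Udp:' header/value line pair from /proc/net/snmp-style text."""
    # Only lines starting with "Udp:" are kept, so "UdpLite:" lines are skipped.
    udp_lines = [line.split() for line in snmp_text.splitlines()
                 if line.startswith("Udp:")]
    if len(udp_lines) < 2:
        raise ValueError("no Udp header/value pair found")
    names, values = udp_lines[0][1:], udp_lines[1][1:]
    return {name: int(value) for name, value in zip(names, values)}


def read_rcvbuf_errors(path: str = "/proc/net/snmp") -> int:
    """Return the current cumulative UDP RcvbufErrors counter (Linux only)."""
    return parse_udp_counters(Path(path).read_text())["RcvbufErrors"]


# Illustrative sample only; the numbers are invented.
SAMPLE = (
    "Udp: InDatagrams NoPorts InErrors OutDatagrams RcvbufErrors SndbufErrors\n"
    "Udp: 1000 2 37 500 35 0\n"
)
print(parse_udp_counters(SAMPLE)["RcvbufErrors"])
```

Calling read_rcvbuf_errors() once per second and logging the difference between samples should give timestamps that can be compared with the GC log entries.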
I tried raising net.core.rmem_max and net.core.netdev_max_backlog, but neither helped. Is there anything we can do to ensure that GC pauses do not lead to UDP packets being dropped?
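One thing worth checking here: as far as I understand socket(7), net.core.rmem_max only raises the ceiling on what a process may request with SO_RCVBUF. A socket that never requests a larger buffer still gets net.core.rmem_default. The sketch below (plain Python, not Graylog code; the function name is hypothetical) requests a size on a throwaway UDP socket and reports what the kernel actually granted, which shows whether the new limit is effective on the box.

```python
import socket


def effective_rcvbuf(requested_bytes: int) -> int:
    """Request a UDP receive buffer size and return what the kernel granted."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested_bytes)
        # On Linux the reported value is typically double the request
        # (bookkeeping overhead), capped according to net.core.rmem_max.
        return sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)


print(effective_rcvbuf(65536))
```

If the granted value stays far below the requested one, the sysctl change has not taken effect. If it is honored, the remaining question is whether the Graylog UDP input itself requests a larger buffer; I am assuming that is controlled by the input's receive buffer size setting.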
Generally, for JVM applications, we assign half of the total available RAM to the heap. For example, this is a 16 GB box, so we assigned roughly half of it. We’re ingesting 1,000 to 1,500 messages per second on the Graylog boxes.
Would a smaller heap result in more frequent GC, and hence shorter pauses? What is the recommended size?
@jan I tried decreasing the heap size to 2 GB. That reduced the number of RcvbufErrors but did not eliminate them. I still see the errors whenever garbage collection happens.
I’m aware of the limitations of UDP, but is it expected behavior that GC pauses cause packet loss?