UDP packets are dropped during garbage collection

Whenever the Graylog server runs a garbage collection, we see UDP RcvbufErrors (visible in the output of netstat -suna). This graph shows how RcvbufErrors spike during a GC cycle:

[Graph: RcvbufErrors spiking during a GC cycle]
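
For anyone who wants to track the same counter without eyeballing netstat, here is a minimal sketch that reads it from /proc/net/snmp, which is where netstat -s gets its UDP statistics on Linux. The function names are made up for illustration, and it assumes the usual two-line "Udp:" header/values layout of that file.

```python
def parse_udp_counters(snmp_text):
    """Return the Udp counters from /proc/net/snmp content as a dict."""
    # The file has one "Udp:" line with counter names and one with values.
    # "UdpLite:" lines are skipped because they do not start with "Udp:".
    udp_lines = [line.split() for line in snmp_text.splitlines()
                 if line.startswith("Udp:")]
    if len(udp_lines) < 2:
        raise ValueError("no Udp section found")
    header, values = udp_lines[0], udp_lines[1]
    # The first token on both lines is the "Udp:" label, so skip it.
    return {name: int(value) for name, value in zip(header[1:], values[1:])}


def read_rcvbuf_errors(path="/proc/net/snmp"):
    """Read the current RcvbufErrors counter (Linux only)."""
    with open(path) as f:
        return parse_udp_counters(f.read())["RcvbufErrors"]
```

Sampling read_rcvbuf_errors() every few seconds and comparing the deltas against the GC log timestamps should show whether the drops really line up with the pauses.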

I tried increasing net.core.rmem_max and net.core.netdev_max_backlog, but it did not help. Is there anything we can do to ensure that GC pauses do not cause UDP packets to be dropped?
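
One thing worth checking here: on Linux, net.core.rmem_max is only the ceiling for what a socket may request through SO_RCVBUF, so raising it changes nothing unless the application also asks for a larger buffer on its socket. The sketch below is a standalone probe, not anything Graylog runs, and the function name is hypothetical. It requests a size on a fresh UDP socket and reports what the kernel actually granted.

```python
import socket


def effective_rcvbuf(requested_bytes):
    """Request a UDP receive buffer size and return what the kernel granted."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested_bytes)
        # On Linux the kernel doubles the requested value for bookkeeping
        # overhead and silently caps the request at net.core.rmem_max.
        return sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    finally:
        sock.close()


if __name__ == "__main__":
    requested = 8 * 1024 * 1024  # 8 MiB, an arbitrary example value
    print("requested:", requested, "granted:", effective_rcvbuf(requested))
```

If the granted value is far below the requested one, the sysctl ceiling is still the limit. If it is large here but the drops continue, the receive buffer configured on the listening socket itself is the next thing to look at.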

I guess it would help to reduce the garbage collection time. Did you have a reason for giving Graylog such a huge heap?

@jan I don’t follow what you mean by “huge heap”. These are the JVM options used:

-Djava.net.preferIPv4Stack=true -Xms7519m -Xmx7519m -XX:NewRatio=1 -server -XX:+ResizeTLAB -XX:+UseConcMarkSweepGC -XX:+CMSConcurrentMTEnabled -XX:+CMSClassUnloadingEnabled -XX:+UseParNewGC -XX:-OmitStackTraceInFastThrow

How can I reduce the GC time?

You have assigned 7 GB of RAM to the Graylog JVM heap. Did you do that for a reason?

Generally, for JVM applications, we assign half of the total available RAM. For example, this is a 16 GB box, so we assigned roughly half of it. We’re ingesting 1,000 to 1,500 messages per second on the Graylog boxes.

Would a smaller heap result in more frequent GC, and hence shorter pauses? What is the recommended size?

If you have no special need for a big heap for Graylog, I would not use more than 2 GB.

@jan I tried decreasing the heap size to 2 GB. That reduced the number of RcvbufErrors but did not eliminate them. I still see the errors when garbage collection happens.

I’m aware of the limitations of UDP, but is it expected behavior that GC pauses can cause packet loss?

As this is UDP, yes that can happen.
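
To put a rough number on it: while the JVM is paused nobody reads from the socket, so the kernel receive buffer has to hold everything that arrives during the pause. A back-of-the-envelope sketch, using the 1,500 messages per second mentioned above and assuming (these are guesses, not measurements from this setup) 1 KiB datagrams and a 0.5 second pause:

```python
def buffer_needed_bytes(msgs_per_sec, pause_sec, avg_datagram_bytes):
    """Payload bytes that arrive while the reader is paused."""
    return int(msgs_per_sec * pause_sec * avg_datagram_bytes)


def max_pause_sec(buffer_bytes, msgs_per_sec, avg_datagram_bytes):
    """Longest pause a buffer of the given size can absorb without drops."""
    return buffer_bytes / (msgs_per_sec * avg_datagram_bytes)


if __name__ == "__main__":
    # 1,500 msg/s is the peak rate from this thread; the datagram size
    # and the pause length are assumed values for illustration only.
    print(buffer_needed_bytes(1500, 0.5, 1024))   # 768000
    print(max_pause_sec(768000, 1500, 1024))      # 0.5
```

The kernel also charges per-packet overhead against the buffer, so the real requirement is higher than the payload estimate. Either way, once a pause outlasts what the buffer can hold, RcvbufErrors will increase.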

@kishorenc
Do you see the same UDP drops at night, when log traffic is lower?
What is your average traffic rate during the day and at night?
