Graylog server extremely slow and consuming all available RAM

Hi,
We deployed Graylog in a k8s cluster. The Graylog version is 3.0.2.
Recently, writing has become very slow, about 5 or 6 messages per second.
We found a lot of error logs in the Graylog pod.

2020-03-13 07:25:04,698 WARN [ProxiedResource] - Unable to call http://10.233.110.59:9000/api/system/metrics/multiple on node <051f6d15-ac92-4fb3-af82-e21a67a12d2d> -
{}
java.net.ConnectException: Failed to connect to /10.233.110.59:9000
at okhttp3.internal.connection.RealConnection.connectSocket(RealConnection.java:247) ~[graylog.jar:?]
at okhttp3.internal.connection.RealConnection.connect(RealConnection.java:165) ~[graylog.jar:?]
at okhttp3.internal.connection.StreamAllocation.findConnection(StreamAllocation.java:257) ~[graylog.jar:?]
at okhttp3.internal.connection.StreamAllocation.findHealthyConnection(StreamAllocation.java:135) ~[graylog.jar:?]
at okhttp3.internal.connection.StreamAllocation.newStream(StreamAllocation.java:114) ~[graylog.jar:?]
at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:42) ~[graylog.jar:?]
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147) ~[graylog.jar:?]
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121) ~[graylog.jar:?]
at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:93) ~[graylog.jar:?]
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147) ~[graylog.jar:?]
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121) ~[graylog.jar:?]
at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93) ~[graylog.jar:?]
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147) ~[graylog.jar:?]
at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:126) ~[graylog.jar:?]
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147) ~[graylog.jar:?]
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121) ~[graylog.jar:?]
at org.graylog2.rest.RemoteInterfaceProvider.lambda$get$0(RemoteInterfaceProvider.java:61) ~[graylog.jar:?]
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147) ~[graylog.jar:?]
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121) ~[graylog.jar:?]
at okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:200) ~[graylog.jar:?]
at okhttp3.RealCall.execute(RealCall.java:77) ~[graylog.jar:?]
at retrofit2.OkHttpCall.execute(OkHttpCall.java:180) ~[graylog.jar:?]
at org.graylog2.shared.rest.resources.ProxiedResource.lambda$getForAllNodes$0(ProxiedResource.java:78) ~[graylog.jar:?]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_212]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_212]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_212]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_212]
Caused by: java.net.ConnectException: Connection refused (Connection refused)
at java.net.PlainSocketImpl.socketConnect(Native Method) ~[?:1.8.0_212]
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350) ~[?:1.8.0_212]
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206) ~[?:1.8.0_212]
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188) ~[?:1.8.0_212]
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) ~[?:1.8.0_212]
at java.net.Socket.connect(Socket.java:589) ~[?:1.8.0_212]
at okhttp3.internal.platform.Platform.connectSocket(Platform.java:129) ~[graylog.jar:?]
at okhttp3.internal.connection.RealConnection.connectSocket(RealConnection.java:245) ~[graylog.jar:?]

And Graylog consumed all of the memory.

By the way, we set the heap size to 6GB.
Could you please give me some advice? Thanks a lot!

If you need more information, please let me know.

By the way, 10.233.110.59 is the IP of the Graylog master pod. From the Graylog coordinating pod, I tested with curl -u 'user:password' http://10.233.110.59:9000/api/system/metrics/ and the command returned a normal result. So I don't know why there were connection errors.
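A single successful curl only shows that the port was reachable at that moment. If the refusals are intermittent, for example while the master pod is busy or restarting, probing repeatedly may reveal them. Here is a minimal sketch, assuming plain TCP reachability is a good enough signal; the `probe` function and its parameters are hypothetical helpers, not part of Graylog:

```python
import socket
import time


def probe(host, port, attempts=5, timeout=2.0, delay=1.0):
    """Try to open a TCP connection several times.

    Returns a list of booleans, one per attempt (True = connected).
    """
    results = []
    for i in range(attempts):
        try:
            # create_connection raises OSError on refusal or timeout
            with socket.create_connection((host, port), timeout=timeout):
                results.append(True)
        except OSError:
            results.append(False)
        if i < attempts - 1:
            time.sleep(delay)
    return results
```

Calling `probe("10.233.110.59", 9000)` from the coordinating pod and counting the `False` entries would show whether the API port is refused only some of the time.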

If you assigned all available RAM to Graylog, because you configured the heap that large, I would advise reducing the heap.

From the information you have given, I can't give any other advice.

We didn't assign all the RAM to Graylog. We set the memory limit for the Graylog pod to 6GB, and the host machine of the Graylog pod has 32GB of memory.

@jan Thanks for your quick response.
We have decided to update the Graylog image version next week, from 3.0.2 to 3.1.0.
Let's see if it happens again.

Thanks for your advice. We reduced the heap size to 3g, and it looks good now. :grinning:
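For anyone hitting the same problem: a likely explanation is that the heap (6GB) was as large as the pod's memory limit (6GB), and the JVM also needs memory outside the heap (metaspace, thread stacks, direct buffers). A rough sketch of the sizing arithmetic follows. The 0.5 fraction is an assumed rule of thumb that happens to match the 3g value that worked here, and both function names are hypothetical:

```python
def max_heap_mb(container_limit_mb, heap_fraction=0.5):
    """Return a heap size in MB that leaves headroom below the container limit.

    heap_fraction is an assumption; tune it for your workload.
    """
    if not 0 < heap_fraction < 1:
        raise ValueError("heap_fraction must be between 0 and 1 (exclusive)")
    return int(container_limit_mb * heap_fraction)


def heap_flags(container_limit_mb, heap_fraction=0.5):
    """Build the standard JVM -Xms/-Xmx flags for the computed heap size."""
    heap = max_heap_mb(container_limit_mb, heap_fraction)
    return f"-Xms{heap}m -Xmx{heap}m"
```

With a 6GB limit, `heap_flags(6144)` gives `-Xms3072m -Xmx3072m`. How those flags are passed to the server depends on the image and deployment you use.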
