Hey,
I am having trouble with the metrics page of a cluster node.
When I open http://<graylog-master-IP>:9000/api/cluster/bcc504fe-f4f3-4fa6-b6d0-57d12691e5ab/metrics/namespace/org
it takes a while and I am presented an error message:
Cannot read property 'length' of undefined
Chrome dev-tools shows that it is trying to load http://<graylog-master-IP>:9000/api/cluster/bcc504fe-f4f3-4fa6-b6d0-57d12691e5ab/metrics/namespace/org
which most of the time fails after ~10 seconds with this error:
{"type":"ApiError","message":"timeout"}
Every time this happens I get the following error logged to the server log:
2020-07-17T18:30:53.527+02:00 ERROR [AnyExceptionClassMapper] Unhandled exception in REST resource
java.net.SocketTimeoutException: timeout
at okio.Okio$4.newTimeoutException(Okio.java:232) ~[graylog.jar:?]
at okio.AsyncTimeout.exit(AsyncTimeout.java:286) ~[graylog.jar:?]
at okio.AsyncTimeout$2.read(AsyncTimeout.java:241) ~[graylog.jar:?]
at okio.RealBufferedSource.indexOf(RealBufferedSource.java:358) ~[graylog.jar:?]
at okio.RealBufferedSource.readUtf8LineStrict(RealBufferedSource.java:230) ~[graylog.jar:?]
at okhttp3.internal.http1.Http1ExchangeCodec.readHeaderLine(Http1ExchangeCodec.java:242) ~[graylog.jar:?]
at okhttp3.internal.http1.Http1ExchangeCodec.readResponseHeaders(Http1ExchangeCodec.java:213) ~[graylog.jar:?]
at okhttp3.internal.connection.Exchange.readResponseHeaders(Exchange.java:115) ~[graylog.jar:?]
at okhttp3.internal.http.CallServerInterceptor.intercept(CallServerInterceptor.java:94) ~[graylog.jar:?]
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142) ~[graylog.jar:?]
at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:43) ~[graylog.jar:?]
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142) ~[graylog.jar:?]
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:117) ~[graylog.jar:?]
at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:94) ~[graylog.jar:?]
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142) ~[graylog.jar:?]
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:117) ~[graylog.jar:?]
at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93) ~[graylog.jar:?]
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142) ~[graylog.jar:?]
at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:88) ~[graylog.jar:?]
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142) ~[graylog.jar:?]
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:117) ~[graylog.jar:?]
at org.graylog2.rest.RemoteInterfaceProvider.lambda$get$0(RemoteInterfaceProvider.java:61) ~[graylog.jar:?]
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142) ~[graylog.jar:?]
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:117) ~[graylog.jar:?]
at okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:229) ~[graylog.jar:?]
at okhttp3.RealCall.execute(RealCall.java:81) ~[graylog.jar:?]
at retrofit2.OkHttpCall.execute(OkHttpCall.java:190) ~[graylog.jar:?]
at org.graylog2.rest.resources.cluster.ClusterNodeMetricsResource.byNamespace(ClusterNodeMetricsResource.java:122) ~[graylog.jar:?]
at sun.reflect.GeneratedMethodAccessor203.invoke(Unknown Source) ~[?:?]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_252]
at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_252]
at org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory$1.invoke(ResourceMethodInvocationHandlerFactory.java:81) ~[graylog.jar:?]
at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:144) ~[graylog.jar:?]
at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:161) ~[graylog.jar:?]
at org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$TypeOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:205) ~[graylog.jar:?]
at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:99) ~[graylog.jar:?]
at org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:389) ~[graylog.jar:?]
at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:347) ~[graylog.jar:?]
at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:102) ~[graylog.jar:?]
at org.glassfish.jersey.server.ServerRuntime$2.run(ServerRuntime.java:326) [graylog.jar:?]
at org.glassfish.jersey.internal.Errors$1.call(Errors.java:271) [graylog.jar:?]
at org.glassfish.jersey.internal.Errors$1.call(Errors.java:267) [graylog.jar:?]
at org.glassfish.jersey.internal.Errors.process(Errors.java:315) [graylog.jar:?]
at org.glassfish.jersey.internal.Errors.process(Errors.java:297) [graylog.jar:?]
at org.glassfish.jersey.internal.Errors.process(Errors.java:267) [graylog.jar:?]
at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:317) [graylog.jar:?]
at org.glassfish.jersey.server.ServerRuntime.process(ServerRuntime.java:305) [graylog.jar:?]
at org.glassfish.jersey.server.ApplicationHandler.handle(ApplicationHandler.java:1154) [graylog.jar:?]
at org.glassfish.jersey.grizzly2.httpserver.GrizzlyHttpContainer.service(GrizzlyHttpContainer.java:384) [graylog.jar:?]
at org.glassfish.grizzly.http.server.HttpHandler$1.run(HttpHandler.java:224) [graylog.jar:?]
at com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:181) [graylog.jar:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_252]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_252]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_252]
Caused by: java.net.SocketException: Socket closed
at java.net.SocketInputStream.read(SocketInputStream.java:204) ~[?:1.8.0_252]
at java.net.SocketInputStream.read(SocketInputStream.java:141) ~[?:1.8.0_252]
at okio.Okio$2.read(Okio.java:140) ~[graylog.jar:?]
at okio.AsyncTimeout$2.read(AsyncTimeout.java:237) ~[graylog.jar:?]
... 51 more
So it seems like one Graylog node is trying to reach the other and this results in a timeout.
When it does work (from time to time) the response usually comes after ~9 seconds.
Now to my questions:
- Is there a way to increase the timeout? (quick fix to allow us to browse the available metrics more easily)
- Any common pitfalls that might explain the slow response? (when it works the resulting document is around ~34 kb in size so there are quite a lot of metrics)
Thanks!
Obligatory info:
2 GL nodes (same specs) + 1 Elastic node:
GL Memory: -Xms2g -Xmx2g
MongoDB running as replicaSet on both GL nodes
Graylog: Graylog 3.3.2+ade4779
Java: Oracle Corporation 1.8.0_252
Linux: Linux 3.10.0-1127.el7.x86_64