Our Graylog instance receives a not particularly high message rate, around 1000-2000 msg/s, yet the journal keeps growing until I get an alert in the web interface that uncommitted messages were deleted from the journal and that journal utilization is too high.
Can you advise which parameters should be tweaked to handle this load? Also, is there a way to tell that Graylog is under high load other than noticing that the journal keeps growing and never shrinks?
I should also mention that the Elasticsearch cluster has 3 nodes: one is a dedicated master-only (witness) node and the other two are data nodes. Before that we had a single-node cluster; I created the new master from an AMI of that single node, and the second data node the same way. Right now some indices are relocating between the two data nodes. Could that be why Elasticsearch can't index messages as fast as we send them?
Still, I don't think that is the cause, because CPU load is below 100% (around 90% on one data node and under 50% on the other).
Thank you.
The Graylog journal will grow if Elasticsearch is not able to handle the load, or if processing inside Graylog takes too much time so that real-time processing is no longer possible.
You should check your Graylog log files to find the reason for that.
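Besides the log files, you can also check the journal state directly instead of waiting for alerts. A minimal sketch, assuming a Graylog 2.x node with the REST API on port 9000 (host, port, and credentials here are placeholders for your own setup):

# hypothetical host and credentials; GET /system/journal reports utilization,
# uncommitted entries, and append/read rates for the node's journal
curl -s -u admin:yourpassword http://graylog.example.com:9000/api/system/journal

If the number of uncommitted entries keeps rising and the append rate stays above the read rate, the node is receiving faster than it can process, which points at either the process/output buffers or Elasticsearch.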
2017-07-24T02:56:03.305-05:00 WARN [KafkaJournal] Journal utilization (96.0%) has gone over 95%.
2017-07-24T02:57:03.305-05:00 WARN [KafkaJournal] Journal utilization (96.0%) has gone over 95%.
2017-07-24T02:58:03.305-05:00 WARN [KafkaJournal] Journal utilization (97.0%) has gone over 95%.
2017-07-24T03:45:43.508-05:00 ERROR [ServerRuntime$Responder] An I/O error has occurred while writing a response message entity to the container output stream.
org.glassfish.jersey.server.internal.process.MappableException: java.io.IOException: Connection closed
at org.glassfish.jersey.server.internal.MappableExceptionWrapperInterceptor.aroundWriteTo(MappableExceptionWrapperInterceptor.java:92) ~[graylog.jar:?]
at org.glassfish.jersey.message.internal.WriterInterceptorExecutor.proceed(WriterInterceptorExecutor.java:162) ~[graylog.jar:?]
at org.glassfish.jersey.message.internal.MessageBodyFactory.writeTo(MessageBodyFactory.java:1130) ~[graylog.jar:?]
at org.glassfish.jersey.server.ServerRuntime$Responder.writeResponse(ServerRuntime.java:711) [graylog.jar:?]
at org.glassfish.jersey.server.ServerRuntime$Responder.processResponse(ServerRuntime.java:444) [graylog.jar:?]
at org.glassfish.jersey.server.ServerRuntime$Responder.process(ServerRuntime.java:434) [graylog.jar:?]
at org.glassfish.jersey.server.ServerRuntime$2.run(ServerRuntime.java:329) [graylog.jar:?]
at org.glassfish.jersey.internal.Errors$1.call(Errors.java:271) [graylog.jar:?]
at org.glassfish.jersey.internal.Errors$1.call(Errors.java:267) [graylog.jar:?]
at org.glassfish.jersey.internal.Errors.process(Errors.java:315) [graylog.jar:?]
at org.glassfish.jersey.internal.Errors.process(Errors.java:297) [graylog.jar:?]
at org.glassfish.jersey.internal.Errors.process(Errors.java:267) [graylog.jar:?]
at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:317) [graylog.jar:?]
at org.glassfish.jersey.server.ServerRuntime.process(ServerRuntime.java:305) [graylog.jar:?]
at org.glassfish.jersey.server.ApplicationHandler.handle(ApplicationHandler.java:1154) [graylog.jar:?]
at org.glassfish.jersey.grizzly2.httpserver.GrizzlyHttpContainer.service(GrizzlyHttpContainer.java:384) [graylog.jar:?]
at org.glassfish.grizzly.http.server.HttpHandler$1.run(HttpHandler.java:224) [graylog.jar:?]
at com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:176) [graylog.jar:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_131]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_131]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_131]
Caused by: java.io.IOException: Connection closed
This is what is in the Graylog logs. What does "container output stream" actually mean? Looking at I/O monitoring, the Graylog node is not under load, but the Elasticsearch node is. Does this mean the ES node does not have enough I/O throughput?
I couldn't find these values explained in the Graylog documentation, and I also couldn't find how Graylog actually writes messages to Elasticsearch, not even in the Graylog deep-dive slideshow. My guess is that the path is Graylog -> journal -> ES nodes. Am I right?
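For what it's worth, the usual path is: input -> disk journal -> process buffer (extractors, pipelines) -> output buffer -> Elasticsearch. A rough way to see which stage is the bottleneck is to poll the buffer metrics; this is a sketch assuming a Graylog 2.x REST API and standard metric names, with placeholder host and credentials:

# hypothetical host and credentials; POST /system/metrics/multiple returns the named metrics
curl -s -u admin:yourpassword -H 'Content-Type: application/json' \
  -X POST http://graylog.example.com:9000/api/system/metrics/multiple \
  -d '{"metrics":["org.graylog2.buffers.process.usage","org.graylog2.buffers.output.usage"]}'

A constantly full output buffer usually means Elasticsearch (or its disk I/O) is the limiting factor, while a full process buffer points at heavy extractors/pipelines or too few processbuffer_processors.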
Another problem has come up. When I am already logged in to the web interface everything works fine and fast enough, but when I try to log in from a private browser session, or another user tries to log in, the web interface page takes far too long to load, one or two minutes.
Do you have any idea why that is?
At the time of the error I see this in the Graylog log:
An I/O error has occurred while writing a response message entity to the container output stream.
We are facing the same issue in our company. When the journal keeps growing and growing, our current solution is brute force: we add more cores and more RAM to the GL nodes, and then change the following parameters to match the number of CPU cores available on the server. This workaround has worked for us.
processbuffer_processors = N cores
outputbuffer_processors = N cores
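As an illustration only, not a recommendation, the relevant part of graylog.conf might look like the excerpt below. The values are assumptions for an 8-core Graylog node; the option names are standard graylog.conf settings, but the numbers need tuning for your own load:

# graylog.conf excerpt - example values for an 8-core node (adjust to your hardware)
processbuffer_processors = 5
outputbuffer_processors = 3
# messages sent to Elasticsearch per bulk request; larger batches reduce indexing overhead
output_batch_size = 1000
# disk space the journal may use before old, unprocessed messages are dropped
message_journal_max_size = 5gb

Note that this example splits the cores between the two buffer types rather than giving each the full core count; how you split them depends on whether your bottleneck is message processing or output to Elasticsearch.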