Graylog journal continuously growing up

mpolitaev · July 23, 2017, 7:32pm

Our Graylog instance receives a message rate that isn't especially high, around 1000-2000 msg/s, yet the journal keeps growing until I get an alert in the web interface saying that some uncommitted messages were deleted from the journal and that journal utilization is too high.

Can you advise which parameters should be tweaked to handle this load? Also, is there a way to determine that Graylog is under high load, other than noticing that the journal keeps growing and never shrinks?
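A rough way to answer the second question is to compare how fast messages are appended to the journal with how fast they are read out of it. If the append rate stays above the read rate, Graylog (or Elasticsearch behind it) isn't keeping up. The node details page in the Graylog web interface should show both rates for the disk journal; the hypothetical helper below only does the arithmetic, assuming you copy those numbers in by hand and estimate an average message size yourself.

```python
def seconds_until_full(journal_bytes, limit_bytes, append_per_sec, read_per_sec, avg_msg_bytes):
    """Rough estimate of how long until the journal reaches its size limit.

    All inputs are assumptions supplied by hand (rates from the node page,
    average message size estimated). Returns None if the journal is not
    growing, i.e. messages are read at least as fast as they are appended.
    """
    net_msgs_per_sec = append_per_sec - read_per_sec
    if net_msgs_per_sec <= 0:
        return None  # backlog is stable or draining
    free_bytes = max(limit_bytes - journal_bytes, 0)
    return free_bytes / (net_msgs_per_sec * avg_msg_bytes)


if __name__ == "__main__":
    # Hypothetical numbers: 4 GB used of a 5 GB limit, 2000 msg/s in, 1500 msg/s out, ~500 B/msg.
    print(seconds_until_full(4_000_000_000, 5_000_000_000, 2000, 1500, 500))
```

A positive result means the journal will eventually fill and start dropping messages; the smaller the number, the more urgent the bottleneck.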

I should also mention that the ES cluster has 3 nodes: 1 is only a witness master and the other 2 are data nodes. Before that we had a 1-node cluster. I created the master, as well as the other data node, from an AMI of that original node. Some indices are now relocating between the 2 data nodes. Could that be why Elasticsearch can't handle messages as fast as we send them?

But I still don't think that's the cause, because CPU load stays below 100%: about 90% on one data node and under 50% on the other.
Thank you.

jan · July 24, 2017, 9:33am

The Graylog journal will grow if Elasticsearch is not able to handle the load, or if processing in Graylog takes too much time, so that real-time processing is no longer possible.
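To make that concrete: the journal buffers messages until they have been processed and written out, so whenever messages arrive faster than they leave, the backlog grows until the size limit is hit, and from then on the oldest still-unprocessed messages are thrown away (the "uncommitted messages were deleted" alert). The toy model below is a hypothetical simplification: Graylog's real journal is Kafka-based and deletes whole segments, while this sketch drops single messages so the effect is easy to see.

```python
from collections import deque


class ToyJournal:
    """Toy bounded journal; NOT Graylog's implementation."""

    def __init__(self, max_messages):
        self.max_messages = max_messages
        self.entries = deque()
        self.dropped_uncommitted = 0

    def append(self, msg):
        self.entries.append(msg)
        while len(self.entries) > self.max_messages:
            self.entries.popleft()  # oldest message is lost before it was ever processed
            self.dropped_uncommitted += 1

    def read(self, n):
        """Read and commit up to n messages (stands in for processing + ES output)."""
        out = []
        while self.entries and len(out) < n:
            out.append(self.entries.popleft())
        return out


def simulate(seconds, ingest_per_sec, output_per_sec, max_messages):
    """Return (backlog size, uncommitted messages dropped) after `seconds` steps."""
    journal = ToyJournal(max_messages)
    counter = 0
    for _ in range(seconds):
        for _ in range(ingest_per_sec):
            journal.append(counter)
            counter += 1
        journal.read(output_per_sec)
    return len(journal.entries), journal.dropped_uncommitted


if __name__ == "__main__":
    print(simulate(10, 2000, 1500, 3000))  # output slower than input: messages get dropped
    print(simulate(10, 1500, 2000, 3000))  # output faster than input: backlog stays empty
```

The point of the model is that the journal itself is never the bottleneck; it only makes a too-slow processing or output stage visible.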

You should check the Graylog log files to find the reason.

mpolitaev · July 24, 2017, 10:14am

2017-07-24T02:56:03.305-05:00 WARN  [KafkaJournal] Journal utilization (96.0%) has gone over 95%.
2017-07-24T02:57:03.305-05:00 WARN  [KafkaJournal] Journal utilization (96.0%) has gone over 95%.
2017-07-24T02:58:03.305-05:00 WARN  [KafkaJournal] Journal utilization (97.0%) has gone over 95%.
2017-07-24T03:45:43.508-05:00 ERROR [ServerRuntime$Responder] An I/O error has occurred while writing a response message entity to the container output stream.
org.glassfish.jersey.server.internal.process.MappableException: java.io.IOException: Connection closed
        at org.glassfish.jersey.server.internal.MappableExceptionWrapperInterceptor.aroundWriteTo(MappableExceptionWrapperInterceptor.java:92) ~[graylog.jar:?]
        at org.glassfish.jersey.message.internal.WriterInterceptorExecutor.proceed(WriterInterceptorExecutor.java:162) ~[graylog.jar:?]
        at org.glassfish.jersey.message.internal.MessageBodyFactory.writeTo(MessageBodyFactory.java:1130) ~[graylog.jar:?]
        at org.glassfish.jersey.server.ServerRuntime$Responder.writeResponse(ServerRuntime.java:711) [graylog.jar:?]
        at org.glassfish.jersey.server.ServerRuntime$Responder.processResponse(ServerRuntime.java:444) [graylog.jar:?]
        at org.glassfish.jersey.server.ServerRuntime$Responder.process(ServerRuntime.java:434) [graylog.jar:?]
        at org.glassfish.jersey.server.ServerRuntime$2.run(ServerRuntime.java:329) [graylog.jar:?]
        at org.glassfish.jersey.internal.Errors$1.call(Errors.java:271) [graylog.jar:?]
        at org.glassfish.jersey.internal.Errors$1.call(Errors.java:267) [graylog.jar:?]
        at org.glassfish.jersey.internal.Errors.process(Errors.java:315) [graylog.jar:?]
        at org.glassfish.jersey.internal.Errors.process(Errors.java:297) [graylog.jar:?]
        at org.glassfish.jersey.internal.Errors.process(Errors.java:267) [graylog.jar:?]
        at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:317) [graylog.jar:?]
        at org.glassfish.jersey.server.ServerRuntime.process(ServerRuntime.java:305) [graylog.jar:?]
        at org.glassfish.jersey.server.ApplicationHandler.handle(ApplicationHandler.java:1154) [graylog.jar:?]
        at org.glassfish.jersey.grizzly2.httpserver.GrizzlyHttpContainer.service(GrizzlyHttpContainer.java:384) [graylog.jar:?]
        at org.glassfish.grizzly.http.server.HttpHandler$1.run(HttpHandler.java:224) [graylog.jar:?]
        at com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:176) [graylog.jar:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_131]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_131]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_131]
Caused by: java.io.IOException: Connection closed

This is from the Graylog logs. What does "container output stream" actually mean? I monitor I/O on the Graylog node and it is not under load, but the Elasticsearch node is. Does this mean the ES node doesn't have enough I/O throughput?

I also see these parameters:
output_batch_size = 500
output_flush_interval = 1
output_fault_count_threshold = 5
output_fault_penalty_seconds = 30
processbuffer_processors = 5
outputbuffer_processors = 3
processor_wait_strategy = blocking
ring_size = 65536
inputbuffer_ring_size = 65536
inputbuffer_processors = 2
inputbuffer_wait_strategy = blocking
message_journal_enabled = true

I couldn't find where these values are explained in the Graylog documentation. I also couldn't find how Graylog actually writes messages into Elasticsearch, not even in the Graylog deep-dive slideshow. My guess is that the path is Graylog -> journal -> ES nodes. Am I right?
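On the write side, the comments in graylog.conf describe output_batch_size as the maximum number of messages the Elasticsearch output takes at once and writes in one batch call, and output_flush_interval as the longest time between two batches, after which whatever is available is flushed (each outputbuffer processor keeps its own batch). The class below is a hypothetical sketch of that size-or-timeout policy, not Graylog's code; `write_batch` stands in for one bulk request to Elasticsearch, and `clock` is injectable so the behaviour can be tested.

```python
import time


class BatchingOutput:
    """Hypothetical size-or-interval batcher modeled on output_batch_size /
    output_flush_interval. Not Graylog's actual implementation."""

    def __init__(self, batch_size, flush_interval_s, write_batch, clock=time.monotonic):
        self.batch_size = batch_size
        self.flush_interval_s = flush_interval_s
        self.write_batch = write_batch  # e.g. one bulk request to Elasticsearch
        self.clock = clock
        self.buffer = []
        self.last_flush = clock()

    def add(self, msg):
        self.buffer.append(msg)
        if len(self.buffer) >= self.batch_size:
            self.flush()  # batch is full: write immediately

    def tick(self):
        """Call periodically: flush a partial batch once the interval has passed."""
        if self.buffer and self.clock() - self.last_flush >= self.flush_interval_s:
            self.flush()

    def flush(self):
        if self.buffer:
            self.write_batch(self.buffer)
            self.buffer = []  # new list, so the written batch is not mutated later
        self.last_flush = self.clock()


if __name__ == "__main__":
    batches = []
    now = [0.0]  # fake clock for the demo
    out = BatchingOutput(500, 1.0, batches.append, clock=lambda: now[0])
    for i in range(1200):
        out.add(i)
    now[0] = 1.5
    out.tick()
    print([len(b) for b in batches])  # [500, 500, 200]
```

Larger batches mean fewer, bigger bulk requests, which is why raising output_batch_size can help when Elasticsearch is the slow stage.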

Thank you.

jochen · July 25, 2017, 12:06pm

github.com/Graylog2/graylog2-server/blob/2.2.3/misc/graylog.conf

############################
# GRAYLOG CONFIGURATION FILE
############################
#
# This is the Graylog configuration file. The file has to use ISO 8859-1/Latin-1 character encoding.
# Characters that cannot be directly represented in this encoding can be written using Unicode escapes
# as defined in https://docs.oracle.com/javase/specs/jls/se8/html/jls-3.html#jls-3.3, using the \u prefix.
# For example, \u002c.
# 
# * Entries are generally expected to be a single line of the form, one of the following:
#
# propertyName=propertyValue
# propertyName:propertyValue
#
# * White space that appears between the property name and property value is ignored,
#   so the following are equivalent:
# 
# name=Stephen
# name = Stephen
#

This file has been truncated.
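The header above describes the file's syntax: ISO 8859-1 encoding, `name=value` or `name:value`, with whitespace around the separator ignored. A minimal sketch of reading the settings under those rules, for example to check what a node is actually running with, could look like this. The function names are hypothetical, and it deliberately handles only that subset (no line continuations, no `\u` escapes, no whitespace-only separator).

```python
import re

# Subset of the syntax described in the graylog.conf header:
# "name=value" or "name:value", whitespace around the separator ignored,
# lines starting with '#' or '!' are comments.
_LINE = re.compile(r"^\s*([^=:\s]+)\s*[=:]\s*(.*)$")


def parse_graylog_conf(text):
    settings = {}
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith(("#", "!")):
            continue  # blank line or comment
        m = _LINE.match(line)
        if m:
            settings[m.group(1)] = m.group(2)
    return settings


def load_graylog_conf(path):
    # The header states the file must be ISO 8859-1 / Latin-1 encoded.
    with open(path, encoding="latin-1") as f:
        return parse_graylog_conf(f.read())


if __name__ == "__main__":
    sample = "# comment\noutput_batch_size = 500\nring_size=65536\n"
    print(parse_graylog_conf(sample))
```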

mpolitaev · July 25, 2017, 12:51pm

Thank you. Now it is clear.

Another problem has come up. When I'm already logged in to the web interface, everything works fine and fast enough. But when I try to log in from a private browser session, or another user tries to log in, the web interface takes a long time to load, 1 or 2 minutes.
Do you have any idea why?

At that time I see this error in the Graylog log:

An I/O error has occurred while writing a response message entity to the container output stream.

scampuza · July 25, 2017, 3:58pm

We are facing the same issue at our company. When the journal keeps growing, our current solution is brute force: we add more cores and more RAM to the GL nodes, and then set the following parameters to match the number of CPU cores on the server. This workaround has worked for us.

processbuffer_processors = N cores
outputbuffer_processors = N cores
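If you want to reproduce this workaround on several nodes, the values can be generated from the detected core count. The sketch below simply mirrors the one-processor-per-core rule described above; it is not an official sizing recommendation, and the function names are hypothetical.

```python
import os


def processor_settings(cores=None):
    """One processbuffer and one outputbuffer processor per CPU core,
    following the workaround above (not an official sizing rule)."""
    if cores is None:
        cores = os.cpu_count() or 1  # os.cpu_count() can return None
    return {
        "processbuffer_processors": cores,
        "outputbuffer_processors": cores,
    }


def to_conf_lines(settings):
    """Render settings in graylog.conf "name = value" form."""
    return "\n".join(f"{name} = {value}" for name, value in settings.items())


if __name__ == "__main__":
    print(to_conf_lines(processor_settings()))
```

This only helps when the Graylog node's own CPU is the bottleneck; if Elasticsearch is the slow stage, more processors on the Graylog side won't drain the journal.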

mpolitaev · July 25, 2017, 4:17pm

Strange, because the CPU load average logs don't show the CPU under load; it doesn't even go above 50%.

jtkarvo · July 26, 2017, 2:41pm

If Graylog server CPU is low, but messages are not processed quickly, it is possible that the Elasticsearch cluster is not quick enough.

You can try to speed things up by:

setting output_batch_size to a larger value (for example 5000)
adding RAM to the Elasticsearch servers, and setting the Elasticsearch JVM heap size to half of the new amount of RAM on those servers
if you see CPU utilization that high (about 50%) on the ES nodes, you probably have too little memory on those nodes.
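The heap-size arithmetic can be written down as a small helper. Besides the half-of-RAM rule above, Elastic's documentation advises keeping the heap below roughly 32 GB so the JVM can keep using compressed object pointers, and setting the minimum and maximum heap to the same value; the 31 GB cap below is an assumption based on that guidance. In Elasticsearch 5.x these flags go in config/jvm.options, while older releases used the ES_HEAP_SIZE environment variable.

```python
def es_heap_gb(ram_gb, cap_gb=31):
    """Half of the machine's RAM for the Elasticsearch heap, capped below ~32 GB
    (the cap is an assumption based on Elastic's compressed-oops guidance)."""
    return min(ram_gb / 2, cap_gb)


def jvm_options(ram_gb):
    """Heap flags with identical minimum and maximum, rounded down to whole GB."""
    heap = int(es_heap_gb(ram_gb))
    return [f"-Xms{heap}g", f"-Xmx{heap}g"]


if __name__ == "__main__":
    for ram in (8, 16, 64, 128):
        print(ram, jvm_options(ram))
```

The other half of the RAM is not wasted: Elasticsearch relies on the operating system's file system cache, which is one reason too little memory shows up as extra disk I/O and CPU load on the data nodes.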

system · August 9, 2017, 3:06pm

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.
