Graylog journal getting full

rajnishtyagi · September 5, 2019, 6:19am

What's the meaning of these params?

jan · September 5, 2019, 6:24am

Read the documentation and you will see:

https://www.elastic.co/guide/en/elasticsearch/reference/6.8/docs-index_.html#index-creation

rajnishtyagi · September 5, 2019, 11:37am

Today I found a scenario where the buffer starts filling up. It involves a mix of 2-3 searches; my retention time is currently 20 days. If I search over 7 days of data and then try to show quick values for any field, I suddenly see my output process buffer start filling up, and then I get an error message on the Graylog screen. I grabbed some logs from ES, which are here:

[2019-09-05T11:24:10,792][WARN ][o.e.m.j.JvmGcMonitorService] [node01] [gc][118] overhead, spent [2.4s] collecting in the last [2.5s]
[2019-09-05T11:24:11,800][WARN ][o.e.m.j.JvmGcMonitorService] [node01] [gc][119] overhead, spent [997ms] collecting in the last [1s]
[2019-09-05T11:24:22,447][WARN ][o.e.m.j.JvmGcMonitorService] [node01] [gc][120] overhead, spent [8.4s] collecting in the last [10.6s]
[2019-09-05T11:24:24,220][ERROR][o.e.ExceptionsHelper     ] [node01] fatal error
    at org.elasticsearch.ExceptionsHelper.lambda$maybeDieOnAnotherThread$2(ExceptionsHelper.java:264)
    at java.util.Optional.ifPresent(Optional.java:159)
    at org.elasticsearch.ExceptionsHelper.maybeDieOnAnotherThread(ExceptionsHelper.java:254)
    at org.elasticsearch.transport.netty4.Netty4MessageChannelHandler.exceptionCaught(Netty4MessageChannelHandler.java:74)
    at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:285)
    at io.netty.channel.AbstractChannelHandlerContext.notifyHandlerException(AbstractChannelHandlerContext.java:850)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:364)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
    at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:323)
    at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:310)
    at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:426)
    at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:278)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
    at io.netty.handler.logging.LoggingHandler.channelRead(LoggingHandler.java:241)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
    at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1434)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
    at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:965)
    at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163)
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:644)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:544)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:498)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:458)
    at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:897)
    at java.lang.Thread.run(Thread.java:745)
[2019-09-05T11:24:26,284][ERROR][o.e.ExceptionsHelper     ] [node01] fatal error
    at org.elasticsearch.ExceptionsHelper.lambda$maybeDieOnAnotherThread$2(ExceptionsHelper.java:264)
    at java.util.Optional.ifPresent(Optional.java:159)
    at org.elasticsearch.ExceptionsHelper.maybeDieOnAnotherThread(ExceptionsHelper.java:254)
    at org.elasticsearch.transport.netty4.Netty4MessageChannelHandler.exceptionCaught(Netty4MessageChannelHandler.java:74)
    at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:285)
    at io.netty.channel.AbstractChannelHandlerContext.notifyHandlerException(AbstractChannelHandlerContext.java:850)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:364)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
    at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:323)
    at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:310)
    at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:426)
    at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:278)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
    at io.netty.handler.logging.LoggingHandler.channelRead(LoggingHandler.java:241)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
    at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1434)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
    at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:965)
    at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163)
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:644)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:544)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:498)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:458)
    at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:897)
    at java.lang.Thread.run(Thread.java:745)
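
The three `JvmGcMonitorService` warnings above already tell how much of each interval the JVM spent in garbage collection. As a rough sketch (the regex and the unit table are assumptions based only on the message format visible in this log, not on an official Elasticsearch log schema), the ratio can be computed like this:

```python
import re

# Assumed message format, taken from the warnings in this thread:
#   "... overhead, spent [2.4s] collecting in the last [2.5s]"
GC_LINE = re.compile(
    r"spent \[([\d.]+)(ms|s|m)\] collecting in the last \[([\d.]+)(ms|s|m)\]"
)

# Conversion of the unit suffixes seen in the log to seconds.
UNIT_SECONDS = {"ms": 0.001, "s": 1.0, "m": 60.0}


def gc_overhead(line):
    """Return the fraction of the interval spent in GC, or None for other lines."""
    match = GC_LINE.search(line)
    if match is None:
        return None
    spent = float(match.group(1)) * UNIT_SECONDS[match.group(2)]
    window = float(match.group(3)) * UNIT_SECONDS[match.group(4)]
    return spent / window


LOG_LINES = [
    "[2019-09-05T11:24:10,792][WARN ][o.e.m.j.JvmGcMonitorService] [node01] [gc][118] overhead, spent [2.4s] collecting in the last [2.5s]",
    "[2019-09-05T11:24:11,800][WARN ][o.e.m.j.JvmGcMonitorService] [node01] [gc][119] overhead, spent [997ms] collecting in the last [1s]",
    "[2019-09-05T11:24:22,447][WARN ][o.e.m.j.JvmGcMonitorService] [node01] [gc][120] overhead, spent [8.4s] collecting in the last [10.6s]",
]

for line in LOG_LINES:
    ratio = gc_overhead(line)
    if ratio is not None:
        print(f"{ratio:.0%} of the interval spent in GC")
```

In all three warnings the node spent most of the interval collecting garbage, which would be consistent with the Elasticsearch heap being exhausted while the quick values aggregation was running.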

This is my Elasticsearch performance inside the container while this query was executing.

After some time it went back to normal.

Graylog heap size

GRAYLOG_SERVER_1_GL_HEAP="-Xms2g -Xmx4g"
GRAYLOG_SERVER_2_GL_HEAP="-Xms2g -Xmx4g"
GRAYLOG_SERVER_3_GL_HEAP="-Xms2g -Xmx4g"

Elasticsearch heap size

GRAYLOG_SERVER_1_ES_HEAP="16g"
GRAYLOG_SERVER_2_ES_HEAP="16g"
GRAYLOG_SERVER_3_ES_HEAP="16g"

Any ideas?

jan · September 5, 2019, 12:18pm

Add more CPU and RAM to Elasticsearch, and maybe give it storage with better I/O.

From what you describe, it sounds like that host is at its limits.

cawfehman · September 5, 2019, 2:07pm

Separating GL and ES would be a wise next step, IMO.

rajnishtyagi · September 6, 2019, 5:09am

Thanks for the suggestion, but I have a self-hosted environment, so it will take some time to add more RAM and cores. Today, when I checked the last 10 days of Graylog behavior in Grafana, I found that the Graylog output buffer gets 70-80% full between 00:00:00 and 00:30:00 at midnight. I think that is the time when the index rotates.
My retention policy:

Rotation period: P1D (1d, a day)
Index retention strategy: Delete
Max number of indices: 20

Can we do something about it, or will the architecture and resource update be enough for this?
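
The retention settings above can be checked against the 20-day retention mentioned earlier with a little arithmetic. This is only a sketch: `parse_day_period` and `retention_window` are hypothetical helpers, not Graylog functions, and the parser handles only whole-day periods such as `P1D`.

```python
import re
from datetime import timedelta


def parse_day_period(period):
    """Parse a day-only ISO 8601 period such as 'P1D' into a timedelta."""
    match = re.fullmatch(r"P(\d+)D", period)
    if match is None:
        raise ValueError(f"unsupported period: {period!r}")
    return timedelta(days=int(match.group(1)))


def retention_window(rotation_period, max_indices):
    """Approximate time span covered when every retained index holds one period."""
    return parse_day_period(rotation_period) * max_indices


window = retention_window("P1D", 20)
print(f"about {window.days} days of data across 20 indices")
```

With daily rotation, a 7-day search would have to touch roughly seven or eight of those indices at once, and the rotation itself would happen around midnight, which lines up with the buffer spikes seen between 00:00:00 and 00:30:00.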

jan · September 6, 2019, 7:10am

You could disable the force merge after index rotation …

rajnishtyagi · September 6, 2019, 9:34am

Before applying any infra update, I just wanted to confirm that this is really related to the infra side, because cost-wise adding RAM is not a big challenge, but adding more cores is quite expensive. This is the performance of my three hosts over the last 7 days:

First server

Second server

Third server

You can see low CPU utilization because I restarted my containers on 4th Sep.

I also got confirmation from the infra team that the current disk type is SAN, and we don't have any option to change it.

rajnishtyagi · September 9, 2019, 6:30am

Any suggestions for the above query?

system · September 23, 2019, 6:30am

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.
