Graylog journal getting full

whats the meaning of these params?

Read the Documentation and you will see:

https://www.elastic.co/guide/en/elasticsearch/reference/6.8/docs-index_.html#index-creation

Today i found i scenario where buffer start getting filled. there is mix of 2-3 search, currently my retention time is 20 days. so if i search for 7 days data and then i tried to show quick values for any field then suddenly i can see my output process buffer start getting filled and then i got some error message on graylog screen. i tried to grab some logs from ES which is here.

[2019-09-05T11:24:10,792][WARN ][o.e.m.j.JvmGcMonitorService] [node01] [gc][118] overhead, spent [2.4s] collecting in the last [2.5s]
[2019-09-05T11:24:11,800][WARN ][o.e.m.j.JvmGcMonitorService] [node01] [gc][119] overhead, spent [997ms] collecting in the last [1s]
[2019-09-05T11:24:22,447][WARN ][o.e.m.j.JvmGcMonitorService] [node01] [gc][120] overhead, spent [8.4s] collecting in the last [10.6s]
[2019-09-05T11:24:24,220][ERROR][o.e.ExceptionsHelper     ] [node01] fatal error
        at org.elasticsearch.ExceptionsHelper.lambda$maybeDieOnAnotherThread$2(ExceptionsHelper.java:264)
    at java.util.Optional.ifPresent(Optional.java:159)
    at org.elasticsearch.ExceptionsHelper.maybeDieOnAnotherThread(ExceptionsHelper.java:254)
    at org.elasticsearch.transport.netty4.Netty4MessageChannelHandler.exceptionCaught(Netty4MessageChannelHandler.java:74)
    at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:285)
    at io.netty.channel.AbstractChannelHandlerContext.notifyHandlerException(AbstractChannelHandlerContext.java:850)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:364)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
    at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:323)
    at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:310)
    at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:426)
    at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:278)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
    at io.netty.handler.logging.LoggingHandler.channelRead(LoggingHandler.java:241)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
    at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1434)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
    at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:965)
    at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163)
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:644)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:544)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:498)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:458)
    at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:897)
    at java.lang.Thread.run(Thread.java:745)
[2019-09-05T11:24:26,284][ERROR][o.e.ExceptionsHelper     ] [node01] fatal error
    at org.elasticsearch.ExceptionsHelper.lambda$maybeDieOnAnotherThread$2(ExceptionsHelper.java:264)
    at java.util.Optional.ifPresent(Optional.java:159)
    at org.elasticsearch.ExceptionsHelper.maybeDieOnAnotherThread(ExceptionsHelper.java:254)
    at org.elasticsearch.transport.netty4.Netty4MessageChannelHandler.exceptionCaught(Netty4MessageChannelHandler.java:74)
    at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:285)
    at io.netty.channel.AbstractChannelHandlerContext.notifyHandlerException(AbstractChannelHandlerContext.java:850)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:364)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
    at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:323)
    at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:310)
    at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:426)
    at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:278)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
    at io.netty.handler.logging.LoggingHandler.channelRead(LoggingHandler.java:241)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
    at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1434)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
    at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:965)
    at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163)
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:644)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:544)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:498)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:458)
    at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:897)
    at java.lang.Thread.run(Thread.java:745)

this is my elasticsearch performance inside the container while this query was executing

and after some time it back to normal.

Graylog heap size

GRAYLOG_SERVER_1_GL_HEAP="-Xms2g -Xmx4g"
GRAYLOG_SERVER_2_GL_HEAP="-Xms2g -Xmx4g"
GRAYLOG_SERVER_3_GL_HEAP="-Xms2g -Xmx4g"

Elasticsearch heap size

GRAYLOG_SERVER_1_ES_HEAP=“16g”
GRAYLOG_SERVER_2_ES_HEAP=“16g”
GRAYLOG_SERVER_3_ES_HEAP=“16g”

any idea ?

add more CPU and RAM to Elasticsearch - maybe give storage with better I/O

From what you describe it sounds like that host is at the limits.

seperating GL and ES would be a wise next step. IMO

Thanks for the suggestion but I have self-hosted environment so it will take some time to add more ram and core and today when checked last 10 days graylog behavior on grafana, i found that 70-80 % graylog output buffer get full when time is 00.00.00 to 00.30.00 at midnight, i think its the time when index rotate.
My retention policy

Rotation period: P1D (1d, a day)
Index retention strategy:Delete
Max number of indices: 20

can we do something about it or architecture and the resource update will be enough for this?

you could disable the force merge after index rotation …

Before applying any infra update, i just wanted to confirm is really related to infra side, because cost wise adding ram is not a big challenge but adding more Core is quite expensive. this is my three host performance from last 7 days

FIRST SERVER

image

SECOND SERVER

image

third server

image

you can see low CPU utilization because I have restarted my containers on 4th Sep 

and I got confirmation from infra team current disk type is SAN and we don’t have other option to change it.

Any Suggestion for the above query?