Graylog journal getting full

What's the meaning of these params?

Read the documentation and you will see:

https://www.elastic.co/guide/en/elasticsearch/reference/6.8/docs-index_.html#index-creation


Today I found a scenario where the buffers start getting filled. There is a mix of 2-3 searches, and currently my retention time is 20 days. If I search over 7 days of data and then try to show quick values for any field, my output and process buffers suddenly start filling up and I get an error message on the Graylog screen. I tried to grab some logs from ES, which are here:
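In case it helps with reproducing this, a quick way to watch JVM and GC pressure on the ES nodes while such a search or quick-values aggregation runs is the node stats API (assuming ES is reachable on localhost:9200; adjust host/port for your containers):

# Heap usage and GC timings per node
curl -s 'localhost:9200/_nodes/stats/jvm?pretty'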

[2019-09-05T11:24:10,792][WARN ][o.e.m.j.JvmGcMonitorService] [node01] [gc][118] overhead, spent [2.4s] collecting in the last [2.5s]
[2019-09-05T11:24:11,800][WARN ][o.e.m.j.JvmGcMonitorService] [node01] [gc][119] overhead, spent [997ms] collecting in the last [1s]
[2019-09-05T11:24:22,447][WARN ][o.e.m.j.JvmGcMonitorService] [node01] [gc][120] overhead, spent [8.4s] collecting in the last [10.6s]
[2019-09-05T11:24:24,220][ERROR][o.e.ExceptionsHelper     ] [node01] fatal error
    at org.elasticsearch.ExceptionsHelper.lambda$maybeDieOnAnotherThread$2(ExceptionsHelper.java:264)
    at java.util.Optional.ifPresent(Optional.java:159)
    at org.elasticsearch.ExceptionsHelper.maybeDieOnAnotherThread(ExceptionsHelper.java:254)
    at org.elasticsearch.transport.netty4.Netty4MessageChannelHandler.exceptionCaught(Netty4MessageChannelHandler.java:74)
    at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:285)
    at io.netty.channel.AbstractChannelHandlerContext.notifyHandlerException(AbstractChannelHandlerContext.java:850)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:364)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
    at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:323)
    at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:310)
    at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:426)
    at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:278)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
    at io.netty.handler.logging.LoggingHandler.channelRead(LoggingHandler.java:241)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
    at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1434)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
    at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:965)
    at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163)
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:644)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:544)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:498)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:458)
    at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:897)
    at java.lang.Thread.run(Thread.java:745)
[2019-09-05T11:24:26,284][ERROR][o.e.ExceptionsHelper     ] [node01] fatal error
    at org.elasticsearch.ExceptionsHelper.lambda$maybeDieOnAnotherThread$2(ExceptionsHelper.java:264)
    at java.util.Optional.ifPresent(Optional.java:159)
    at org.elasticsearch.ExceptionsHelper.maybeDieOnAnotherThread(ExceptionsHelper.java:254)
    at org.elasticsearch.transport.netty4.Netty4MessageChannelHandler.exceptionCaught(Netty4MessageChannelHandler.java:74)
    at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:285)
    at io.netty.channel.AbstractChannelHandlerContext.notifyHandlerException(AbstractChannelHandlerContext.java:850)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:364)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
    at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:323)
    at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:310)
    at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:426)
    at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:278)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
    at io.netty.handler.logging.LoggingHandler.channelRead(LoggingHandler.java:241)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
    at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1434)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
    at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:965)
    at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163)
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:644)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:544)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:498)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:458)
    at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:897)
    at java.lang.Thread.run(Thread.java:745)

This is my Elasticsearch performance inside the container while the query was executing:

And after some time it went back to normal.

Graylog heap size

GRAYLOG_SERVER_1_GL_HEAP="-Xms2g -Xmx4g"
GRAYLOG_SERVER_2_GL_HEAP="-Xms2g -Xmx4g"
GRAYLOG_SERVER_3_GL_HEAP="-Xms2g -Xmx4g"

Elasticsearch heap size

GRAYLOG_SERVER_1_ES_HEAP="16g"
GRAYLOG_SERVER_2_ES_HEAP="16g"
GRAYLOG_SERVER_3_ES_HEAP="16g"
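To make sure these values actually reach the JVM inside the containers, the cat nodes API gives a quick sanity check (again assuming ES answers on localhost:9200):

# heap.max should report roughly 16gb on every node if the setting took effect
curl -s 'localhost:9200/_cat/nodes?v&h=name,heap.max,heap.percent'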

Any ideas?

Add more CPU and RAM to Elasticsearch, and maybe give it storage with better I/O.

From what you describe, it sounds like that host is at its limits.

Separating GL and ES would be a wise next step, IMO.

Thanks for the suggestion, but I have a self-hosted environment, so it will take some time to add more RAM and cores. Today, when I checked the last 10 days of Graylog behavior in Grafana, I found that the Graylog output buffer gets 70-80% full between 00:00:00 and 00:30:00 at midnight; I think that's when the index rotates.
My retention policy:

Rotation period: P1D (1d, a day)
Index retention strategy: Delete
Max number of indices: 20
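For reference, the same policy expressed through the legacy server.conf settings (only relevant if rotation is still configured there rather than per index set in the web UI) would look roughly like this:

rotation_strategy = time
elasticsearch_max_time_per_index = 1d
retention_strategy = delete
elasticsearch_max_number_of_indices = 20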

Can we do something about that, or will the architecture and resource updates be enough to handle it?

You could disable the force merge after index rotation …
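The relevant knob is in Graylog's server.conf; a minimal sketch, assuming index optimization isn't already managed per index set in your web UI:

# Skip the force merge ("optimize") that Graylog triggers right after an index rotation
disable_index_optimization = true

The trade-off is that rotated indices keep more segments, so searches over older indices can get a bit slower, but it avoids the heavy merge I/O around midnight.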

Before applying any infrastructure update, I just want to confirm whether this is really related to the infra side, because cost-wise adding RAM is not a big challenge, but adding more cores is quite expensive. This is my three hosts' performance from the last 7 days:

FIRST SERVER

[host performance graph]

SECOND SERVER

[host performance graph]

THIRD SERVER

[host performance graph]

You can see low CPU utilization because I restarted my containers on 4th Sep.

And I got confirmation from the infra team that the current disk type is SAN and we don't have any option to change it.

Any suggestions for the above query?

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.