What is the meaning of these params?
Read the documentation and you will see:
https://www.elastic.co/guide/en/elasticsearch/reference/6.8/docs-index_.html#index-creation
Today I found a scenario where the buffer starts filling up. There is a mix of 2-3 searches, and my retention time is currently 20 days. If I search 7 days of data and then try to show quick values for any field, I can suddenly see my output process buffer start filling up, and then I get an error message on the Graylog screen. I grabbed some logs from ES, which are here:
[2019-09-05T11:24:10,792][WARN ][o.e.m.j.JvmGcMonitorService] [node01] [gc][118] overhead, spent [2.4s] collecting in the last [2.5s]
[2019-09-05T11:24:11,800][WARN ][o.e.m.j.JvmGcMonitorService] [node01] [gc][119] overhead, spent [997ms] collecting in the last [1s]
[2019-09-05T11:24:22,447][WARN ][o.e.m.j.JvmGcMonitorService] [node01] [gc][120] overhead, spent [8.4s] collecting in the last [10.6s]
[2019-09-05T11:24:24,220][ERROR][o.e.ExceptionsHelper ] [node01] fatal error
at org.elasticsearch.ExceptionsHelper.lambda$maybeDieOnAnotherThread$2(ExceptionsHelper.java:264)
at java.util.Optional.ifPresent(Optional.java:159)
at org.elasticsearch.ExceptionsHelper.maybeDieOnAnotherThread(ExceptionsHelper.java:254)
at org.elasticsearch.transport.netty4.Netty4MessageChannelHandler.exceptionCaught(Netty4MessageChannelHandler.java:74)
at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:285)
at io.netty.channel.AbstractChannelHandlerContext.notifyHandlerException(AbstractChannelHandlerContext.java:850)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:364)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:323)
at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:310)
at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:426)
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:278)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
at io.netty.handler.logging.LoggingHandler.channelRead(LoggingHandler.java:241)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1434)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:965)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:644)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:544)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:498)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:458)
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:897)
at java.lang.Thread.run(Thread.java:745)
[2019-09-05T11:24:26,284][ERROR][o.e.ExceptionsHelper ] [node01] fatal error
at org.elasticsearch.ExceptionsHelper.lambda$maybeDieOnAnotherThread$2(ExceptionsHelper.java:264)
at java.util.Optional.ifPresent(Optional.java:159)
at org.elasticsearch.ExceptionsHelper.maybeDieOnAnotherThread(ExceptionsHelper.java:254)
at org.elasticsearch.transport.netty4.Netty4MessageChannelHandler.exceptionCaught(Netty4MessageChannelHandler.java:74)
at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:285)
at io.netty.channel.AbstractChannelHandlerContext.notifyHandlerException(AbstractChannelHandlerContext.java:850)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:364)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:323)
at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:310)
at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:426)
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:278)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
at io.netty.handler.logging.LoggingHandler.channelRead(LoggingHandler.java:241)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1434)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:965)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:644)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:544)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:498)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:458)
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:897)
at java.lang.Thread.run(Thread.java:745)
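The three `JvmGcMonitorService` warnings at the top of the log above can be read as a ratio: time spent collecting divided by the length of the observation window. Below is a minimal sketch that computes that ratio from such lines. The function names and the regular expression are my own illustration, not part of Elasticsearch or Graylog, and it assumes the durations are only ever printed in `s` or `ms`, as they are in this log.

```python
import re

# Matches the "spent [X] collecting in the last [Y]" part of a GC overhead warning.
# Assumption: durations use only the "s" or "ms" suffix, as in the log above.
GC_PATTERN = re.compile(
    r"spent \[([\d.]+)(ms|s)\] collecting in the last \[([\d.]+)(ms|s)\]"
)


def to_seconds(value, unit):
    """Convert a duration such as ('997', 'ms') or ('2.4', 's') to seconds."""
    return float(value) / 1000.0 if unit == "ms" else float(value)


def gc_overhead(line):
    """Return the fraction of the window spent in GC, or None if the line does not match."""
    match = GC_PATTERN.search(line)
    if match is None:
        return None
    spent = to_seconds(match.group(1), match.group(2))
    window = to_seconds(match.group(3), match.group(4))
    return spent / window


log_lines = [
    "[gc][118] overhead, spent [2.4s] collecting in the last [2.5s]",
    "[gc][119] overhead, spent [997ms] collecting in the last [1s]",
    "[gc][120] overhead, spent [8.4s] collecting in the last [10.6s]",
]

for line in log_lines:
    print(f"{gc_overhead(line) * 100:.1f}% of the window spent in GC")
```

For the three warnings above this works out to roughly 96%, 99.7% and 79% of wall-clock time spent in garbage collection, which means the node was doing almost nothing else right before the fatal errors.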
This is my Elasticsearch performance inside the container while this query was executing:
After some time it went back to normal.
Graylog heap size
GRAYLOG_SERVER_1_GL_HEAP="-Xms2g -Xmx4g"
GRAYLOG_SERVER_2_GL_HEAP="-Xms2g -Xmx4g"
GRAYLOG_SERVER_3_GL_HEAP="-Xms2g -Xmx4g"
Elasticsearch heap size
GRAYLOG_SERVER_1_ES_HEAP="16g"
GRAYLOG_SERVER_2_ES_HEAP="16g"
GRAYLOG_SERVER_3_ES_HEAP="16g"
Any idea?
Add more CPU and RAM to Elasticsearch, and maybe give it storage with better I/O.
From what you describe, it sounds like that host is at its limits.
Separating GL and ES would be a wise next step, IMO.
Thanks for the suggestion, but I have a self-hosted environment, so it will take some time to add more RAM and cores. Today, when I checked the last 10 days of Graylog behavior in Grafana, I found that the Graylog output buffer gets 70-80% full between 00:00:00 and 00:30:00 at midnight. I think that is the time when the index rotates.
My retention policy:
Rotation period: P1D (1d, a day)
Index retention strategy: Delete
Max number of indices: 20
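As a quick sanity check of these settings, here is a small sketch of the arithmetic behind them. The helper names are hypothetical, and the second function assumes that each index covers exactly one rotation period, so a search range can overlap one extra index when it starts partway through a period.

```python
import math
from datetime import timedelta

rotation_period = timedelta(days=1)  # Rotation period: P1D
max_indices = 20                     # Max number of indices: 20


def retention_window(period, count):
    """Total time span kept on disk before the oldest index is deleted."""
    return period * count


def max_indices_touched(search_range, period):
    """Upper bound on how many time-based indices a search range can overlap.

    Assumption: every index covers exactly one rotation period, so a range
    that starts partway through a period spills into one extra index.
    """
    return math.ceil(search_range / period) + 1


print(retention_window(rotation_period, max_indices).days)           # 20
print(max_indices_touched(timedelta(days=7), rotation_period))       # 8
```

So the settings match the 20-day retention mentioned earlier, and a 7-day search followed by quick values may have to aggregate over as many as 8 daily indices at once.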
Can we do something about it, or will an architecture and resource update be enough for this?
You could disable the force merge after index rotation …
Before applying any infra update, I just wanted to confirm that this is really related to the infra side, because cost-wise adding RAM is not a big challenge, but adding more cores is quite expensive. This is the performance of my three hosts over the last 7 days:
FIRST SERVER

SECOND SERVER

THIRD SERVER

You can see low CPU utilization because I restarted my containers on 4th Sep.
I also got confirmation from the infra team that the current disk type is SAN, and we don't have any other option to change it.
Any suggestions for the above query?
