Optimizing indexes and shards in Graylog

Hi all,

In the documentation on configuring Graylog, some rules of thumb for running Elasticsearch / OpenSearch in a more optimal configuration seem to be missing. The default setup is somewhat conservative and/or oversharded when using a single Elasticsearch node, or there are considerations on the Graylog side that I am not aware of.

These figures and settings were gathered from some testing and from Elasticsearch's own recommendations, and the results are promising. On a single node the number of shards was reduced from around 140 to 40, and searches still perform fine. Our shards with logging data are now 15 GB on average and contain about 70 million documents each.
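
For anyone who wants to check their own averages, here is a minimal sketch. It assumes you have already fetched the output of `GET _cat/shards?format=json&bytes=b` and parsed it into a list of dicts; the function name `summarize_primary_shards` is just something I made up for illustration.

```python
from statistics import mean


def summarize_primary_shards(rows):
    """Summarize started primary shards.

    `rows` is assumed to be the parsed JSON from
    GET _cat/shards?format=json&bytes=b, where values arrive as strings
    and unassigned shards have no docs/store values.
    """
    primaries = [
        r for r in rows
        if r.get("prirep") == "p" and r.get("state") == "STARTED"
    ]
    if not primaries:
        return {"shards": 0, "avg_size_gb": 0.0, "avg_docs": 0}

    sizes = [int(r["store"]) for r in primaries]  # bytes, because of bytes=b
    docs = [int(r["docs"]) for r in primaries]
    return {
        "shards": len(primaries),
        "avg_size_gb": mean(sizes) / 1024**3,
        "avg_docs": mean(docs),
    }
```

Replicas are left out on purpose so the averages describe the primary data only.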

This could be documented at:

Elasticsearch - Configuring Graylog or Index model - Configuring Graylog

Guidelines for using indexes and shards more optimally:

  • One primary shard per Elasticsearch node, plus one replica shard for faster searching when you have more than one node (adding more shards beyond a primary and one replica does not immediately give better results).

  • Shard size can be 15 to 25 GB if search speed is what you are after, and up to 50 GB if logging is what matters most to you. It also depends on how much information a document contains.

  • When shards are getting too big, increase the number of primary shards by doubling it, in line with the number of Elasticsearch nodes.

  • Aim for no more than 20 shards per GB of heap space on a node (Elastic's guidance states this as an upper bound rather than a target).

  • For sizing, count on roughly 1 GB of heap per 16 GB of data on disk, although the disk share can be larger in my opinion.

  • More shards = faster writes, fewer shards = faster reads (from Elastic).
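
To make the rules of thumb above a bit more concrete, here is a rough sketch that turns them into numbers. It is only my reading of the guidelines, not anything official from Graylog or Elastic: the function name `suggest_sharding`, its parameters, and the default target of 20 GB per shard are assumptions, and it treats 20 shards per GB of heap as a ceiling.

```python
def suggest_sharding(nodes, heap_gb_per_node, index_size_gb, target_shard_gb=20.0):
    """Apply the rules of thumb from this post to one index (illustrative only)."""
    if nodes < 1 or heap_gb_per_node <= 0 or target_shard_gb <= 0:
        raise ValueError("nodes, heap and target shard size must be positive")

    # Start with one primary shard per node.
    primaries = nodes
    # When shards get too big, double the number of primaries.
    while index_size_gb / primaries > target_shard_gb:
        primaries *= 2

    return {
        "primaries": primaries,
        # One replica only makes sense with more than one node.
        "replicas": 1 if nodes > 1 else 0,
        "shard_size_gb": index_size_gb / primaries,
        # Assumption: 20 shards per GB of heap, used as an upper bound.
        "max_shards_per_node": int(20 * heap_gb_per_node),
        # Assumption: 1 GB of heap per 16 GB of data on disk.
        "data_gb_per_node_for_heap": 16 * heap_gb_per_node,
    }


if __name__ == "__main__":
    # Single node, 8 GB heap, 15 GB index: made-up input values.
    print(suggest_sharding(nodes=1, heap_gb_per_node=8, index_size_gb=15))
```

Treat the output as a starting point for testing on your own data, since the right numbers depend on document size and search patterns.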

Hope this helps some people with questions that might arise. And as they say at Elastic:
"It depends" :)



Thank you, @Arie, for your suggestions! I checked this with our documentation team, and the manager was very appreciative of your post and wants you to know that “it’s part of a conversation we’re currently having about updating some core documentation to include more use case scenarios. We will consider this as we revise the documentation.”