Shards configuration with a single server

Hello all,

I’m running a single-server Graylog 4.3.9 environment on Rocky Linux 8, so no clustering at all.

I have 7 different index sets, each with between 12 and 180 indices; rotation periods vary between 2 hours and 1 day.

I had a few issues in the last week where the largest index set could not rotate anymore, and thus the active one was growing and growing.
Checking the logs, I observed that the rotation could not be done because the server was running short on shards [1000/1000]. I was able to solve this issue by manually deleting the older indices from the GUI.

But as I read a bit more about shards, it seems that I only need 1 shard per index, is this correct ?

I could then simply reduce the number of shards from 4 (default) to 1 on each index set, and let the rotation replace the old indices little by little.
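For a rough sense of why the limit was hit, here is a minimal sketch of the arithmetic. The 180 indices, the 4 default shards and the 1000-shard limit come from the post above. The function name is made up for illustration, and zero replicas is an assumption for a single-server setup.

```python
def total_shards(index_count: int, shards_per_index: int, replicas: int = 0) -> int:
    """Shards held by one index set once its retention is full.

    Assumption: every index in the set uses the same shard count,
    and replicas default to 0 on a single server.
    """
    return index_count * shards_per_index * (1 + replicas)


SHARD_LIMIT = 1000  # the [1000/1000] limit reported in the log

largest_set_now = total_shards(180, 4)    # largest set with the default of 4 shards
largest_set_after = total_shards(180, 1)  # same set with 1 shard per index

print(f"4 shards/index: {largest_set_now} of {SHARD_LIMIT}")
print(f"1 shard/index:  {largest_set_after} of {SHARD_LIMIT}")
```

With 4 shards, the largest index set alone would take 720 of the 1000 shards, which leaves little room for the other six sets.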

If you’re a graylog / ES wizard, wink twice if you approve.

Thanks in advance !

Yes, you nailed it. Each shard should hold approximately 20-30 GB of data. If you have too many shards, reduce them. The old shard count will “grow out” of your index set as the older indices are rotated away. If your log volume is still too small for that, go for a longer rotation period.
You could also install a second server as a cluster for another 1000 shards.


But even with a single-server deployment, is it still beneficial to have several shards for an index set?

Imagine two index sets :

  • Index set A :
    • rotation period : 1 day
    • max number of indices : 30
    • total used disk space : 10GB
  • Index set B :
    • rotation period : 2 hours
    • max number of indices : 48
    • total used disk space : 200 GB

What would be the correct sharding strategy assigned to each index set for a single server deployment ?

Good news: we are working on a new rotation and retention strategy that will make it easier to keep shards within reasonable size limits and avoid an excessive number of shards.
Stay tuned …


Yes!
You create new shards every time your index rotates. If you rotate every day, you create new shards every day. But you also delete the oldest ones, so the number of shards stays constant.
My default is to set the rotation to P1D (one day). If I ingest more than 30 GB of data into that index set per day, I set the number of shards to two; with more than 50 GB, to three; and so on. The daily amount is the relevant figure, as I rotate every day.
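As a minimal sketch, that rule of thumb could be written as below. Only the 30 GB and 50 GB thresholds are stated; continuing "and so on" in 20 GB steps is an assumption, and the function name is made up for illustration.

```python
import math


def shards_for_daily_volume(daily_gb: float,
                            first_threshold_gb: float = 30.0,
                            step_gb: float = 20.0) -> int:
    """Shards per index for a daily rotation.

    Up to 30 GB/day -> 1 shard, more than 30 -> 2, more than 50 -> 3.
    Assumption: each further 20 GB/day adds one more shard.
    """
    if daily_gb <= first_threshold_gb:
        return 1
    return 1 + math.ceil((daily_gb - first_threshold_gb) / step_gb)


# Daily volumes derived from the two example index sets in the question.
set_a_daily_gb = 10 / 30   # 10 GB kept over 30 daily indices
set_b_daily_gb = 200 / 4   # 200 GB over 48 two-hour indices, i.e. four days

print("Index set A:", shards_for_daily_volume(set_a_daily_gb), "shard(s)")
print("Index set B:", shards_for_daily_volume(set_b_daily_gb), "shard(s)")
```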

Total used disk space is not so relevant here, as it depends on how many indices I keep. If I want 90 days of data, it will be more than for 10 days.

Your index set A with 10GB in total in 30 days is perfectly fine with only one shard per day.
Your index set B with 200GB in four days (48 indices of 2 hours each) is a bit strange. I’d go for a daily rotation with two shards.

Two important things for the performance:

  1. For every 20 shards, allocate 1 GB of heap to Elasticsearch/OpenSearch.
  2. For each GB of Elasticsearch/OpenSearch heap, leave one GB of RAM to the OS for buffers.

Example:
16 GB RAM for your Elasticsearch/OpenSearch machine, with no Graylog or MongoDB on it.
8 GB heap for Elasticsearch/OpenSearch, 8 GB for the OS and caches.
8 GB heap × 20 shards per GB = 160 shards
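The same calculation as a small sketch, in case you want to plug in your own numbers. The function names are hypothetical; the 20 shards per GB of heap and the even split between heap and OS cache are the two rules of thumb above, not hard limits.

```python
SHARDS_PER_GB_HEAP = 20  # rule of thumb from point 1


def shard_budget(total_ram_gb: float) -> dict:
    """Split RAM evenly between heap and OS cache (point 2),
    then derive how many shards that heap should carry (point 1)."""
    heap_gb = total_ram_gb / 2
    return {
        "heap_gb": heap_gb,
        "os_cache_gb": total_ram_gb - heap_gb,
        "max_shards": int(heap_gb * SHARDS_PER_GB_HEAP),
    }


def heap_needed_gb(shards: int) -> float:
    """Heap the rule of thumb asks for at a given shard count."""
    return shards / SHARDS_PER_GB_HEAP


print(shard_budget(16))      # the 16 GB example machine
print(heap_needed_gb(1000))  # heap the rule would ask for at 1000 shards
```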


Thanks @ihe for the shards / rotation ratio, it’s really useful to know.

About performance, what would you recommend when all Graylog services (Graylog, ES, Mongo) are on the same machine?

One machine, 16 cores, 32 GB RAM:
10 GB Graylog heap
10 GB OpenSearch heap
~1 GB for MongoDB at most. The database should be tiny, as it holds only config, no log data.
The rest for the system and caching.
Make sure to assign enough processing threads in the Graylog config.
If you want to cluster your OpenSearch, go for a second node here first.
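A quick sanity check of that split against the earlier guideline of one GB of OS cache per GB of OpenSearch heap, as a rough sketch. The function name is made up for illustration, and MongoDB is counted at its 1 GB maximum.

```python
def leftover_for_os_gb(total_ram_gb: float, graylog_heap_gb: float,
                       opensearch_heap_gb: float, mongo_gb: float) -> float:
    """RAM left for the system and file-system cache after the fixed allocations."""
    return total_ram_gb - graylog_heap_gb - opensearch_heap_gb - mongo_gb


opensearch_heap_gb = 10
leftover_gb = leftover_for_os_gb(32, 10, opensearch_heap_gb, 1)

# Guideline: about as much RAM for OS buffers as OpenSearch has heap.
print(f"Left for system and caching: {leftover_gb} GB")
print("Meets the 1:1 cache guideline:", leftover_gb >= opensearch_heap_gb)
```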
