Data tiering - disk usage question

I have a question about "data tiering" in the Graylog Open version.

We run Graylog 6.1.2 with OpenSearch 2.16.x (we are not yet using the Data Node feature).

We have a mix of "data tiering" and "legacy" retention for our indices. I mostly switched to the new option because the "legacy" strategy is marked as deprecated.

My assumption with data tiering was that Graylog would automatically trim indices to stay below the low watermark. However, we constantly see messages like "Elasticsearch nodes disk usage above low watermark", and I then have to manually remove indices to bring the levels down so that shards can be allocated.
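For reference, that warning corresponds to OpenSearch's disk-based shard allocation thresholds (the low/high/flood-stage watermarks), which are cluster-level settings rather than something data tiering configures. Below is a minimal sketch of how the currently applied thresholds could be checked, assuming OpenSearch is reachable at http://localhost:9200 without authentication (adjust the URL and add credentials as needed):

```python
import json
import urllib.request

# Assumption: OpenSearch listens on http://localhost:9200 with no auth.
url = "http://localhost:9200/_cluster/settings?include_defaults=true&flat_settings=true"

with urllib.request.urlopen(url) as resp:
    settings = json.load(resp)

# Merge defaults, persistent and transient settings, then print only the
# disk-based shard-allocation thresholds (low/high/flood_stage watermarks).
merged = {**settings.get("defaults", {}),
          **settings.get("persistent", {}),
          **settings.get("transient", {})}

for key, value in sorted(merged.items()):
    if key.startswith("cluster.routing.allocation.disk"):
        print(f"{key} = {value}")
```

By default these watermarks sit at 85% (low), 90% (high) and 95% (flood stage) of disk used, and they are set on the OpenSearch cluster itself, independently of Graylog's retention configuration.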

  1. How does this mechanism decide when to remove older indices, especially when we have multiple indices with different "min" retention times?
  2. Is it possible to manually set thresholds for data tiering? For example, try to keep at least 15% of the disk space available, etc.

As it stands, I am considering switching back to "legacy" retention, as I do not see the benefit (with regard to retention) of the "data tiering" option.

Any thoughts, comments here?

Hello @rodney_vdw,

I would recommend reading over this post to get a clear idea of the rotation strategies.

You are still required to size your OpenSearch nodes appropriately for the amount of data you want to store. Look at your daily ingest in GB across all indices, then think about how many days you want to store logs for. Taking 10 GB per day for 30 days as an example, with a small allowance for variance within that 10 GB a day, the simplest equation would be 10 × 30 × 1.3 = 390 GB. Double that if you want a replica of each index.
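As a sketch of that same arithmetic (the 1.3 headroom factor and the figures are just the example numbers above, not recommendations):

```python
def required_storage_gb(daily_ingest_gb: float, retention_days: int,
                        variance_factor: float = 1.3, replicas: int = 0) -> float:
    """Rough disk sizing: daily ingest x retention x headroom, plus replica copies."""
    primary = daily_ingest_gb * retention_days * variance_factor
    return primary * (1 + replicas)

# The example from above: 10 GB/day for 30 days with ~30% headroom.
print(required_storage_gb(10, 30))              # 390.0 GB
print(required_storage_gb(10, 30, replicas=1))  # 780.0 GB with one replica of each index
```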

In the above example data tiering isn't really a factor; it's only worth considering snapshots in larger clusters with heavier ingest, or where you need to keep the logs in live storage for a long period of time.

Many thanks for the reply.
