Data tiering - disk usage question

I have a question about "data tiering" in the Graylog Open version.

We run Graylog 6.1.2 with OpenSearch 2.16.x (we are not yet using the Data Node feature).

We have a mix of "data tiering" and "legacy" retention for our indices. I mostly switched to the new option because the "legacy" strategy is marked as deprecated.

My assumption with data tiering was that Graylog would automatically trim indices to stay below the low watermark. However, we constantly see messages like "Elasticsearch nodes disk usage above low watermark", and I then have to manually remove indices to bring the levels down so that shards can be allocated.
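For reference, that warning corresponds to OpenSearch's disk-based shard allocation thresholds (the low/high/flood-stage watermarks), which are cluster-level settings rather than something data tiering configures. Below is a minimal sketch of how the currently applied thresholds could be checked, assuming OpenSearch is reachable at http://localhost:9200 without authentication (adjust the URL and add credentials as needed):

```python
import json
import urllib.request

# Assumption: OpenSearch listens on http://localhost:9200 with no auth.
url = "http://localhost:9200/_cluster/settings?include_defaults=true&flat_settings=true"

with urllib.request.urlopen(url) as resp:
    settings = json.load(resp)

# Merge defaults, persistent and transient settings, then print only the
# disk-based shard-allocation thresholds (low/high/flood_stage watermarks).
merged = {**settings.get("defaults", {}),
          **settings.get("persistent", {}),
          **settings.get("transient", {})}

for key, value in sorted(merged.items()):
    if key.startswith("cluster.routing.allocation.disk"):
        print(f"{key} = {value}")
```

By default these watermarks sit at 85% (low), 90% (high) and 95% (flood stage) of disk used, and they are set on the OpenSearch cluster itself, independently of Graylog's retention configuration.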

  1. How does this mechanism decide when to remove older indices, especially when we have multiple indices with different "min" retention times?
  2. Is it possible to manually set thresholds for data tiering? For example, try to keep at least 15% of the disk space available, etc.

As it stands, I am considering switching back to "legacy" retention, as I do not see the benefit (with regard to retention) of the "data tiering" option.

Any thoughts, comments here?

Hello @rodney_vdw,

I would recommend reading over this post to get a clear idea of the rotation strategies.

You are still required to size your OpenSearch nodes appropriately for the amount of data you want to store. Look at your daily ingest in GB across all indices, then think about how many days you want to store logs for. Taking 10 GB per day for 30 days as an example, with a small allowance for variance within that 10 GB a day, the simplest equation would be 10 × 30 × 1.3 = 390 GB. Double that if you want a replica of each index.
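As a sketch of that same arithmetic (the 1.3 headroom factor and the figures are just the example numbers above, not recommendations):

```python
def required_storage_gb(daily_ingest_gb: float, retention_days: int,
                        variance_factor: float = 1.3, replicas: int = 0) -> float:
    """Rough disk sizing: daily ingest x retention x headroom, plus replica copies."""
    primary = daily_ingest_gb * retention_days * variance_factor
    return primary * (1 + replicas)

# The example from above: 10 GB/day for 30 days with ~30% headroom.
print(required_storage_gb(10, 30))              # 390.0 GB
print(required_storage_gb(10, 30, replicas=1))  # 780.0 GB with one replica of each index
```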

In the above example data tiering isn't really a factor; it's only worth considering snapshots in larger clusters with heavier ingest, or where you need to keep the logs in live storage for a long period of time.

Many thanks for the reply.
