Oh man, y’all are gonna get me on a soapbox about index settings. What would prevent you from using archiving? I realize that’s an enterprise feature, but if you need to keep data around for 7 years, you’re going to want archiving @Raynu.
Let me also ask a different question: how far back do you actually query your data? Do you really need to search a whole 7 years? If you just need to retain the data, then seriously, archiving is going to be the solution to this problem. If you don’t need to search 7-year-old data all day, every day, then get those indices off into some long-term cold storage so you can access them at a later date, should that occasion arise.
I’m going to walk through what’s going to happen under the hood if you’re not archiving.
Let’s say that you do a 1-day rotation and keep all of those indices open (not closed or deleted), and that you take the default 4 shards per index. At 365 days * 7 years, that’s a total of 2555 indices. With the 4-shard default, that’s 10220 shards, which would actually put you above Elasticsearch’s hard 10k shard limit…just for that index set. This would leave NO room for any other indices to rotate.
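If you want to play with those numbers yourself, here’s a quick back-of-the-envelope sketch of the arithmetic above. The constant and function names are just ones I made up for illustration, and it ignores leap days and any other index sets in the cluster.

```python
# Rough shard count for a daily-rotated index set where nothing is
# ever closed or deleted. All names here are illustrative only.
DAYS_PER_YEAR = 365
RETENTION_YEARS = 7
SHARDS_PER_INDEX = 4      # the default assumed in the post
SHARD_LIMIT = 10_000      # the 10k limit mentioned in the post


def total_shards(years, shards_per_index, days_per_year=DAYS_PER_YEAR):
    """Return (index_count, shard_count) for one index per day."""
    indices = days_per_year * years
    return indices, indices * shards_per_index


indices, shards = total_shards(RETENTION_YEARS, SHARDS_PER_INDEX)
over_limit = shards - SHARD_LIMIT

print(f"indices: {indices}")            # 2555
print(f"shards:  {shards}")             # 10220
print(f"over the limit by: {over_limit}")  # 220
```

Swap in your own rotation period or shard count to see how quickly the total moves.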
Putting that aside for a moment, let’s just assume that you do hit 10k shards. If you subscribe to Elasticsearch’s recommendation of roughly 20 shards per GB of cluster heap, then that would require you to have 500GB of heap across your entire cluster to effectively support all the index operations for that index set. And this doesn’t even take into consideration how much data lives in the index set.
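Same idea for the heap math, as a rough sketch. The 20-shards-per-GB ratio is the one quoted above; the 31GB-per-node heap figure is purely my assumption for illustration, so plug in whatever your nodes actually run with.

```python
import math

# Heap sizing from the shards-per-GB-of-heap rule of thumb.
SHARDS_PER_GB_HEAP = 20   # ratio quoted in the post
HEAP_PER_NODE_GB = 31     # ASSUMPTION: hypothetical per-node heap size


def heap_gb_needed(shards, shards_per_gb=SHARDS_PER_GB_HEAP):
    """Total cluster heap (GB) implied by the shards-per-GB ratio."""
    return shards / shards_per_gb


def nodes_needed(shards, heap_per_node_gb=HEAP_PER_NODE_GB):
    """Minimum node count if every node has heap_per_node_gb of heap."""
    return math.ceil(heap_gb_needed(shards) / heap_per_node_gb)


print(heap_gb_needed(10_000))  # 500.0
print(nodes_needed(10_000))    # 17
```

That is heap only. It says nothing about the disk you’d need to actually hold the data.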
If that index set happens to get a lot of data, then you could easily end up having to provision more than 4 shards per index to keep the shard size under the recommended 40GB. If that ends up being the case, then you could hit the 10k shard limit well before your planned 7 years. That, again, will effectively cause your cluster to cease functioning: once the limit is reached, the existing shards just keep growing in size, which will slow down, and eventually stop, your ability to search the data.
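To put a number on “quicker than your planned 7 years”, here’s a hedged sketch. The daily volumes are made-up examples, the function names are mine, and it counts primary shards only (no replicas) for a single index set.

```python
import math

MAX_SHARD_GB = 40       # recommended max shard size from the post
SHARD_LIMIT = 10_000    # the 10k limit mentioned in the post


def shards_per_index(daily_gb, max_shard_gb=MAX_SHARD_GB):
    """Shards needed so each shard of a daily index stays under the cap."""
    return max(1, math.ceil(daily_gb / max_shard_gb))


def days_until_limit(daily_gb, shard_limit=SHARD_LIMIT):
    """Whole daily indices that fit before the shard limit is reached."""
    return shard_limit // shards_per_index(daily_gb)


# Hypothetical ingest volumes in GB per day.
for daily_gb in (160, 400, 1000):
    n = shards_per_index(daily_gb)
    days = days_until_limit(daily_gb)
    print(f"{daily_gb} GB/day -> {n} shards/index, "
          f"limit reached after {days} days (~{days / 365:.1f} years)")
```

At a hypothetical 400GB per day you’d need 10 shards per index and would run out of room after 1000 days, well short of 7 years.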
So, with that hypothetical scenario in mind, I’ll restate my questions:
- Do you actually need to keep 7 years of data around in Elasticsearch?
- What would prevent you from using the enterprise archiving feature so that you can send that data off into cold storage?