Indices in Graylog

Hello there,
Perhaps a simple question: when an index is deleted via normal rotation, what happens to the syslog data that index holds? We need to keep the syslog data for 7 years, and at this point I am not sure how much data will be going into that index. I can keep the index for 1 year for now, but I am not sure what to do with the index after a year. If I select the Delete index option, I am worried the log data would be deleted.

Any suggestions on how I should set up indices?

Thanks

Hello,

If your Index Retention Configuration is set to Delete, then when your max number of indices has been reached, those logs are gone. BUT if you set your retention strategy to Close, then you will keep those logs.
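To illustrate the difference, here is a minimal sketch of what the Close strategy amounts to at the Elasticsearch level (the host and the index name graylog_12 are assumptions for the example): a closed index keeps its data on disk and can be reopened later if you ever need to search it.

```python
import requests

ES = "http://localhost:9200"   # assumption: a locally reachable Elasticsearch node
INDEX = "graylog_12"           # hypothetical rotated Graylog index

# Closing an index keeps its data on disk while freeing most of the
# runtime resources it was using; this is roughly what a Close
# retention strategy does to rotated indices.
requests.post(f"{ES}/{INDEX}/_close").raise_for_status()

# Later, the index can be reopened to make those old logs searchable again.
requests.post(f"{ES}/{INDEX}/_open").raise_for_status()
```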

You can find more here.
https://docs.graylog.org/en/2.1/pages/index_model.html

I must say, that’s a long time.
NOTE: The Enterprise version is able to archive logs.
https://docs.graylog.org/en/4.1/pages/archiving.html#archiving

hope that helps


Oh man, y’all are gonna get me on a soapbox about index settings :grimacing:. What would prevent you from using archiving? I realize that’s an enterprise feature, but if you need to keep data around for 7 years, you’re going to want archiving, @Raynu.

Let me also ask a different question: how far back are you asking questions about your data? Do you actually need to search across the whole 7 years? If you just need to retain the data, then seriously, archiving is going to be the solution to this problem. If you don’t need to search 7-year-old data all day, every day, then get those indices off into some long-term cold storage so you can access them at a later date, should that occasion arise.

I’m going to walk through what’s going to happen under the hood if you’re not archiving.

Let’s say that you do a 1-day rotation and keep all of those indices open (neither closed nor deleted), and let’s say, for :poop: and :laughing:, that you take the default 4 shards per index. At 365 days * 7 years, that’s a total of 2,555 indices. With the 4-shard default, that’s 10,220 shards, which would actually put you above Elasticsearch’s hard 10k shard limit… just for that index set. This would leave NO resources available for other indices to rotate out.

Putting that aside for a moment, let’s just assume that you do hit 10k shards. If you subscribe to Elasticsearch’s recommendation of roughly 20 shards per GB of cluster heap, then that would require you to have around 500GB of heap across your entire cluster to effectively support all the index operations for that index set. And this doesn’t even take into consideration how much data lives in the index set.

If that index set happens to get a lot of data, then you could easily end up in a situation where you have to provision more than 4 shards to keep each shard under the recommended 40GB size. If that ends up being the case, then you could hit the 10k shards quicker than your planned 7 years, which again will effectively cause your cluster to cease functioning: your shards will continue to grow in size once the 10k limit is reached, which will slow down, and eventually stop, your ability to search the data.
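To make that arithmetic concrete, here is a quick back-of-the-envelope calculation in Python (the rotation period, shard count, and heap ratio are just the assumptions from the scenario above):

```python
# Scenario: 1-day rotation, 7-year retention, 4 shards per index,
# nothing ever closed or deleted.
DAYS_PER_YEAR = 365
YEARS = 7
SHARDS_PER_INDEX = 4        # the default assumed in the scenario above
SHARDS_PER_GB_HEAP = 20     # Elasticsearch's ~20 shards per GB of heap guideline

indices = DAYS_PER_YEAR * YEARS              # 2,555 indices
shards = indices * SHARDS_PER_INDEX          # 10,220 shards
heap_gb = shards / SHARDS_PER_GB_HEAP        # ~511 GB of cluster heap

print(f"{indices} indices -> {shards} shards -> ~{heap_gb:.0f} GB heap")
# 2555 indices -> 10220 shards -> ~511 GB heap
```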

So, with that hypothetical scenario in mind, I’ll restate my questions:

  1. Do you actually need to keep 7 years of data around in Elasticsearch?
  2. What would prevent you from using enterprise so that you can archive that data and send it off into cold storage?

Thanks for the recommendation. I have created indices with 1-year retention, with the option to close them instead of deleting them. I will observe the data growth pattern for a couple of weeks before I make further decisions.


Thanks Aaron,

I will check with management about how we want to access the 7 years of data and will get back to you.

You can, however, manually archive old data yourself if you are willing to do the extra work (it can be scripted, I guess). I do this; I spend an hour on it once a month, and for me it is worth it. But in 7 years much can happen, so the data might not be readable for some reason.
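As an illustration of what such a script could look like, here is a minimal sketch using Elasticsearch’s snapshot API (the repository name, its filesystem path, and the index name are all hypothetical, and the path must be whitelisted via path.repo in elasticsearch.yml):

```python
import requests

ES = "http://localhost:9200"     # assumption: a locally reachable Elasticsearch node
REPO = "graylog-archive"         # hypothetical snapshot repository name
INDEX = "graylog_12"             # hypothetical rotated index to archive

# One-time setup: register a filesystem snapshot repository.
requests.put(f"{ES}/_snapshot/{REPO}", json={
    "type": "fs",
    "settings": {"location": "/mnt/archive/graylog"},  # hypothetical path
}).raise_for_status()

# Snapshot the old index into the repository and wait for it to finish.
requests.put(
    f"{ES}/_snapshot/{REPO}/{INDEX}",
    params={"wait_for_completion": "true"},
    json={"indices": INDEX},
).raise_for_status()

# Once the snapshot has succeeded, the live index can be deleted to free
# resources; it can be restored from the repository if it is ever needed.
requests.delete(f"{ES}/{INDEX}").raise_for_status()
```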


I think, if possible, having a better idea of exactly what data is required to be retained would help. If you are not doing so already, you can leverage pipelines, streams, and additional index sets to route just the required data into an archive index set that you retain. That might reduce the overall size.
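In Graylog itself this routing is done with streams and pipeline rules, but as a rough sketch of the same idea at the Elasticsearch level, here is what copying only the required subset into a smaller, long-lived archive index could look like with the _reindex API (the index names and the filter field are hypothetical):

```python
import requests

ES = "http://localhost:9200"      # assumption: a locally reachable Elasticsearch node
SOURCE = "graylog_12"             # hypothetical rotated index
ARCHIVE = "syslog-archive-2021"   # hypothetical long-lived archive index

# Copy only the messages that must be retained (here filtered on a
# hypothetical "source" field) into the smaller archive index.
requests.post(f"{ES}/_reindex", json={
    "source": {
        "index": SOURCE,
        "query": {"term": {"source": "firewall-01"}},  # hypothetical filter
    },
    "dest": {"index": ARCHIVE},
}).raise_for_status()
```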


Thanks all, I will check with the IT team for a manual data archiving process. It is a new setup; we will see how the data grows over the next few months.


This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.