Hi, all.
This week I noticed that my Elasticsearch server is running out of disk space. It’s configured to keep 240 indices with up to 20 million documents in each one. Having checked the size of the indices on disk, I quickly noticed that one of them is significantly larger than all the others. It contains 517,161,309 documents and keeps growing. It’s not the active index, but its size on disk keeps fluctuating between 160 GB and 240 GB. What do you think could be the cause, and what can I do besides decreasing the maximum number of indices and waiting for it to stop growing?
Graylog 4.2.13 and Elasticsearch 7.10.1 running on Debian.
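For reference, this is roughly how I checked the per-index document counts and sizes (a minimal sketch; it assumes Elasticsearch is reachable at http://localhost:9200 without authentication, so adjust the URL and add auth for your own setup):

```python
# Minimal sketch: per-index doc counts and on-disk size via the _cat API.
# Assumes Elasticsearch at http://localhost:9200 without authentication.
import requests

ES_URL = "http://localhost:9200"

resp = requests.get(
    f"{ES_URL}/_cat/indices/graylog_*",
    params={
        "format": "json",
        "bytes": "b",  # report sizes in bytes so they can be compared numerically
        "h": "index,docs.count,docs.deleted,store.size",
    },
)
resp.raise_for_status()

# Skip closed indices (they report no stats), then sort by on-disk size.
indices = [i for i in resp.json() if i["docs.count"] is not None]
for idx in sorted(indices, key=lambda i: int(i["store.size"]), reverse=True):
    size_gb = int(idx["store.size"]) / 1024 ** 3
    print(f'{idx["index"]:<20} {idx["docs.count"]:>12} docs '
          f'{idx["docs.deleted"]:>12} deleted {size_gb:8.1f} GB')
```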
It might be creating new fields from the logs; more fields in the mapping can increase the amount of disk space used.
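If you want to check that, something along these lines counts the mapped fields per index (a rough sketch only; it assumes the same unauthenticated node at http://localhost:9200 and the index pattern graylog_*):

```python
# Rough sketch: count mapped fields per index to see whether the mapping
# has ballooned. Assumes Elasticsearch at http://localhost:9200 without auth.
import requests

ES_URL = "http://localhost:9200"

def count_fields(properties):
    """Recursively count leaf fields in a mapping 'properties' block."""
    total = 0
    for field_def in properties.values():
        if "properties" in field_def:        # object field with nested sub-fields
            total += count_fields(field_def["properties"])
        else:
            total += 1
        total += len(field_def.get("fields", {}))  # multi-fields like .keyword
    return total

mappings = requests.get(f"{ES_URL}/graylog_*/_mapping").json()
for index, body in sorted(mappings.items()):
    props = body["mappings"].get("properties", {})
    print(f"{index}: {count_fields(props)} fields")
```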
EDIT: @Dimitri can I ask why you use a message count rather than a day count for your rotation period? Documents can vary in size, and so can the number received per day, but rotating per day gives a better understanding of how many indices the Graylog server can handle.
Here’s an example from my lab; the red boxes show the details I’m referring to. So if I’m limited on disk space, I can shrink either the number of shards or the number of indices to fit my needs.
Thanks for the response. My idea was that by limiting the number of stored documents I could also get more or less stable disk space consumption, given that the average document size is approximately the same.
I understand that new fields, new types of documents, and even a changing share of a certain type of document may lead to problems with disk space, but that’s not the case here. It’s not just the physical size of that index that has grown (although it has, of course), but the number of documents in that one specific index. So the problem is that Elasticsearch is exceeding the limits set in the configuration (unless I’m exceptionally stupid, which may also be true, in my experience).
OK, I understand now. I call that “Graylog’s Dark Magic”.
I see that index graylog_588 is 164 GB and the rest are 6-7 GB. I’m not sure; the only thing I can think of is that the 164 GB index received large docs during that time, or Elasticsearch lost count somewhere. That’s why I try to steer clear of counting documents and go by date/time instead. Maybe someone here can shed some light on what’s going on with that.
In any case, that is kind of strange; it’s the first time I’ve heard of something like that.
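One way to test the “large documents” theory is to compare average bytes per document across indices. A quick sketch (same assumption of an unauthenticated node at http://localhost:9200):

```python
# Quick sketch: average stored bytes per document for each graylog_* index.
# If graylog_588 got a batch of unusually large messages, its average should
# stand out. Assumes Elasticsearch at http://localhost:9200 without auth.
import requests

ES_URL = "http://localhost:9200"

rows = requests.get(
    f"{ES_URL}/_cat/indices/graylog_*",
    params={"format": "json", "bytes": "b", "h": "index,docs.count,pri.store.size"},
).json()

for row in sorted(rows, key=lambda r: r["index"]):
    docs = int(row["docs.count"] or 0)   # closed indices report no counts
    if docs == 0:
        continue
    avg = int(row["pri.store.size"]) / docs  # primary shard bytes per document
    print(f'{row["index"]:<20} {docs:>12} docs  avg {avg:8.1f} bytes/doc')
```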
Hard to say why it happened. Any chance someone changed the index rotation settings and then changed them back afterwards?
@gsmith is right: unless it somehow lost count, the only explanation is a batch of very large messages. Try using size or time as your rotation strategy and see if it happens again.
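One more quick check before you switch strategies: confirm which index the write alias currently points at. If it still points at the oversized index, that would explain why it keeps growing. A sketch (this assumes the default index prefix “graylog”, hence the alias name graylog_deflector, and an unauthenticated node at http://localhost:9200; adjust for your setup):

```python
# Sketch: show which concrete index the Graylog write alias points at.
# Assumes the default index prefix "graylog" (alias "graylog_deflector")
# and Elasticsearch at http://localhost:9200 without auth.
import requests

ES_URL = "http://localhost:9200"

aliases = requests.get(
    f"{ES_URL}/_cat/aliases/graylog_deflector",
    params={"format": "json", "h": "alias,index"},
).json()

for entry in aliases:
    print(f'{entry["alias"]} -> {entry["index"]}')
```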