Runaway Index and allocation failure

Well, this is weird. GL has been running in our organisation for 18 months (kept up to date), and we've never had any problems of note.

Today, however, I came into the office after the weekend and my cluster state was red. I had a quick look at my indexes and noticed that the current active index had not cycled. My max index size is 35GB, and this one had managed to rock up to 217GB.

Is there anywhere in particular I should be looking for the root cause of this? My ES cluster was at 90% capacity at the time (so around 400GB left on each of the three ES nodes).

I tried to manually rotate the active write index, but that just seemed to create more unassigned shards. I then deleted around 5 indexes from the bottom of the pile, and the unassigned shards started re-assigning. Weird, as it looked to me like there was plenty of space left on the ES cluster. But more importantly, what would cause it not to rotate in the first place? Is there some sort of limit imposed on the amount of ES storage that can be used, possibly?

Images attached.

Many Thanks,
Tom


The health state of your Elasticsearch cluster is RED (also see http://docs.graylog.org/en/2.4/pages/configuration/elasticsearch.html#cluster-status-explained).

Check the logs of your Elasticsearch node(s) and make sure that the cluster state is YELLOW or GREEN (recommended). After that, Graylog should be able to rotate the indices again.
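
A quick way to check this is against the Elasticsearch REST API. A minimal sketch in Python (the endpoint address is an assumption - point it at any of your ES nodes):

```python
import requests

# Assumed endpoint; replace with the address of one of your ES nodes.
ES = "http://localhost:9200"

# Cluster health: Graylog needs this to be "green" (or at least "yellow")
# before it can rotate the write index again.
health = requests.get(f"{ES}/_cluster/health").json()
print(health["status"], "- unassigned shards:", health["unassigned_shards"])

# Per-node disk usage and shard counts, to spot a node that hit a watermark.
print(requests.get(f"{ES}/_cat/allocation", params={"v": "true"}).text)
```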


Is your retention strategy time-based? If yes - something that sends logs to Graylog may have been set to debug level, or some host had serious problems and yelled about them into the logs.

ES tends to start complaining when the filesystem holding your ES data exceeds the low/high disk watermark settings.

By default, once disk usage on a node passes the low watermark (85%), ES stops allocating new shards to that node; past the high watermark (90%), it starts relocating shards away. You can still read data from that node, but no new shards will be created there. These thresholds can be raised in your Elasticsearch configuration.

I am guessing that this is your issue - can’t know for sure without seeing log messages.

https://www.elastic.co/guide/en/elasticsearch/reference/current/disk-allocator.html
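
To see which watermarks your cluster is actually running with, you can ask ES directly. A minimal sketch in Python (the endpoint is an assumption, and `include_defaults` needs a reasonably recent ES version):

```python
import requests

# Assumed endpoint; replace with one of your ES nodes.
ES = "http://localhost:9200"

# Effective disk-watermark settings, including built-in defaults.
settings = requests.get(
    f"{ES}/_cluster/settings",
    params={"include_defaults": "true", "flat_settings": "true"},
).json()
for key, value in settings.get("defaults", {}).items():
    if "watermark" in key:
        print(key, "=", value)
```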

Ways to get out of the issue:

  1. Add another node to your cluster so there is more space available. Not a perfectly simple operation, but depending on how your indexes are configured, ES can automatically move things around.
  2. Adjust the watermark settings so your ES nodes can use a little more space (instead of 90%, maybe 93%) - see the sketch after this list.
  3. Delete old indexes to get back under the mark (also sketched below).
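
For options 2 and 3, a rough sketch against the ES REST API (the endpoint and index name are made up - adjust to your setup):

```python
import requests

# Assumed endpoint; replace with one of your ES nodes.
ES = "http://localhost:9200"

# Option 2: raise the low watermark so allocation resumes with less free space.
# "transient" settings are lost on a full cluster restart; use "persistent"
# if you want the change to stick.
resp = requests.put(
    f"{ES}/_cluster/settings",
    json={"transient": {"cluster.routing.allocation.disk.watermark.low": "93%"}},
)
print(resp.json())

# Option 3: delete an old index to get back under the mark. "graylog_42" is
# a made-up name - list your indices first (GET /_cat/indices) and pick the
# oldest. Note that Graylog normally manages retention itself.
requests.delete(f"{ES}/graylog_42")
```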

Anyway - logs from your ES node(s) will provide more information about the issue.

Dustin


Yep, bang on. Thank you, Dustin.
My low watermark was 90%, and I'd recently increased my retention policy slightly.

That was enough to push it just over.

Now corrected and all clear.

Appreciate your help.

Many Thanks,
Tom

Glad to hear it worked out!!

Dustin
