Elasticsearch always gets corrupted after a power outage

Hi,
if there is a power outage or anything bad happens to the Graylog box, there is a 90% chance that you will spend hours fighting corrupted Elasticsearch shards.

I don’t care about previous data; I just need to be able to recover easily and get Graylog back to work.

I got a hint about graylog cleanse, but that deletes all settings, content packs, etc.

How can I simply reset Elasticsearch to get Graylog working again? I have spent hours fighting shards and searching for information on the Internet, but it turned out to be too difficult to manage.

What about using a different solution than Elasticsearch? I am aware a MySQL database would be slower, but at least it wouldn’t crash so often.

Thanks

Maybe start by providing some details about the current state of your Elasticsearch cluster, what you want to achieve, and what you’ve done so far (and what the result of that was).

That’s currently not planned.

I don’t think it is relevant, but OK. The cluster is always “red” or “yellow”, no matter how much text I read or how many commands I type. I think it is so hard to fix because of possible data loss, but I don’t care about data loss; I just want it to start working again. I admit I started to hate Elasticsearch a week after the Graylog install and the first power loss.

Now, after graylog cleanse, I have yellow shards marked “unassigned”. I remember the “battle to assign” from previous attempts, without success. But Graylog seems to be working, so I will check it again later and possibly give up.

What type of Graylog installation are you using?
What’s the configuration of Graylog and Elasticsearch?

The latest Graylog appliance with the default configuration.

The Elasticsearch cluster in the Graylog OVA is configured to use 1 replica shard per index shard by default.
This also means that the cluster health state of a single-node cluster (as is the case in your environment) will always be YELLOW, because the replica shards cannot be assigned to another node.
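If the permanent YELLOW state itself is a concern, the replica count of an existing index can be dropped to 0 through the index settings API. This is only a minimal sketch: localhost:9200 and the index name graylog_0 are placeholders for your own node address and index.
curl -XPUT -H 'Content-Type: application/json' 'http://localhost:9200/graylog_0/_settings' -d '{"index": {"number_of_replicas": 0}}'   # localhost and graylog_0 are example values

With no replicas configured, a single-node cluster can report GREEN for that index.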

See the following related GitHub issue:

Lol! It seems one really needs to become an expert in Elasticsearch to be able to use Graylog. I give up.
I think that because some (needed?) advanced capabilities exist, the simple tasks have become too difficult. All those replicas, shards and clusters make me laugh.
But thanks for your answers.

I’d expect indexes to get corrupted if the server lost power while they were in the process of being written to. Is this a production system? If so, I think you should focus on the power issues first.
The second thing to consider would be to add some redundancy by joining a couple of additional Elasticsearch nodes to create a cluster.
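Once the extra nodes are configured to discover each other (via the discovery settings in each node’s elasticsearch.yml), it’s easy to verify that they actually joined. A small sketch, with ELASTICSERVERIP as a placeholder for any node in the cluster:
curl 'http://ELASTICSERVERIP:9200/_cat/nodes?v'    # one line per node that joined the cluster
curl 'http://ELASTICSERVERIP:9200/_cat/health?v'   # overall cluster status

The cluster status should turn green once the replica shards can be assigned to the additional nodes.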

If you just want to reset Elasticsearch after a failure, stop the elasticsearch service, then go to the path.data directory that is listed in /etc/elasticsearch/elasticsearch.yml and delete everything under the indices directory. In my case, this is /mnt/nas/elasticsearch/graylog-production/nodes/0/indices/
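A minimal sketch of that reset, assuming a systemd-managed service and reusing the example data path above (substitute whatever path.data points to on your box; use service instead of systemctl on older init systems):
sudo systemctl stop elasticsearch
sudo rm -rf /mnt/nas/elasticsearch/graylog-production/nodes/0/indices/*   # example path, adjust to your own path.data
sudo systemctl start elasticsearch

Graylog should then recreate its active write index on its own.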

The better way is to use curl to delete the busted indexes. If Elasticsearch is able to start up, first grab the health of each index:
wget -q 'http://ELASTICSERVERIP:9200/_cat/indices?v' -O- | sort

This will tell you whether each index is green, yellow, or red: green is good, yellow means it’s probably missing a replica, and red means it’s busted.
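If the list is long, you can show only the problem indexes; a sketch using the same placeholder host (without ?v there is no header row and the health value is the first column):
wget -q 'http://ELASTICSERVERIP:9200/_cat/indices' -O- | awk '$1 != "green"'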

Run the following to delete a broken index:
curl -XDELETE ELASTICSERVERIP:9200/NAMEOFINDEX
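If several indexes are red, a small loop can clean them all up in one pass. This is a sketch with the same placeholder host; note that it deletes every index whose health is red, so review the list first:
wget -q 'http://ELASTICSERVERIP:9200/_cat/indices' -O- | awk '$1 == "red" {print $3}' | while read idx; do
  curl -XDELETE "http://ELASTICSERVERIP:9200/$idx"
done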

An even easier way is to pull up Graylog and delete each index manually on the indices page.

Here’s my method for manually recovering corrupted indexes:
https://medium.com/@kyletomsik/recovering-corrupted-elasticsearch-indices-a86fede6b9c2

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.