Elasticsearch always gets corrupted after a power outage

Hi,
if there is a power outage or anything bad happens to the Graylog box, there is a 90% chance that you will spend hours fighting corrupted Elasticsearch shards.

I don’t care about previous data; I just need to be able to recover easily and get Graylog back to work.

I got a hint about graylog cleanse, but that deletes all settings, content packs, etc.

How can I simply reset Elasticsearch to get Graylog working again? I have spent hours fighting shards and searching for information on the Internet, but it turned out to be too difficult to manage.

What about using a different solution than Elasticsearch? I am aware a MySQL database would be slower, but at least it wouldn’t crash so often.

Thanks

Maybe start by providing some details about the current state of your Elasticsearch cluster, what you want to achieve, and what you’ve done so far (and what the result of that was).

That’s currently not planned.

I don’t think it is relevant, but OK. The cluster is always “red” or “yellow”, no matter how much text I read or how many commands I type. I think it is so hard to fix because of possible data loss, but I don’t care about data loss; I just want it to start working again. I admit I started to hate Elasticsearch a week after the Graylog install and the first power loss.

Now, after graylog cleanse, I have yellow shards marked “unassigned”. I remember the “battle to assign” from previous attempts, without success. But Graylog seems to be working, so I will check it again later and possibly give up.

What type of Graylog installation are you using?
What’s the configuration of Graylog and Elasticsearch?

The latest Graylog appliance with the default configuration.

The Elasticsearch cluster in the Graylog OVA is configured to use 1 replica shard per index shard by default.
This also means that the cluster health state of a single-node cluster (as is the case in your environment) will always be YELLOW, because the replica shards cannot be assigned to another node.
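If the permanent YELLOW state itself is a concern, the replica count of an existing index can be dropped to 0 through the index settings API. This is only a minimal sketch: localhost:9200 and the index name graylog_0 are placeholders for your own node address and index.
curl -XPUT -H 'Content-Type: application/json' 'http://localhost:9200/graylog_0/_settings' -d '{"index": {"number_of_replicas": 0}}'   # localhost and graylog_0 are example values

With no replicas configured, a single-node cluster can report GREEN for that index.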

See the following related GitHub issue:

Lol! It seems one really needs to become an expert in Elasticsearch to be able to use Graylog. I give up.
I think that because some (needed?) advanced capabilities exist, the simple tasks have become too difficult. All those replicas, shards and clusters make me laugh.
But thanks for your answers.

I’d expect indexes to get corrupted if the server lost power while they were in the process of being written to. Is this a production system? If so, I think you should focus on the power issues first.
The second thing to consider would be to add some redundancy by joining a couple of additional Elasticsearch nodes to create a cluster.
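Once the extra nodes are configured to discover each other (via the discovery settings in each node’s elasticsearch.yml), it’s easy to verify that they actually joined. A small sketch, with ELASTICSERVERIP as a placeholder for any node in the cluster:
curl 'http://ELASTICSERVERIP:9200/_cat/nodes?v'    # one line per node that joined the cluster
curl 'http://ELASTICSERVERIP:9200/_cat/health?v'   # overall cluster status

The cluster status should turn green once the replica shards can be assigned to the additional nodes.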

If you just want to reset Elasticsearch after a failure, stop the elasticsearch service, then go to the path.data directory that is listed in /etc/elasticsearch/elasticsearch.yml and delete everything under the indices directory. In my case, this is /mnt/nas/elasticsearch/graylog-production/nodes/0/indices/
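A minimal sketch of that reset, assuming a systemd-managed service and reusing the example data path above (substitute whatever path.data points to on your box; use service instead of systemctl on older init systems):
sudo systemctl stop elasticsearch
sudo rm -rf /mnt/nas/elasticsearch/graylog-production/nodes/0/indices/*   # example path, adjust to your own path.data
sudo systemctl start elasticsearch

Graylog should then recreate its active write index on its own.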

The better way is to use curl to delete the busted indexes. If Elasticsearch is able to start up, first grab the health of each index:
wget -q 'http://ELASTICSERVERIP:9200/_cat/indices?v' -O- | sort

This will tell you whether each index is green, yellow, or red: green is good, yellow means it’s probably missing a replica, and red means it’s busted.
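If the list is long, you can show only the problem indexes; a sketch using the same placeholder host (without ?v there is no header row and the health value is the first column):
wget -q 'http://ELASTICSERVERIP:9200/_cat/indices' -O- | awk '$1 != "green"'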

Run the following to delete a broken index:
curl -XDELETE ELASTICSERVERIP:9200/NAMEOFINDEX
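If several indexes are red, a small loop can clean them all up in one pass. This is a sketch with the same placeholder host; note that it deletes every index whose health is red, so review the list first:
wget -q 'http://ELASTICSERVERIP:9200/_cat/indices' -O- | awk '$1 == "red" {print $3}' | while read idx; do
  curl -XDELETE "http://ELASTICSERVERIP:9200/$idx"
done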

An even easier way is to pull up Graylog and delete each index manually on the indices page.

Here’s my method for manually recovering corrupted indexes:
https://medium.com/@kyletomsik/recovering-corrupted-elasticsearch-indices-a86fede6b9c2

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.