We are running Graylog in a Kubernetes cluster. The VMs automatically get OS updates, and we use kured to automatically drain the nodes before rebooting them.
When this happened last night, Graylog reported ~1000 indexing failures with the message:
{"type":"unavailable_shards_exception","reason":"[graylog_10][2] primary shard is not active Timeout: [1m],
We have 2 Graylog pods. Only 1 rebooted last night.
We have 3 elasticsearch-master-n pods. Only 1 of those rebooted last night.
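For reference, a minimal sketch of how the shard state can be inspected from inside the cluster (the `graylog` namespace and the `elasticsearch-master-0` pod name are assumptions based on our setup, and this assumes `curl` is available in the Elasticsearch container; adjust to your deployment):

```
# Cluster health: overall status and number of unassigned shards
kubectl -n graylog exec elasticsearch-master-0 -- \
  curl -s 'http://localhost:9200/_cluster/health?pretty'

# Per-shard view: which graylog_* primaries/replicas are unassigned and why
kubectl -n graylog exec elasticsearch-master-0 -- \
  curl -s 'http://localhost:9200/_cat/shards/graylog_*?v&h=index,shard,prirep,state,unassigned.reason,node'
```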
Does an indexing failure message indicate that the log has been lost? Or is it buffered and retried?
If lost, what do I need to look into to prevent this?
Looking at the app logs with kubectl has the downside that you lose the history after reboots. Graylog is much better in this regard, but losing logs during reboots reduces that value.
The message indicates that you have lost messages.
Most likely because the Elasticsearch node that holds the primary shard of the index was rebooting and not available for writes (I guess). What are your index settings? How many shards and replicas do you have configured?
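You can check this by querying the index settings directly. If the indices have 0 replicas, any single node reboot will leave some primaries unavailable; with at least 1 replica, Elasticsearch can promote a copy on another node to primary and keep accepting writes. A hedged sketch (the `elasticsearch-master` hostname is a placeholder; note that in Graylog the shard/replica counts are normally set on the index set under System -> Indices and only apply to newly created indices):

```
# Show how many shards and replicas each Graylog index currently has
curl -s 'http://elasticsearch-master:9200/graylog_*/_settings/index.number_of_*?pretty'

# Add a replica to an existing index so its shards survive a single node being down
# (new indices take the replica count from the Graylog index set configuration instead)
curl -s -X PUT 'http://elasticsearch-master:9200/graylog_10/_settings' \
  -H 'Content-Type: application/json' \
  -d '{"index": {"number_of_replicas": 1}}'
```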