Hi
We are seeing a strange new behavior and are struggling to narrow down the cause.
Graylog version 2.4.6
2 Graylog Servers, 4 Datanodes
Graylog relocates shards until they are all properly assigned, only to find a little later that there are n unassigned shards and the cluster is yellow again.
Graylog then starts initializing / relocating, fixes itself and eventually goes green, and soon after the cycle repeats with m unassigned shards.
Obviously this puts additional strain on resources.
We’ve doubled the instance type of each node to make sure there are enough resources available while this is happening, yet we don’t seem to be able to get out of this vicious cycle.
And even though Cerebro shows that the nodes have enough resources, Graylog is very slow and unresponsive.
This is all new, never had this issue before.
Any ideas on where to start looking or what the root cause might be would greatly help.
Read the Elasticsearch logs first.
If the cluster goes yellow, it means some shards (replicas) are missing, so I think you are losing them, and Elasticsearch is initializing new ones rather than relocating existing ones.
Or, if you lose a data node and it comes back empty, then once Elasticsearch is green again (it has all shards) it can relocate data to the empty data node.
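Alongside the logs, it may help to ask Elasticsearch why the shards are unassigned. Below is a minimal Python sketch that groups unassigned shards by the reason reported by the `_cat/shards` API. The URL `http://localhost:9200` and the function names are assumptions for illustration; adjust them to your setup.

```python
import json
from collections import Counter
from urllib.request import urlopen

# Assumption: Elasticsearch is reachable here without authentication.
ES_URL = "http://localhost:9200"


def fetch_shards(es_url=ES_URL):
    """Return one dict per shard copy from the _cat/shards API."""
    url = (es_url + "/_cat/shards?format=json"
           "&h=index,shard,prirep,state,unassigned.reason,node")
    with urlopen(url, timeout=10) as resp:
        return json.load(resp)


def unassigned_by_reason(shards):
    """Count UNASSIGNED shard copies per reported reason."""
    return Counter(
        row.get("unassigned.reason") or "unknown"
        for row in shards
        if row.get("state") == "UNASSIGNED"
    )


def report(es_url=ES_URL):
    """Print the reasons, most frequent first."""
    for reason, count in unassigned_by_reason(fetch_shards(es_url)).most_common():
        print(reason, count)
```

Calling `report()` while the cluster is yellow should show whether the replicas went missing because a node dropped out (reason `NODE_LEFT`) or because allocation itself failed (`ALLOCATION_FAILED`), which points to different parts of the logs.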
Thank you both @macko003 and @jan for the follow up.
I believe @jan hit the nail on the head. We had too many shards.
Those documents are awesome, many thanks.
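For anyone who lands here with the same symptoms, a rough way to check whether the shard count is the problem is to compare the number of active shards with the number of data nodes. This is only a sketch based on the `_cluster/health` API; the URL and function names are assumptions, and what counts as "too many" depends on your heap size and hardware.

```python
import json
from urllib.request import urlopen

# Assumption: Elasticsearch is reachable here without authentication.
ES_URL = "http://localhost:9200"


def fetch_health(es_url=ES_URL):
    """Return the parsed response of the _cluster/health API."""
    with urlopen(es_url + "/_cluster/health", timeout=10) as resp:
        return json.load(resp)


def summarize_health(health):
    """Reduce the health response to the numbers relevant for this issue."""
    data_nodes = health.get("number_of_data_nodes", 0)
    active = health.get("active_shards", 0)
    return {
        "status": health.get("status"),
        "unassigned_shards": health.get("unassigned_shards", 0),
        "relocating_shards": health.get("relocating_shards", 0),
        "initializing_shards": health.get("initializing_shards", 0),
        # Average only; the real distribution across nodes can be uneven.
        "shards_per_data_node": active / data_nodes if data_nodes else 0.0,
    }
```

Running `summarize_health(fetch_health())` every few minutes and noting the values gives a simple record of how often the cluster flips between green and yellow and how many shards each data node carries on average.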