We are seeing a strange new behavior and are struggling to narrow down the cause.
Graylog version 2.4.6
2 Graylog Servers, 4 Datanodes
The cluster relocates shards and gets them all properly assigned, only to show n unassigned shards a little later, and the cluster is yellow again.
It then starts initializing / relocating and fixes itself until it is eventually green. Soon after, the cycle repeats and we get m unassigned shards.
Obviously this puts additional strain on resources.
We’ve doubled the instance size of each node to make sure enough resources are available while this is happening, yet we can’t seem to get out of this vicious cycle.
Even though Cerebro shows that the nodes have enough resources, Graylog is very slow and unresponsive.
This is all new; we never had this issue before.
Any ideas on where to start looking, or what the root cause might be, would be a great help.
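In case it helps to put timestamps on the green / yellow cycle described above, here is a minimal sketch of how it could be logged by polling the Elasticsearch cluster health endpoint (`/_cluster/health`). The URL `http://localhost:9200`, the 30-second interval, and the function names are assumptions for illustration, not our actual setup; adjust them to your environment.

```python
import json
import time
import urllib.request

# Assumption: Elasticsearch HTTP API is reachable here without authentication.
ES_URL = "http://localhost:9200"


def summarize(health):
    """Condense a /_cluster/health response into one comparable line."""
    return "{} unassigned={} initializing={} relocating={}".format(
        health["status"],
        health["unassigned_shards"],
        health["initializing_shards"],
        health["relocating_shards"],
    )


def fetch_health(base_url=ES_URL):
    """Fetch the cluster health document as a dict."""
    with urllib.request.urlopen(base_url + "/_cluster/health", timeout=10) as resp:
        return json.load(resp)


def watch(interval_s=30):
    """Print a timestamped line whenever the health summary changes."""
    last = None
    while True:
        line = summarize(fetch_health())
        if line != last:
            print(time.strftime("%Y-%m-%d %H:%M:%S"), line)
            last = line
        time.sleep(interval_s)
```

Calling `watch()` and leaving it running through a few cycles should show when the cluster flips between green and yellow and how many shards are unassigned each time, which can then be lined up against the Elasticsearch and Graylog logs for the same timestamps.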