Hi there
Recently I’ve asked several questions about ES failures that left Graylog unhappy, and I got advice that Graylog will continue to push data into ES while the cluster is YELLOW, i.e. while it is still fixing up unassigned shards.
Well, yesterday our 4-node ES cluster started reporting OOM errors (heap was set to 16G) and everything died. So I did some googling, and as the systems all have 64G of RAM, I increased ES_HEAP_SIZE to 24g and restarted the cluster. When it came back up, there were 3500 unassigned shards, so it was in state YELLOW. But Graylog is refusing to push data in. The journal is full and I’m losing messages (i.e. there are 2000 msg/sec incoming and 0/sec outgoing).
ES is assigning those shards really slowly. It’s been 16 hours and it has only processed 1000; there are still 2400 to go. (By the way, there are also 8 “initializing_shards”, if that matters.)
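For a rough sense of how long that leaves, here is a back-of-the-envelope sketch in Python. It assumes the assignment rate stays constant, which may well not hold, and the function name is just something made up for illustration.

```python
def estimate_remaining_hours(shards_done, hours_elapsed, shards_left):
    """Linear estimate of the hours needed to assign the remaining shards.

    Assumption: ES keeps assigning shards at the same average rate it
    has managed so far. Real recovery speed can vary a lot.
    """
    if shards_done <= 0 or hours_elapsed <= 0:
        raise ValueError("need a positive number of shards done and hours elapsed")
    rate_per_hour = shards_done / hours_elapsed  # average shards assigned per hour
    return shards_left / rate_per_hour


if __name__ == "__main__":
    # Numbers from this post: 1000 shards assigned in 16 hours, 2400 still to go.
    hours = estimate_remaining_hours(1000, 16, 2400)
    print(f"Roughly {hours:.1f} more hours at the current rate")
```

With the numbers above that is about 62 shards per hour, so something like 38 more hours before the cluster would be back to GREEN, if the rate holds.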
Graylog itself is working and I can run searches, but all I can see is old data. It looks like I’m losing all current data.
How can I fix this? I thought Graylog could push data in while ES is in state YELLOW? This is Graylog 2.3.1 and Elasticsearch 2.4.6.
Thanks