Still confused about the relationship between ES RED/YELLOW state and Graylog

Hi there

Recently I’ve asked several questions about ES failures that left Graylog unhappy, and I was advised that Graylog will continue to push data into ES while the cluster is YELLOW and still fixing up unassigned shards.

Well, yesterday our 4-node ES cluster started reporting OOM errors (heap set to 16G) and everything died. After some googling, and since the systems all have 64G of RAM, I increased ES_HEAP_SIZE to 24g and restarted the cluster. When it came back up there were 3500 unassigned shards, so the state was YELLOW. But Graylog is refusing to push data in. The journal is full and I’m losing messages (i.e. there are 2000 msg/sec incoming and 0/sec outgoing).

ES is assigning those shards really slowly. It’s been 16 hours and it has only processed 1000; there are still 2400 to go. (By the way, there are also 8 “initializing_shards”, if that matters.)

Graylog is working and I can run searches, but all I can see is old data. It looks like I’m losing all current data.

How can I fix this? I thought Graylog could push data in while ES is in state YELLOW? This is Graylog 2.3.1 and ES 2.4.6.

Thanks

Maybe the Graylog journal has been corrupted.

Try deleting (or moving away) the journal files while Graylog is stopped and start Graylog again.
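
Moving the journal aside could be sketched roughly as below. This is only an illustration: the helper name `move_journal_aside` is hypothetical, and the journal location is an assumption (it is whatever `message_journal_dir` points to in your Graylog server configuration). Run it only while Graylog is stopped, and keep in mind that messages still sitting in the journal will not be processed once it has been moved away.

```python
import shutil
import time
from pathlib import Path


def move_journal_aside(journal_dir):
    """Rename the journal directory to a timestamped backup and recreate it empty.

    Hypothetical helper for illustration; Graylog must be stopped first.
    Returns the path of the backup directory.
    """
    src = Path(journal_dir)
    if not src.is_dir():
        raise FileNotFoundError(f"journal directory not found: {src}")

    # Keep the old journal next to the original location instead of deleting it.
    stamp = time.strftime("%Y%m%d-%H%M%S")
    backup = src.with_name(f"{src.name}.bak-{stamp}")
    shutil.move(str(src), str(backup))

    # Recreate an empty directory. Note: it is owned by whoever runs this
    # script, so ownership may need to be handed back to the Graylog user.
    src.mkdir()
    return backup


# Example call (the path is an assumption, check message_journal_dir):
# move_journal_aside("/var/lib/graylog-server/journal")
```

Moving rather than deleting keeps the old segments around in case they turn out to be needed later.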

If you have enough resources (i.e. network bandwidth and IOPS), you can increase the bandwidth Elasticsearch uses for shard recovery.
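
As a rough sketch of what raising the recovery limits might look like on Elasticsearch 2.x: the snippet below builds a transient cluster-settings body and can send it to `_cluster/settings`. The two setting names are the recovery throttle and the per-node concurrent recovery limit as I understand them for ES 2.x. The values `200mb` and `4`, the function names, and the `http://localhost:9200` URL are assumptions for illustration only, so check them against the documentation for your version and your hardware before applying anything.

```python
import json
import urllib.request


def build_recovery_settings(max_bytes_per_sec="200mb", node_concurrent_recoveries=4):
    """Return a transient cluster-settings body that loosens recovery throttling.

    The default values here are illustrative assumptions, not recommendations.
    """
    return {
        "transient": {
            "indices.recovery.max_bytes_per_sec": max_bytes_per_sec,
            "cluster.routing.allocation.node_concurrent_recoveries": node_concurrent_recoveries,
        }
    }


def apply_cluster_settings(base_url, body):
    """PUT the settings body to <base_url>/_cluster/settings and return the parsed reply."""
    request = urllib.request.Request(
        url=base_url.rstrip("/") + "/_cluster/settings",
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    body = build_recovery_settings()
    # Print the body for review; nothing is sent to the cluster here.
    print(json.dumps(body, indent=2))
    # To apply it (assumed URL): apply_cluster_settings("http://localhost:9200", body)
```

Transient settings are lost on a full cluster restart, which makes them a reasonable fit for a one-off recovery; once the cluster is green again the limits can be lowered the same way.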
