Still confused over ElasticSearch green/red meaning

Hi there

I asked this question on the ES list recently and still haven’t got an answer that I understand.

Basically, what I don’t get is this: when I shut down graylog-server and then shut down elasticsearch (which is “green” according to /_cluster/health), why does elasticsearch always come up “red”, with lots of unassigned shards, when I restart it? The shards get processed and the status eventually goes “green”, but my point is: why does a clean shutdown leave ES in a poor state? This really impacts availability. If (say) I were doing an incremental ES upgrade, I’d expect that shutting down a service and restarting it would always bring it back in an “OK” state, as is the case with all other software I’ve ever used. Cleanly unmounting a file system is the analogy I have in mind, if it helps.

Is this expected ES behaviour (which is weird), or does it imply something’s wrong? The logs certainly don’t show any problem. Part of the reason I’ve come across this quite a lot is that I’d start seeing graylog searches slow down and nothing but a restart would fix it, which suggests these unassigned shards are causing problems. But they don’t even show up until an ES restart, so I’m looking for better ways of detecting this.

This is graylog-server-2.3.0-7.noarch and elasticsearch-2.4.6-1.noarch on CentOS-7.




Someone else will probably come along with a more correct answer, but here is mine.

This is expected behavior - it is just how Elasticsearch nodes work as they start up.

I read your question and the answers posted on the ES forum, and tried to find a better description of the process.

In short:

  • On startup, a node running ES seems to say “Everything is in bad/unknown shape until I have had a chance to check”. At this point everything shows red.
  • After learning the status of all indexes and nodes, and making sure all primary shards are online, the status moves to yellow.
  • After all replica shards are happy and there are no unassigned shards, the status moves to green.
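
As a rough sketch of the three steps above, here is a simplified model of how the colour follows from shard states. It is an illustration only, not how Elasticsearch computes health internally; the function name and the sample shard lists are made up for the example.

```python
# Simplified model of the three bullets above; not Elasticsearch's actual code.
ACTIVE_STATES = {"STARTED", "RELOCATING"}  # shard states treated as assigned and usable


def cluster_status(shards):
    """Return "red", "yellow" or "green" for a list of (is_primary, state) pairs."""
    primaries_active = all(
        state in ACTIVE_STATES for is_primary, state in shards if is_primary
    )
    replicas_active = all(
        state in ACTIVE_STATES for is_primary, state in shards if not is_primary
    )
    if not primaries_active:
        return "red"     # at least one primary shard is not assigned yet
    if not replicas_active:
        return "yellow"  # all primaries are up, some replicas are not
    return "green"       # every shard is assigned


# Hypothetical shard lists at three moments after a restart.
just_started = [(True, "UNASSIGNED"), (False, "UNASSIGNED")]
primaries_recovered = [(True, "STARTED"), (False, "INITIALIZING")]
fully_recovered = [(True, "STARTED"), (False, "STARTED")]
```

Run over the three sample lists in order, the model goes red, then yellow, then green, which is the same sequence a restarted cluster reports while it recovers.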

When we restart our ES cluster (12 nodes) running behind Graylog, it may take 30 minutes for everything to arrive at green. In terms of detection, what concerns us is the cluster going RED when we don’t expect it; startup and node restarts always generate status changes.
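
One way to act on that distinction is to poll /_cluster/health and only raise an alarm when red is not explained by a recent restart. The sketch below assumes ES listens on http://localhost:9200 and uses a 30-minute grace period taken from the figure above; `fetch_health`, `should_alert` and `grace_seconds` are illustrative names, not part of Graylog or Elasticsearch.

```python
import json
import urllib.request


def fetch_health(base_url="http://localhost:9200"):
    """Fetch and parse GET /_cluster/health (base_url is an assumption)."""
    with urllib.request.urlopen(base_url + "/_cluster/health", timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


def should_alert(health, seconds_since_restart, grace_seconds=1800):
    """Alert on red only when it is not explained by a recent restart."""
    if health["status"] != "red":
        return False
    # Red shortly after a restart is the normal recovery phase, so stay quiet.
    return seconds_since_restart > grace_seconds
```

A monitoring job could call `should_alert(fetch_health(), seconds_since_restart)` every minute or so, with the grace period tuned to how long your own cluster normally takes to recover.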

Hope this helps.

Dustin Tennill

Thanks Dustin. If it’s expected behaviour, then “phew!”. But as the status is “red”, does that mean that after ES restarts (or system reboots), graylog will notice the “red”, and block putting data into ES until it goes “green”?

The reason that matters is that you’d have to ensure your message_journal can handle the volume - if it’s too small (5G by default), or ES takes too long to come back “green”, you’ll lose data?
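
A back-of-the-envelope way to frame that sizing question: how long could the journal absorb incoming traffic if nothing were being written out to ES? The sketch below treats the 5G default as 5 GiB and ignores any per-message overhead in the journal; the message rate and average message size are hypothetical numbers, so substitute your own.

```python
def journal_buffer_seconds(journal_bytes, msgs_per_second, avg_msg_bytes):
    """Seconds of incoming traffic the journal can hold before it is full."""
    return journal_bytes / (msgs_per_second * avg_msg_bytes)


FIVE_GIB = 5 * 1024 ** 3  # treats the 5G default as 5 GiB (assumption)

# Hypothetical load: 2000 messages per second at about 1000 bytes each.
seconds = journal_buffer_seconds(FIVE_GIB, msgs_per_second=2000, avg_msg_bytes=1000)
print(round(seconds / 60, 1), "minutes of buffering at this load")
```

If the result is shorter than the time ES needs to become writable again, the journal is the limit to look at.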

Just trying to understand the idiosyncrasies and ensuring we’re operating it all correctly :slight_smile:



“Red” does not mean it will not accept new messages. ES can accept new data when red, as long as the primary shards of the index being written to are assigned. Red means more like: “I might have lost some of your data.” Yellow is more like: “At the moment I am not redundant, so if a node goes down, you might lose some of your data.”
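
To see which indices are actually behind a red cluster status, the health API can report per index with `GET /_cluster/health?level=indices`. The helper below is a small sketch that filters such a response after it has been parsed from JSON; the function name and the index names in the sample are made up for illustration.

```python
def indices_with_status(health, status="red"):
    """Return the sorted names of indices reporting the given status."""
    indices = health.get("indices", {})
    return sorted(
        name for name, info in indices.items() if info.get("status") == status
    )


# Trimmed, hypothetical response from GET /_cluster/health?level=indices
sample = {
    "status": "red",
    "indices": {
        "graylog_41": {"status": "red"},
        "graylog_42": {"status": "green"},
    },
}
red_now = indices_with_status(sample)
```

If the index Graylog is currently writing to is not in that list, new messages should still be indexed even though the cluster as a whole reports red.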
