Elasticsearch service down

Sometimes Graylog inputs are not getting processed. When I checked, the Elasticsearch service was stopped. When I start Elasticsearch, everything goes back to normal. This happens frequently.

Message in the indexer failures and logs:

> graylog_13 d4881fa2-7200-11ec-9125-005056923c4a {"type":"unavailable_shards_exception","reason":"[graylog_13][1] primary shard is not active Timeout: [1m], request: [BulkShardRequest [[graylog_13][1]] containing [110] requests]"}

Please help.

Hello

Are you running out of disk space?

No, I have over 500 GB free on each node.

Ok,
Can you show the output of this command? You may need to adjust it for your environment:

curl -XGET http://localhost:9200/_cluster/allocation/explain?pretty
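If that only complains that there are no unassigned shards to explain (by default it picks an arbitrary unassigned shard), you can ask it about the exact shard from your error instead. A hedged variant, assuming the index and shard number from your error message and that your ES version accepts a request body on this endpoint:

# Explain allocation for the specific shard from the error (graylog_13, shard 1, primary)
curl -XGET 'http://localhost:9200/_cluster/allocation/explain?pretty' \
  -H 'Content-Type: application/json' \
  -d '{"index": "graylog_13", "shard": 1, "primary": true}'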

Please find the output

{
  "error" : {
    "root_cause" : [
      {
        "type" : "illegal_argument_exception",
        "reason" : "unable to find any unassigned shards to explain [ClusterAllocationExplainRequest[useAnyUnassignedShard=true,includeYesDecisions?=false]"
      }
    ],
    "type" : "illegal_argument_exception",
    "reason" : "unable to find any unassigned shards to explain [ClusterAllocationExplainRequest[useAnyUnassignedShard=true,includeYesDecisions?=false]"
  },
  "status" : 400
}

Ok,
Next troubleshooting tip: what is the output of this command?

curl -XGET http://localhost:9200/_cluster/health?pretty=true

Do you have two Elasticsearch nodes?

Please find the output:

{
  "cluster_name" : "elasticsearch",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 3,
  "number_of_data_nodes" : 3,
  "active_primary_shards" : 72,
  "active_shards" : 72,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
}

> Do you have two Elasticsearch nodes?

3 ES nodes, i.e. 3 servers, each running Graylog, mongod, and Elasticsearch.

Ok,
What I know now is…

  • So you have plenty of free space in your /data directory on ALL ES nodes, correct?
  • The cluster is green on ALL ES nodes, and I see all three nodes in that last command's output.
  • No problems with shards.

Just curious, did you execute those commands on each Elasticsearch node?

Next troubleshooting tip.
This command will check that all indices are green.
EDIT: I forgot to put this command here :smiley:

curl -XGET 'http://localhost:9200/_cluster/health?level=indices&pretty'
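If you only want to see the problem indices, _cat/indices can filter by health. A quick variant (same localhost assumption as above):

# List only yellow or red indices; 'v' adds column headers
curl -XGET 'http://localhost:9200/_cat/indices?v&health=yellow'
curl -XGET 'http://localhost:9200/_cat/indices?v&health=red'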

Check list

  • Did you make sure the Elasticsearch service is enabled on all nodes? (See the loop sketch after this list.)

systemctl enable elasticsearch

  • Permissions are good on all nodes?
  • Have you tried manually rotating your indices to see if this happens again?
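For that first checklist item, here is a rough way to confirm the service state on every node in one go. Just a sketch: it assumes systemd and SSH access between the nodes, and node1/node2/node3 are placeholders for your actual hostnames.

# Check that elasticsearch is enabled and currently running on each node
for host in node1 node2 node3; do
  echo "== $host =="
  ssh "$host" 'systemctl is-enabled elasticsearch; systemctl is-active elasticsearch'
done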

This issue sometimes happens when…

  1. You are running out of disk space.
  2. You have too many shards allocated.
  3. Out-of-memory conditions resulted in orphaned Elasticsearch indices <— I’m leaning toward this being what’s happening (see the heap check sketch below).
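A quick way to see whether number 3 is plausible is to look at heap pressure per node. A minimal check, again assuming localhost:9200:

# Per-node heap usage; heap.percent sitting in the 90s is a sign of memory starvation
curl -XGET 'http://localhost:9200/_cat/nodes?v&h=name,node.role,heap.percent,heap.max,ram.percent'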

Yes:
glog - 555G
glog2 - 527G
glog3 - 538G

Yes, and all nodes have the same output.

Yeah, all are enabled and permissions are good.

No. When I start the Elasticsearch service, everything goes back to normal, but with multiple indexer failures.

Does this mean I have to increase the JVM heap space?

All the inputs are processed by only one node. I need to add a load balancer, but still, is 200 msgs/sec too much for just one node to process?

Ok,
So since you have a cluster of 3, was it the master node you restarted the service on? If so, this may have transferred the master role to a new node. If you haven't already, this will check all your indices:

curl -XGET 'http://localhost:9200/_cat/indices?pretty'

This means you are starving Elasticsearch of memory.

I have no idea what your environment looks like, so I can't tell you how to fix it unless you share more information. Please look here for a better understanding.
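If it does turn out to be heap, the usual adjustment on a package install is to raise the -Xms/-Xmx values in /etc/elasticsearch/jvm.options (or a file under jvm.options.d/ on newer versions) and restart the service. The 4g below is only a placeholder — the right value depends on how much RAM your nodes have, which you haven't shared yet; the general guidance is up to about half of the system RAM and no more than roughly 31g.

# /etc/elasticsearch/jvm.options — placeholder heap sizes, keep Xms and Xmx equal
-Xms4g
-Xmx4g

Then restart Elasticsearch on that node:

sudo systemctl restart elasticsearch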


Thank you gsmith for your quick and kind responses. I will share the required info soon. :smiley:


Just get all your ducks in a row and I’m sure you will get a better answer quicker.
Make sure you have all the specifications for your ES cluster, since this is where the issue is. Current log files would be appreciated as well.
I would hate to tell you to do something that turns out to be an incorrect resolution to this issue.


Elasticsearch logs would be very helpful.
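If it helps, on a package install they usually live under /var/log/elasticsearch/, and the main log file is named after your cluster_name, which is "elasticsearch" according to the health output above. A couple of ways to pull recent entries (adjust paths if you installed differently):

# Tail the main Elasticsearch log around the time the service stopped
tail -n 200 /var/log/elasticsearch/elasticsearch.log

# Or whatever the service unit captured, via the journal
journalctl -u elasticsearch --since "1 hour ago"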

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.