Elasticsearch service down

Sometimes inputs in Graylog are not getting processed; when I checked, the Elasticsearch service was stopped. When I start Elasticsearch, everything is back to normal. This happens frequently.

Message in indexer failures and logs:

> graylog_13 d4881fa2-7200-11ec-9125-005056923c4a {"type":"unavailable_shards_exception","reason":"[graylog_13][1] primary shard is not active Timeout: [1m], request: [BulkShardRequest [[graylog_13][1]] containing [110] requests]"}

Please help.


Are you running out of disk space?

No, I have over 500 GB free on each node.
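For anyone following along: a quick way to double-check free disk per data node, assuming Elasticsearch answers on the default localhost:9200. The sample numbers in the offline demo are made up to resemble this thread's values; 85% is the default low disk watermark.

```shell
# Live check (run on any ES node); || true keeps a script going if the cluster is down:
curl -s 'http://localhost:9200/_cat/allocation?v&h=node,disk.percent,disk.avail' || true

# Offline demo: flag any node over the default 85% low watermark,
# using sample numbers shaped like this thread's (disk.percent values assumed):
printf 'glog 40 555gb\nglog2 43 527gb\nglog3 41 538gb\n' |
  awk '$2+0 > 85 { print $1 " is above the 85% low watermark" }'
```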

Can you show the output of this command? You may need to adjust it for your environment.

curl -XGET http://localhost:9200/_cluster/allocation/explain?pretty

Please find the output:

  "error" : {
    "root_cause" : [
        "type" : "illegal_argument_exception",
        "reason" : "unable to find any unassigned shards to explain [ClusterAllocationExplainRequest[useAnyUnassignedShard=true,includeYesDecisions?=false]"
    "type" : "illegal_argument_exception",
    "reason" : "unable to find any unassigned shards to explain [ClusterAllocationExplainRequest[useAnyUnassignedShard=true,includeYesDecisions?=false]"
  "status" : 400

Next troubleshooting tip. What is the output of this command?

curl -XGET http://localhost:9200/_cluster/health?pretty=true

Do you have two Elasticsearch nodes?

Please find the output:

  "cluster_name" : "elasticsearch",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 3,
  "number_of_data_nodes" : 3,
  "active_primary_shards" : 72,
  "active_shards" : 72,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0

> do you have two elasticsearch nodes?
3 ES nodes, i.e. 3 servers, each running Graylog, MongoDB, and Elasticsearch.

What I know now is…

  • So you have plenty of free space in your /data directory on ALL ES nodes, correct?
  • Cluster is green on ALL ES nodes, and I see all three nodes shown in that last command.
  • No problem with shards.

Just curious, did you execute those commands on each Elasticsearch node?

Next troubleshooting tip.
This command will make sure all indices are green.
EDIT: I forgot to put this command here :smiley:

curl -XGET 'http://localhost:9200/_cluster/health?level=indices&pretty'
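If the full per-index listing gets long, a rough filter for anything not green can help. The commented curl is the real call; the printf line is a trimmed, made-up sample so the filter can be demonstrated offline:

```shell
# Live:
#   curl -s 'http://localhost:9200/_cluster/health?level=indices&pretty' |
#     grep '"status"' | grep -v green
# Offline demo of the filter on sample lines (empty output = all green):
printf '"status" : "green"\n"status" : "yellow"\n"status" : "green"\n' |
  grep '"status"' | grep -v green
```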

Check list

  • Did you make sure the Elasticsearch service was enabled on all nodes?

systemctl enable elasticsearch

  • Permissions are good on all nodes?
  • Have you tried manually rotating your indices to see if this happens again?
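Since the service keeps stopping, a simple liveness probe on each node can also catch it early. A sketch, assuming the default port 9200:

```shell
# A healthy node answers with its name/version JSON; otherwise flag it.
curl -s --max-time 5 'http://localhost:9200/' ||
  echo "elasticsearch is DOWN on $(hostname)"
```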

This issue sometimes happens when…

  1. You're running out of disk space.
  2. You have too many shards allocated.
  3. Out-of-memory conditions resulted in orphaned Elasticsearch indices <— I'm leaning toward this being what's happening.
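On point 2, a shard-per-node count is easy to sanity-check. Your health output above showed 72 active shards on 3 nodes, so roughly 24 per node would be expected; hundreds per node on a small heap is a classic source of memory pressure. The commented curl is the real call; the printf demo shows the same counting pipeline on made-up node names:

```shell
# Live: curl -s 'http://localhost:9200/_cat/shards?h=node' | sort | uniq -c | sort -rn
# Offline demo of the counting pipeline:
printf 'glog\nglog2\nglog\nglog3\nglog2\nglog\n' | sort | uniq -c | sort -rn
```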

glog - 555G
glog2 - 527G
glog3 - 538G

Yes, and all nodes have the same output.

Yeah, all are enabled and permissions are good.

No. When I start the Elasticsearch service, everything is back to normal, but with multiple indexer failures.

Does this mean I have to increase the JVM heap space?

All the inputs are processed by only one node; I need to add a load balancer. But still, is 200 msgs/sec too much for just one node to process?

So, since you have a cluster of 3 nodes: was it the master node you restarted the service on? If so, this may have transferred mastership to a new node. If you haven't already, this will check all your indices:

curl -XGET 'http://localhost:9200/_cat/indices?pretty'

This means you're starving Elasticsearch of memory.
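To see whether the heap is actually the bottleneck before changing anything (same localhost:9200 assumption; `/etc/elasticsearch/jvm.options` is the usual location on package installs, and the 4g value below is only an example, not a recommendation for your hardware):

```shell
# Live heap usage per node:
curl -s 'http://localhost:9200/_cat/nodes?v&h=name,heap.percent,heap.max,ram.max' || true

# If you do raise the heap, set it in /etc/elasticsearch/jvm.options;
# -Xms and -Xmx must match, and the usual guidance is to stay at or
# below ~50% of system RAM. Sanity-check a proposed snippet:
snippet='-Xms4g
-Xmx4g'
xms=$(echo "$snippet" | sed -n 's/^-Xms//p')
xmx=$(echo "$snippet" | sed -n 's/^-Xmx//p')
[ "$xms" = "$xmx" ] && echo "heap min/max match: $xms"
```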

I have no idea what your environment looks like, so I can't tell you how to fix it unless you share more information. Please look here for a better understanding.


Thank you gsmith for your quick and kind responses. I will share the required info soon. :smiley:


Just get all your ducks in order and I'm sure you will get a better answer quicker.
Make sure you have all the specifications of your ES cluster, since this is where the issue is. Current log files would be appreciated also.
I would hate to tell you something to do and have it be an incorrect resolution to this issue.


Elasticsearch logs would be very helpful.
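Agreed. For reference, on package installs the log file sits under `/var/log/elasticsearch/` and is named after the cluster ("elasticsearch", per the health output above). A sketch of where to look when the service dies; the grep demo just runs on the standard JVM error string:

```shell
# Recent service log (paths assume a package install):
#   tail -n 200 /var/log/elasticsearch/elasticsearch.log
#   journalctl -u elasticsearch --since "1 hour ago"
# Heap exhaustion shows up as the standard JVM error string:
printf 'java.lang.OutOfMemoryError: Java heap space\n' | grep -c 'OutOfMemoryError'
# If the kernel OOM killer shot the process instead, check dmesg:
#   dmesg -T | grep -i 'killed process'
```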

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.