Elasticsearch Node is in Red Status

All,
An issue occurred recently, and I want to show how I was able to troubleshoot and resolve it. As a reminder, this was my lab Graylog server, so I assume this was a user error on my part :thinking:

About a week ago I installed Cerebro on my remote Graylog Docker test VM (GitHub - lmenezes/cerebro) for testing purposes. Today I logged into Cerebro and noticed one of my Elasticsearch nodes was in RED status. It showed three different indices with unassigned shards, mainly the new indices that should have been created within the past 4 days. :astonished:

First, I executed the command below, which I think most of us know and use for troubleshooting shard issues.

[root@graylog conf]# curl -XGET http://192.168.1.100:9200/_cluster/allocation/explain?pretty
{
  "index" : "gl-events_33",
  "shard" : 2,
  "primary" : true,
  "current_state" : "unassigned",
  "unassigned_info" : {
    "reason" : "INDEX_CREATED",
    "at" : "2022-04-01T00:01:01.903Z",
    "last_allocation_status" : "no"
  },
  "can_allocate" : "no",
  "allocate_explanation" : "cannot allocate because allocation is not permitted to any of the nodes",
  "node_allocation_decisions" : [
    {
      "node_id" : "srgsYBshRb-ue3IQzIK-RQ",
      "node_name" : "graylog.domain.com",
      "transport_address" : "192.168.1.100:9300",
      "node_decision" : "no",
      "weight_ranking" : 1,
      "deciders" : [
        {
          "decider" : "enable",
          "decision" : "NO",
          "explanation" : "no allocations are allowed due to cluster setting [cluster.routing.allocation.enable=none]"
        }
      ]
    }
  ]
}
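The "enable" decider at the bottom is the key part: shard allocation had been disabled cluster-wide. If you want a quick overview of the cluster before digging into individual shards, the standard health and settings endpoints are handy too (the IP below is my lab node, so adjust it to your environment):

curl -XGET http://192.168.1.100:9200/_cluster/health?pretty
curl -XGET http://192.168.1.100:9200/_cluster/settings?pretty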

That command gave me a clear idea of what was wrong, and after 5 minutes of research I found out what I needed to do. I noticed Cerebro has a REST API command section in its web UI, much like OpenSearch/Elasticsearch. Good job, open source guys :+1:

Example of the CLI:

I copy & Paste this command below, in that box .

PUT _cluster/settings
{
  "transient": {
    "cluster.routing.allocation.enable": "primaries"
  }
}
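If you are not running Cerebro, the same change can be applied straight from the shell with curl against the cluster settings API (a sketch using my lab node's address, adjust as needed):

curl -XPUT http://192.168.1.100:9200/_cluster/settings -H 'Content-Type: application/json' -d '{ "transient": { "cluster.routing.allocation.enable": "primaries" } }'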

Then I restarted the Elasticsearch service.

systemctl restart elasticsearch
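The node takes a little while to come back up; if you want to watch it, something like this works (assuming a systemd install like mine):

systemctl status elasticsearch
journalctl -u elasticsearch -f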

Just to make sure things were good, I executed this.

curl -XGET http://192.168.1.100:9200/_cat/shards
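On a larger cluster the full shard list gets long, so you can ask _cat/shards for just the columns you care about and filter out anything that has already started, for example:

curl -XGET 'http://192.168.1.100:9200/_cat/shards?v&h=index,shard,prirep,state,node' | grep -v STARTED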

I tailed my Elasticsearch log file.

[root@graylog conf]# tail -f /var/log/elasticsearch/graylog.log
[2022-04-04T16:44:27,711][INFO ][o.e.n.Node ] [graylog.domain.com] starting ...
[2022-04-04T16:44:27,883][INFO ][o.e.t.TransportService ] [graylog.enseva-labs.net] publish_address {192.168.1.100:9300}, bound_addresses {192.168.1.100:9300}
[2022-04-04T16:44:29,217][INFO ][o.e.c.c.Coordinator ] [graylog.domain.com] cluster UUID [OMgi3eu5QGiJ3buKOYn4_w]
[2022-04-04T16:44:29,522][INFO ][o.e.c.s.MasterService ] [graylog.domain.com] elected-as-master ([1] nodes joined)[{graylog.domain.com}{srgsYBshRb-ue3IQzIK-RQ}{4i1Mvoc8Q0epxjIP7STI9g}{192.168.1.100}{192.168.1.100:9300}{dimr} elect leader, _BECOME_MASTER_TASK_, _FINISH_ELECTION_], term: 59, version: 135416, delta: master node changed {previous [], current [{graylog.domain.com}{srgsYBshRb-ue3IQzIK-RQ}{4i1Mvoc8Q0epxjIP7STI9g}{192.168.1.100}{192.168.1.100:9300}{dimr}]}
[2022-04-04T16:44:29,925][INFO ][o.e.c.s.ClusterApplierService] [graylog.domain.com] master node changed {previous [], current [{graylog.domain.com}{srgsYBshRb-ue3IQzIK-RQ}{4i1Mvoc8Q0epxjIP7STI9g}{192.168.1.100}{192.168.1.100:9300}{dimr}]}, term: 59, version: 135416, reason: Publication{term=59, version=135416}
[2022-04-04T16:44:29,967][INFO ][o.e.h.AbstractHttpServerTransport] [graylog.domain.com] publish_address {192.168.1.100:9200}, bound_addresses {192.168.1.100:9200}
[2022-04-04T16:44:29,968][INFO ][o.e.n.Node ] [graylog.domain.com] started
[2022-04-04T16:44:30,619][INFO ][o.e.c.s.ClusterSettings ] [graylog.domain.com] updating [cluster.routing.allocation.enable] from [all] to [primaries]
[2022-04-04T16:44:30,620][INFO ][o.e.c.s.ClusterSettings ] [graylog.domain.com] updating [action.destructive_requires_name] from [false] to [true]
[2022-04-04T16:44:31,603][INFO ][o.e.g.GatewayService ] [graylog.domain.com] recovered [111] indices into cluster_state
[2022-04-04T16:45:14,265][INFO ][o.e.c.r.a.AllocationService] [graylog.domain.com] Cluster health status changed from [RED] to [YELLOW] (reason: [shards started [[gl-system-events_20][2], [gl-system-events_20][3], [gl-system-events_20][0]]]).
[2022-04-04T16:45:17,006][INFO ][o.e.c.r.a.AllocationService] [graylog.domain.com] Cluster health status changed from [YELLOW] to [GREEN] (reason: [shards started [[graylog_1560][2], [graylog_1560][3], [graylog_1560][0]]]).

As you can see from the logs, everything worked out and the cluster was back in GREEN status. I had to wait about 5-10 minutes for Graylog to settle down because the buffer/journal was full from the weekend. Other than that, all is good.
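One thing to keep in mind: the transient setting above only allows primary shards to be allocated. Once the cluster is back in GREEN, you may want to reset it so allocation goes back to the default, for example:

PUT _cluster/settings
{
  "transient": {
    "cluster.routing.allocation.enable": null
  }
}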
