Greetings, it's been a while.
I recently had a minute to breathe and set up a Zabbix server to monitor my Graylog cluster. By way of background, my cluster is laid out as follows:
Manager Nodes (Manager Only Eligible)
opnsrchmgr-0
opnsrchmgr-1
opnsrchmgr-2
Data Nodes (Data Only)
opnsrchdata-0 (Hot)
opnsrchdata-1 (Hot)
opnsrchdata-2 (Hot)
opnsrchcold-0 (Cold)
Server:
graylogsvr-0
ILM:
Opensearch Dashboard
Everything appears to be running well, except that Zabbix has alerted me that my cluster has only two manager-eligible nodes. That shouldn’t be the case, as I have explicitly configured all three manager nodes as manager-eligible. Further, Opensearch bears this out:
curl -s -k -u <username>:'<password>' 'https://opnsrchmgr-0.foo.bar:9200/_cat/nodes?v'
ip           heap.percent ram.percent cpu load_1m load_5m load_15m node.role node.roles      cluster_manager name
192.168.8.6  51           97          16  16.13   12.82   8.64     d         data            -               opnsrchdata-1
192.168.8.4  63           97          38  16.13   12.82   8.64     -         cluster-manager -               opnsrchmgr-2
192.168.8.3  24           97          1   16.13   12.82   8.64     m         cluster_manager -               opnsrchmgr-1
192.168.8.7  63           97          19  16.13   12.82   8.64     d         data            -               opnsrchdata-2
192.168.8.5  46           97          8   16.13   12.82   8.64     d         data            -               opnsrchdata-0
192.168.8.14 64           97          20  16.13   12.82   8.64     d         data            -               opnsrchcold-0
192.168.8.2  62           97          5   16.13   12.82   8.64     m         cluster_manager *               opnsrchmgr-0
As you can see, Opensearch lists opnsrchmgr-2 with only the cluster-manager role under node.roles, yet its node.role column shows “-” rather than “m”.
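The comparison between the two role columns can be done mechanically. The following is a minimal sketch, not a definitive diagnostic: it reads `_cat/nodes?v` output on stdin, counts the rows whose node.roles entry is spelled exactly cluster_manager, and prints any row where node.roles names a role but node.role is “-”. The function names and the OS_USER / OS_PASS variables are made up for illustration, and the columns are located by the header names shown in the listing above.

```shell
#!/usr/bin/env bash
# Both helpers read `_cat/nodes?v` output on stdin.
# Function names are hypothetical; columns are located via the header row.

# Count nodes whose node.roles list contains exactly "cluster_manager".
count_manager_eligible() {
  awk '
    NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }
    {
      n = split($(col["node.roles"]), roles, ",")
      for (j = 1; j <= n; j++) {
        if (roles[j] == "cluster_manager") { count++; break }
      }
    }
    END { print count + 0 }
  '
}

# Print "<name> <node.roles>" for nodes that list a role in node.roles
# but show "-" in the abbreviated node.role column.
flag_role_mismatch() {
  awk '
    NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }
    $(col["node.role"]) == "-" && $(col["node.roles"]) != "-" {
      print $(col["name"]), $(col["node.roles"])
    }
  '
}

# Usage (OS_USER and OS_PASS are assumed environment variables):
#   curl -s -k -u "$OS_USER:$OS_PASS" \
#     'https://opnsrchmgr-0.foo.bar:9200/_cat/nodes?v' | count_manager_eligible
```

Fed the listing above, count_manager_eligible prints 2 and flag_role_mismatch prints "opnsrchmgr-2 cluster-manager". The count matches on the exact string cluster_manager, so a role spelled any other way is not counted.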
Further, a cluster manager has been elected:
curl -s -k -u <username>:'<password>' 'https://opnsrchmgr-0.foo.bar:9200/_cluster/health?pretty'
{
  "cluster_name" : "graylog",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 7,
  "number_of_data_nodes" : 4,
  "discovered_master" : true,
  "discovered_cluster_manager" : true,
  "active_primary_shards" : 350,
  "active_shards" : 540,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
}
I have attempted to manually force opnsrchmgr-2 into the “m” role, but to no avail.
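To see which roles a node itself reports after a change like this, the nodes info API can be queried for that one node. This is a rough sketch under stated assumptions: node_roles_url and show_node_roles are made-up helper names, OS_USER and OS_PASS are hypothetical environment variables for the credentials, and the port is taken from the commands above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; OS_USER and OS_PASS are assumed environment variables.

# Build a nodes info URL limited to the name and roles of one node.
node_roles_url() {
  local host="$1" node="$2"
  printf 'https://%s:9200/_nodes/%s?filter_path=nodes.*.name,nodes.*.roles&pretty\n' \
    "$host" "$node"
}

# Fetch the roles; -k skips TLS certificate verification.
show_node_roles() {
  curl -s -k -u "${OS_USER}:${OS_PASS}" "$(node_roles_url "$1" "$2")"
}

# Usage:
#   show_node_roles opnsrchmgr-0.foo.bar opnsrchmgr-2
```

The response should contain a roles array for the node, which can be compared character for character against the role names reported for opnsrchmgr-0 and opnsrchmgr-1.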
Has anyone encountered this issue?
Thank you!