Hi,
I’m trying to set up a Graylog cluster for testing purposes.
The MongoDB replica set and the Opensearch cluster seem to work just fine, but unfortunately clustering Graylog is unsuccessful:
This message appears under System/Overview although I have is_leader = true set on one node.
When I start only one node, there’s no warning, but the moment I bring the second and third nodes up, Graylog starts flapping under System/Nodes. The node overview shows only one node, not three. Usually it’s the last one I restarted, so from graylog1 I see graylog3, for example (but not graylog1 itself at that moment).
The actual leader node keeps switching: sometimes it’s graylog1, sometimes graylog2, and sometimes graylog3. System/Nodes always shows exactly one node at a time, never two or more at once.
My setup:
It’s a three-node setup:
graylog1 → Graylog Open 5.2.3, MongoDB 6.0.13, Opensearch 2.11.1
graylog2 → Graylog Open 5.2.3, MongoDB 6.0.13, Opensearch 2.11.1
graylog3 → Graylog Open 5.2.3, MongoDB 6.0.13, Opensearch 2.11.1
All three run on Alma Linux 9.3 VMs deployed in VirtualBox. There’s no dedicated DNS server configured; name resolution for graylog[1-3].xxx.de is set in /etc/hosts.
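Since name resolution depends entirely on /etc/hosts, a small script along these lines could confirm that every node's file contains all three names. It is only a sketch: the addresses for graylog2 and graylog3 are made-up placeholders (only 192.168.178.129 appears in my config), and `parse_hosts` and `missing_hosts` are hypothetical helper names.

```python
def parse_hosts(text):
    """Map each hostname found in hosts-file text to its IP address."""
    mapping = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            # Keep the first entry for a name; later duplicates are ignored here.
            mapping.setdefault(name, ip)
    return mapping


def missing_hosts(text, expected):
    """Return the expected hostnames that have no entry in the hosts text."""
    mapping = parse_hosts(text)
    return [host for host in expected if host not in mapping]


# Sample content; the .130 and .131 addresses are assumptions for illustration.
SAMPLE_HOSTS = """
127.0.0.1        localhost
192.168.178.129  graylog1.xxx.de graylog1
192.168.178.130  graylog2.xxx.de graylog2   # hypothetical address
192.168.178.131  graylog3.xxx.de graylog3   # hypothetical address
"""

EXPECTED = ["graylog1.xxx.de", "graylog2.xxx.de", "graylog3.xxx.de"]
print(missing_hosts(SAMPLE_HOSTS, EXPECTED))
```

On a real node the text would come from reading /etc/hosts instead of the sample string.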
server.conf:
is_leader = true
node_id_file = /etc/graylog/server/node-id
password_secret = XXXXXXXXXXXXXXXXXXXXXXX
root_password_sha2 = XXXXXXXXXXXXXXXXXX
bin_dir = /usr/share/graylog-server/bin
data_dir = /var/lib/graylog-server
plugin_dir = /usr/share/graylog-server/plugin
http_bind_address = 192.168.178.129:9000
http_external_uri = http://graylog.xxx.de/
stream_aware_field_types=false
elasticsearch_hosts = http://graylog1.xxx.de:9200,http://graylog2.xxx.de:9200,http://graylog3.xxx.de:9200
disabled_retention_strategies = none
allow_leading_wildcard_searches = false
allow_highlighting = false
output_batch_size = 500
output_flush_interval = 1
output_fault_count_threshold = 5
output_fault_penalty_seconds = 30
processbuffer_processors = 5
outputbuffer_processors = 3
processor_wait_strategy = blocking
ring_size = 65536
inputbuffer_ring_size = 65536
inputbuffer_processors = 2
inputbuffer_wait_strategy = blocking
message_journal_enabled = true
message_journal_dir = /var/lib/graylog-server/journal
lb_recognition_period_seconds = 3
stale_leader_timeout = 10000
mongodb_uri = mongodb://graylog:graylog@graylog1.xxx.de:27017,graylog2.xxx.de:27017,graylog3.xxx.de:27017/graylog?replicaSet=rs01
mongodb_max_connections = 1000
This looks (almost) exactly the same on all three nodes, except that is_leader = false is set on graylog2 and graylog3.
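To back up the "almost exactly the same" claim, the three server.conf files could be compared with something like the sketch below. It lists every key whose value differs between nodes and shows which nodes claim is_leader = true. The helper names are hypothetical, the sample files are shortened, and the bind addresses for graylog2 and graylog3 are assumed values.

```python
def parse_conf(text):
    """Parse 'key = value' lines of a server.conf into a dict."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def differing_keys(confs):
    """Return {key: {node: value}} for every key that is not identical on all nodes."""
    all_keys = set().union(*confs.values())
    diff = {}
    for key in sorted(all_keys):
        values = {node: conf.get(key) for node, conf in confs.items()}
        if len(set(values.values())) > 1:
            diff[key] = values
    return diff


def leaders(confs):
    """Return the nodes whose config sets is_leader = true."""
    return [node for node, conf in confs.items()
            if conf.get("is_leader", "").lower() == "true"]


# Shortened sample configs; the .130/.131 addresses are assumptions.
COMMON = "node_id_file = /etc/graylog/server/node-id\nstale_leader_timeout = 10000\n"
confs = {
    "graylog1": parse_conf("is_leader = true\nhttp_bind_address = 192.168.178.129:9000\n" + COMMON),
    "graylog2": parse_conf("is_leader = false\nhttp_bind_address = 192.168.178.130:9000\n" + COMMON),
    "graylog3": parse_conf("is_leader = false\nhttp_bind_address = 192.168.178.131:9000\n" + COMMON),
}

print("leaders:", leaders(confs))
for key, values in differing_keys(confs).items():
    print(key, values)
```

Exactly one entry in the leaders list is what I would expect to see; anything else would point to a config mix-up.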
Opensearch.yml:
cluster.name: graylog
node.name: graylog1.xxx.de
path.data: /var/lib/opensearch
path.logs: /var/log/opensearch
network.host: 192.168.178.129
http.port: 9200
discovery.seed_hosts: ["graylog2.xxx.de", "graylog3.xxx.de"]
cluster.initial_master_nodes: ["graylog1.xxx.de", "graylog2.xxx.de", "graylog3.xxx.de"]
action.auto_create_index: false
plugins.security.disabled: true
node.roles: ['data', 'master']
Could it be a problem that “plugins.security.disabled: true” comes after all the other “plugins.security.*” settings?
Output of the Opensearch Cluster-Health:
{
  "cluster_name" : "graylog",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 3,
  "number_of_data_nodes" : 3,
  "discovered_master" : true,
  "discovered_cluster_manager" : true,
  "active_primary_shards" : 6,
  "active_shards" : 11,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
}
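For repeated checks, the health output could be evaluated automatically instead of by eye. The following is a minimal sketch that takes the JSON text (for example, saved from the cluster health request) and reports anything that deviates from a healthy three-node cluster. The function name `health_problems` is hypothetical and the sample is an abbreviated copy of the output above.

```python
import json


def health_problems(raw, expected_nodes):
    """Return a list of human-readable problems found in cluster health JSON."""
    health = json.loads(raw)
    problems = []
    if health.get("status") != "green":
        problems.append("status is %r, not green" % health.get("status"))
    if health.get("number_of_nodes") != expected_nodes:
        problems.append("expected %d nodes, found %r"
                        % (expected_nodes, health.get("number_of_nodes")))
    if health.get("unassigned_shards", 0) > 0:
        problems.append("%d unassigned shards" % health["unassigned_shards"])
    if not health.get("discovered_cluster_manager", False):
        problems.append("no cluster manager discovered")
    return problems


# Abbreviated copy of the health output posted above.
SAMPLE_HEALTH = """
{
  "cluster_name" : "graylog",
  "status" : "green",
  "number_of_nodes" : 3,
  "number_of_data_nodes" : 3,
  "discovered_cluster_manager" : true,
  "unassigned_shards" : 0
}
"""

print(health_problems(SAMPLE_HEALTH, expected_nodes=3))
```

For the output above this finds nothing to complain about, which is why I don't think the Opensearch side is the culprit.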
What I tried and checked:
- Setting http_publish_uri
- Setting http_external_uri to the domain of the Apache server that is supposed to do the load balancing. I haven’t set Apache up yet, but that shouldn’t affect clustering, should it?
- Using is_master instead of is_leader
- Adjusting the Opensearch cluster (adding node.roles, setting cluster.initial_master_nodes)
- Time should be in sync via chrony
- I can curl graylog2.xxx.de:9000 from graylog1 and graylog3, and vice versa
- Starting the nodes one by one, and starting graylog2 and graylog3 simultaneously
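The curl test only covers the Graylog HTTP port. A broader connectivity check between the nodes could look roughly like this sketch, which tries a plain TCP connection to each host and port. The ports 9000, 27017 and 9200 are the ones from my configs above; the function names are hypothetical, and a successful TCP connect says nothing about whether the service behind it answers correctly.

```python
import socket


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts and DNS failures
        return False


def reachability(hosts, ports):
    """Check every host/port combination and return {(host, port): bool}."""
    return {(host, port): port_open(host, port) for host in hosts for port in ports}


# Example call, to be run on each node in turn:
#   hosts = ["graylog1.xxx.de", "graylog2.xxx.de", "graylog3.xxx.de"]
#   for target, ok in reachability(hosts, [9000, 27017, 9200]).items():
#       print(target, "open" if ok else "NOT reachable")
```

Running it from all three nodes would show whether any direction is blocked, for example by firewalld on one of the VMs.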
I can provide the log files from /var/log/graylog-server/server.log, but there are no errors and everything looks good.
I’m new to Graylog (at least to setting it up), so I’m open to any suggestions for a better configuration.
Do you have any idea how to get the cluster rolling?
The problem seems to be (as Graylog says) that the three nodes can’t elect a leader, but I don’t know why, since I have is_leader = true set on one of them.
Regards