At the beginning of this week we upgraded Graylog from 2.2 to 2.4.3.
After the upgrade, the master node began to stop processing messages: the daemon hangs, with no errors or other indicators in the logs.
Restarting the daemon gets it processing again, with a large queue of messages to work through. This happens every 10 hours.
It is also worth noting that the master node seems to ingest more of the data, and has a tougher time with it, than the rest of the nodes.
We have 4 Graylog nodes on c4.2xlarge instances and 4 Elasticsearch nodes on c4.4xlarge instances.
We process 250GB on average.
We use round-robin DNS for the UDP inputs.
We use a load balancer for the UI.
```
node_id_file = /etc/graylog/server/node-id
password_secret = xxx
root_password_sha2 = xxx
root_timezone = UTC
plugin_dir = /usr/share/graylog-server/plugin
rest_listen_uri = http://x.x.x.x:9000/api/
web_listen_uri = http://x.x.x.x:9000/
rotation_strategy = count
elasticsearch_max_docs_per_index = 20000000
elasticsearch_max_number_of_indices = 20
retention_strategy = delete
elasticsearch_shards = 4
elasticsearch_replicas = 0
elasticsearch_index_prefix = graylog
allow_leading_wildcard_searches = true
allow_highlighting = false
elasticsearch_hosts = http://x.x.x.x:9200,http://x.x.x.x:9200,http://x.x.x.x:9200,http://x.x.x.x:9200
elasticsearch_analyzer = standard
output_batch_size = 500
output_flush_interval = 1
output_fault_count_threshold = 5
output_fault_penalty_seconds = 30
processbuffer_processors = 6
outputbuffer_processors = 4
processor_wait_strategy = blocking
ring_size = 65536
inputbuffer_ring_size = 65536
inputbuffer_processors = 2
inputbuffer_wait_strategy = blocking
message_journal_enabled = true
message_journal_dir = /var/lib/graylog-server/journal
lb_recognition_period_seconds = 3
mongodb_uri = mongodb://user:xxx@x.x.x.x:27017,x.x.x.x:27017,x.x.x.x:27017/graylog
mongodb_max_connections = 1000
mongodb_threads_allowed_to_block_multiplier = 5
content_packs_dir = /usr/share/graylog-server/contentpacks
content_packs_auto_load = grok-patterns.json
proxied_requests_thread_pool_size = 32
```
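For reference, here is a rough sketch of what these settings add up to per node. It only does arithmetic on values copied from the config above. The vCPU count for a c4.2xlarge (8) is an assumption taken from the AWS instance specs rather than from this setup, and the `derived_totals` helper is a hypothetical name used purely for illustration.

```python
# Values copied from the server.conf above.
conf = {
    "elasticsearch_max_docs_per_index": 20_000_000,
    "elasticsearch_max_number_of_indices": 20,
    "elasticsearch_shards": 4,
    "processbuffer_processors": 6,
    "outputbuffer_processors": 4,
    "inputbuffer_processors": 2,
}

# Assumption: a c4.2xlarge exposes 8 vCPUs (not stated in the config).
VCPUS_PER_GRAYLOG_NODE = 8


def derived_totals(conf, vcpus):
    """Hypothetical helper: totals implied by the rotation and buffer settings."""
    # Count-based rotation with delete retention caps the retained documents.
    max_docs = (
        conf["elasticsearch_max_docs_per_index"]
        * conf["elasticsearch_max_number_of_indices"]
    )
    # Primary shards only, since elasticsearch_replicas = 0.
    max_shards = (
        conf["elasticsearch_max_number_of_indices"] * conf["elasticsearch_shards"]
    )
    # Buffer processor threads configured on each Graylog node.
    buffer_threads = (
        conf["processbuffer_processors"]
        + conf["outputbuffer_processors"]
        + conf["inputbuffer_processors"]
    )
    return {
        "max_retained_docs": max_docs,
        "max_primary_shards": max_shards,
        "buffer_threads": buffer_threads,
        "threads_exceed_vcpus": buffer_threads > vcpus,
    }


totals = derived_totals(conf, VCPUS_PER_GRAYLOG_NODE)
for name, value in totals.items():
    print(f"{name}: {value}")
```

If the 8 vCPU assumption holds, the three buffer processor settings together ask for more threads than the node has cores. That may or may not be related to the hang, but it seemed worth stating alongside the config.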
Any help in understanding why the master keeps crashing would be most welcome.