Issues with Master Node after upgrade to 2.4.3

At the beginning of this week we upgraded Graylog from 2.2 to 2.4.3.
After the upgrade the master node began to stop processing messages; the daemon hangs with no errors or other indicators in the logs.
Restarting the daemon gets it processing again, with a large queue of messages to work through. This happens every 10 hours.
It is also worth noting that the master node seems to ingest more of the data and has a tougher time with it than the rest of the nodes.
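
We have been keeping an eye on the journal backlog on the master before restarting it. A rough sketch of the check, assuming the standard Graylog 2.x REST API on port 9000 and admin credentials (both are placeholders here):

```
# Rough sketch: inspect the journal backlog on the master node.
# Assumes the REST API is reachable on port 9000 (as in rest_listen_uri below) and that
# admin:password are valid credentials; replace x.x.x.x with the master's address.
curl -s -u admin:password 'http://x.x.x.x:9000/api/system/journal'
```

A steadily growing number of uncommitted journal entries would match the large queue we see after a restart.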

We have 4 Graylog nodes on c4.2xlarge instances and 4 Elasticsearch nodes on c4.4xlarge instances.
We process 250 GB on average.
We use round-robin DNS for the UDP inputs and a load balancer for the UI.
Here is our server.conf:

```
node_id_file = /etc/graylog/server/node-id
password_secret = xxx
root_password_sha2 = xxx
root_timezone = UTC
plugin_dir = /usr/share/graylog-server/plugin
rest_listen_uri = http://x.x.x.x:9000/api/
web_listen_uri = http://x.x.x.x:9000/
rotation_strategy = count
elasticsearch_max_docs_per_index = 20000000
elasticsearch_max_number_of_indices = 20
retention_strategy = delete
elasticsearch_shards = 4
elasticsearch_replicas = 0
elasticsearch_index_prefix = graylog
allow_leading_wildcard_searches = true
allow_highlighting = false
elasticsearch_hosts = http://x.x.x.x:9200,http://x.x.x.x:9200,http://x.x.x.x:9200,http://x.x.x.x:9200
elasticsearch_analyzer = standard
output_batch_size = 500
output_flush_interval = 1
output_fault_count_threshold = 5
output_fault_penalty_seconds = 30
processbuffer_processors = 6
outputbuffer_processors = 4
processor_wait_strategy = blocking
ring_size = 65536
inputbuffer_ring_size = 65536
inputbuffer_processors = 2
inputbuffer_wait_strategy = blocking
message_journal_enabled = true
message_journal_dir = /var/lib/graylog-server/journal
lb_recognition_period_seconds = 3
mongodb_uri = mongodb://user:xxx@x.x.x.x:27017,x.x.x.x:27017,x.x.x.x:27017/graylog
mongodb_max_connections = 1000
mongodb_threads_allowed_to_block_multiplier = 5
content_packs_dir = /usr/share/graylog-server/contentpacks
content_packs_auto_load = grok-patterns.json
proxied_requests_thread_pool_size = 32
```

Any help in understanding why the master keeps crashing would be most welcome.

With the given information, I would say: check your Elasticsearch cluster.

Maybe it happens every time you rotate the index?
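
For a quick look, checking cluster health and index state around the time the node hangs might show something. These are standard Elasticsearch HTTP APIs; use any of the hosts from your elasticsearch_hosts setting:

```
# Check Elasticsearch cluster health and the Graylog indices around the time the master hangs.
curl -s 'http://x.x.x.x:9200/_cluster/health?pretty'
curl -s 'http://x.x.x.x:9200/_cat/indices/graylog_*?v'
# Thread pool queues and rejections can also point at an overloaded cluster.
curl -s 'http://x.x.x.x:9200/_cat/thread_pool?v'
```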

Hi @jan,
It's not when we rotate the index; the timing doesn't line up.

However, there is some performance tuning we can do on the Elasticsearch side.
ES_HEAP_SIZE was at 10 GB and we can push it to 15 GB (the instances have 30 GB of RAM in total).
Based on the docs at http://docs.graylog.org/en/2.4/pages/configuration/elasticsearch.html there are a couple of other performance tips, such as setting indices.store.throttle.max_bytes_per_sec in Elasticsearch.
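
A rough sketch of the two changes. The exact files depend on the Elasticsearch version and distribution, and the 150mb value is only an illustrative example, so treat all of it as an assumption to verify against your own setup:

```
# 1) Raise the heap. On Debian/Ubuntu the service environment file is /etc/default/elasticsearch,
#    on RHEL/CentOS it is /etc/sysconfig/elasticsearch (older ES releases; newer ones use jvm.options):
#      ES_HEAP_SIZE=15g
# 2) Raise the store throttle in elasticsearch.yml (example value, check what fits your disks):
#      indices.store.throttle.max_bytes_per_sec: 150mb
# Then restart Elasticsearch on each data node, one node at a time:
sudo systemctl restart elasticsearch
```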

I am going to make these changes and see if that helps with the master node.

@jan
You were correct; it is when the indices are rotated.
We found some other issues too and sorted them out. However, the master node still crashes a few minutes after the default index and one other index get rotated.
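
For anyone hitting the same thing: a quick way to confirm the correlation is to grep the master's server log for rotation events and compare the timestamps with the hangs. A sketch, assuming the default package-install log location:

```
# Compare rotation timestamps with the times the master stops processing.
# /var/log/graylog-server/server.log is the default path for package installs; adjust if yours differs.
grep -i 'rotat' /var/log/graylog-server/server.log
```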
