Our Graylog journal is growing by about 0.1% per second, and I don’t understand why. Should I add another Graylog node or another Elasticsearch node? Or is there something else that could be tweaked?
Graylog 2.4.3 on one server:
6 GB RAM (Graylog’s heap size is 2 GB)
Elasticsearch 2.4.6 on one server:
16 GB RAM (heap size is 8 GB)
The Graylog log starts complaining once the journal is more than 95% full. The Elasticsearch logs don’t show anything.
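To put the reported rate in perspective, here is a back-of-the-envelope sketch of how long a journal growing at a constant 0.1% per second takes to go from empty to the 95% point where the warnings start. It assumes the growth rate stays constant, which real traffic will not do exactly; `seconds_until` is just an illustrative helper, not anything from Graylog.

```python
def seconds_until(target_pct: float, current_pct: float, growth_pct_per_sec: float) -> float:
    """Seconds for journal utilization to climb from current_pct to target_pct at a constant rate."""
    if growth_pct_per_sec <= 0:
        raise ValueError("journal utilization is not growing")
    # Already past the target: no waiting time left.
    return max(target_pct - current_pct, 0.0) / growth_pct_per_sec


# Figures from the post: ~0.1 % growth per second, warnings above 95 %.
secs = seconds_until(target_pct=95.0, current_pct=0.0, growth_pct_per_sec=0.1)
print(f"{secs:.0f} s (~{secs / 60:.1f} min)")  # prints: 950 s (~15.8 min)
```

In other words, at a steady 0.1% per second an empty journal would hit the warning threshold in roughly a quarter of an hour.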
There is no iowait on either server, and the Elasticsearch CPU is mostly idle.
Graylog occasionally uses about one core; most of the time the CPU idle reported by glances is around 90%.
If I stop Graylog, delete the journal and start it again, incoming and outgoing messages keep flowing the whole time, but the journal still starts growing. After several hours, the process buffer reaches 100%.
The message rate varies from 100 to 400 messages per second; usually it is around 200.
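A growing journal means messages are written to it faster than they are committed after output. One way to sanity-check the numbers is to turn the percentage into bytes per second and per incoming message. The sketch below assumes the journal is capped at 5 GiB, which I believe is the default for `message_journal_max_size` (the posted config does not set it); the helper names are hypothetical.

```python
GIB = 1024 ** 3


def journal_growth_bytes_per_sec(growth_pct_per_sec: float, journal_max_bytes: int) -> float:
    """Convert utilization growth (percent of the max journal size per second) into bytes per second."""
    return growth_pct_per_sec / 100.0 * journal_max_bytes


def bytes_per_incoming_message(growth_bytes_per_sec: float, msgs_per_sec: float) -> float:
    """Net journal growth per incoming message, assuming all growth comes from new messages."""
    return growth_bytes_per_sec / msgs_per_sec


# ASSUMPTION: journal capped at 5 GiB (believed default, not set in the posted config).
# 0.1 %/s and ~200 msg/s are the figures from the post.
growth = journal_growth_bytes_per_sec(0.1, 5 * GIB)
per_msg = bytes_per_incoming_message(growth, 200)
print(f"net growth ~{growth / 1024**2:.1f} MiB/s, ~{per_msg / 1024:.1f} KiB per incoming message")
```

Comparing that per-message figure with the real average message size is a rough way to check whether a 0.1%/s reading is consistent with about 200 msg/s.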
The Graylog configuration looks like this:
allow_highlighting = false
allow_leading_wildcard_searches = false
content_packs_auto_load = grok-patterns.json
content_packs_dir = /usr/share/graylog-server/contentpacks
elasticsearch_analyzer = standard
elasticsearch_max_docs_per_index = 20000000
elasticsearch_max_number_of_indices = 7
elasticsearch_max_size_per_index = 10240000000
elasticsearch_replicas = 0
elasticsearch_shards = 4
http_connect_timeout = 15s
http_read_timeout = 15s
http_write_timeout = 15s
inputbuffer_processors = 4
inputbuffer_ring_size = 65536
inputbuffer_wait_strategy = blocking
is_master = true
lb_recognition_period_seconds = 3
message_journal_dir = /var/lib/graylog-server/journal
message_journal_enabled = true
message_journal_flush_age = 30s
message_journal_max_age = 72h
mongodb_max_connections = 1000
mongodb_threads_allowed_to_block_multiplier = 5
mongodb_uri = mongodb://localhost/graylog
node_id_file = /etc/graylog/server/node-id
output_batch_size = 2500 #was 1000, didn’t change anything
outputbuffer_processors = 8
output_fault_count_threshold = 5
output_fault_penalty_seconds = 15 #too high?
output_flush_interval = 1
plugin_dir = /usr/share/graylog-server/plugin
processbuffer_processors = 8
processor_wait_strategy = blocking
retention_strategy = delete
ring_size = 262144
rotation_strategy = count
rotation_strategy = size
stale_master_timeout = 5000
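Two things in the posted file may be worth double-checking: `rotation_strategy` is set twice, and some values carry trailing `#` notes. Assuming the file is read with Java properties semantics (which I believe is the case, but have not verified for 2.4.3), a `#` starts a comment only at the beginning of a line, and a repeated key keeps its last value. The hypothetical helper `lint_properties` below flags both cases; it is a simplified parser that only understands `key = value` lines, with no continuations or escapes.

```python
from collections import defaultdict


def lint_properties(text: str) -> list[str]:
    """Flag duplicate keys and inline '#' characters in a simple key = value config.

    Simplified Java-properties rules (assumption): a line is a comment only when its
    first non-blank character is '#' or '!', and a later duplicate key overrides an
    earlier one. Backslash continuations, escapes and ':' separators are not handled.
    """
    seen = defaultdict(list)  # key -> list of (line number, value)
    warnings = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line[0] in "#!":
            continue  # blank line or full-line comment
        key, sep, value = line.partition("=")
        key, value = key.strip(), value.strip()
        if not sep:
            warnings.append(f"line {lineno}: no '=' in {line!r}")
            continue
        if "#" in value:
            # Not a comment here: the '#' and everything after it belong to the value.
            warnings.append(f"line {lineno}: '#' is part of the value of {key}: {value!r}")
        seen[key].append((lineno, value))
    for key, entries in seen.items():
        if len(entries) > 1:
            lines = ", ".join(str(n) for n, _ in entries)
            warnings.append(f"{key} set on lines {lines}; last value {entries[-1][1]!r} wins")
    return warnings


sample = """\
output_batch_size = 2500 #was 1000, didn't change anything
rotation_strategy = count
rotation_strategy = size
"""
for warning in lint_properties(sample):
    print(warning)
```

If the `#` notes were only added for this post, the first warning does not apply, but the duplicate `rotation_strategy` would still mean only one of the two strategies is in effect.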
Any ideas? Thanks!