Thanks Drew,
Thanks Tully,
Drew, thanks for the clarification about the config values. So it looks like the *buffer_processors values are not the culprit (I already suspected that, since changing them made no difference to the behaviour).
We use OpenSearch 2.6.0 as the indexer.
(Don’t be confused: we originally used Elasticsearch, and the hostnames stayed the same after the switch to OpenSearch.)
This is our server.conf (comments removed and anonymised). All 4 nodes are nearly identical (only node 4 has a few more processbuffer processors, as it has more cores). Everything else is left at the defaults.
is_leader = true
node_id_file = /etc/graylog/server/node-id
password_secret = *******************
root_password_sha2 = *******************
bin_dir = /usr/share/graylog-server/bin
data_dir = /var/lib/graylog-server
plugin_dir = /usr/share/graylog-server/plugin
http_bind_address: 192.168.1.1:9000
http_external_uri = http://graylog.mydomain.com
stream_aware_field_types=false
elasticsearch_hosts = http://192.168.1.10:9200,http://192.168.1.11:9200,http://192.168.1.12:9200,http://192.168.1.13:9200
elasticsearch_connect_timeout = 5s
elasticsearch_idle_timeout = 60s
elasticsearch_max_total_connections = 512
elasticsearch_max_total_connections_per_route = 128
rotation_strategy = size
elasticsearch_max_docs_per_index = 20000000
elasticsearch_max_number_of_indices = 20
retention_strategy = delete
elasticsearch_shards = 4
elasticsearch_replicas = 0
elasticsearch_index_prefix = graylog
allow_leading_wildcard_searches = false
allow_highlighting = false
elasticsearch_analyzer = standard
output_batch_size = 3000
output_flush_interval = 1
output_fault_count_threshold = 5
output_fault_penalty_seconds = 30
processbuffer_processors = 12
outputbuffer_processors = 8
processor_wait_strategy = blocking
ring_size = 65536
inputbuffer_ring_size = 65536
inputbuffer_processors = 2
inputbuffer_wait_strategy = blocking
message_journal_enabled = true
message_journal_dir = /var/lib/graylog-server/journal
message_journal_max_size = 10gb
lb_recognition_period_seconds = 3
mongodb_uri = mongodb://graylog01.mydomain.com:27017,graylog02.mydomain.com:27017,graylog04.mydomain.com:27017,graylog03.mydomain.com:27017/graylog?replicaSet=rs0
mongodb_max_connections = 1000
http_proxy_uri = http://proxy.mydomain:3128
http_non_proxy_hosts = localhost,127.0.0.1,192.168.1.*,*.mydomain.com
prometheus_exporter_enabled = true
prometheus_exporter_bind_address = 192.168.1.1:9833
Our output buffer values over the last two weeks:
And the process buffers:
Unfortunately, the timespan in which we had the issues is very short, as I immediately downgraded to fix it. So again there is not much visible.
Let me know if you want a more “zoomed in” graph.
Drew, you are right, our machines are under pretty high load, as I tried to utilize them as much as possible. However, in the weeks before the update we had no problems: all logs could be written in time (no journal filling).
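For anyone who wants to compare the processor settings against the core count on each node, here is a minimal sketch. It assumes the often-quoted rule of thumb that inputbuffer_processors + processbuffer_processors + outputbuffer_processors should not exceed the number of CPU cores. The helper names (parse_conf, total_buffer_processors, is_oversubscribed) are made up for illustration, and a missing key is counted as 0 rather than as the Graylog default.

```python
import os

# The three settings from server.conf that each start worker threads.
PROCESSOR_KEYS = (
    "inputbuffer_processors",
    "processbuffer_processors",
    "outputbuffer_processors",
)


def parse_conf(text):
    """Parse properties-style lines ('key = value' or 'key: value').

    Blank lines and lines starting with '#' are skipped.
    """
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Split on whichever separator ('=' or ':') appears first.
        positions = [p for p in (line.find("="), line.find(":")) if p != -1]
        if not positions:
            continue
        cut = min(positions)
        settings[line[:cut].strip()] = line[cut + 1:].strip()
    return settings


def total_buffer_processors(settings):
    """Sum the configured buffer processors (assumption: missing keys count as 0)."""
    return sum(int(settings.get(key, 0)) for key in PROCESSOR_KEYS)


def is_oversubscribed(settings, cores=None):
    """Return True if the processor total exceeds the given core count.

    If cores is None, the core count of the local machine is used.
    """
    if cores is None:
        cores = os.cpu_count() or 1
    return total_buffer_processors(settings) > cores
```

With the values posted above (12 + 8 + 2), total_buffer_processors returns 22, so is_oversubscribed(settings, cores=16) would be True and is_oversubscribed(settings, cores=24) would be False. The core counts here are only examples, not our actual hardware.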