1. Describe your incident:
Since upgrading from Graylog 5.x to 6.1.5, output performance has degraded.
2. Describe your environment:
- OS Information: Red Hat Enterprise Linux 8 (x86_64)
- Package Version: Graylog 6.1.5, Elasticsearch OSS 7.10.2
- Service logs, configurations, and environment variables:
elasticsearch_connect_timeout = 10s
elasticsearch_socket_timeout = 60s
elasticsearch_max_total_connections = 1024
elasticsearch_max_total_connections_per_route = 16
elasticsearch_max_retries = 2
rotation_strategy = count
retention_strategy = delete
allow_leading_wildcard_searches = false
allow_highlighting = false
output_batch_size = 100
output_flush_interval = 1
output_fault_count_threshold = 5
output_fault_penalty_seconds = 30
processbuffer_processors = 128
outputbuffer_processors = 64
outputbuffer_processor_threads_max_pool_size = 64
udp_recvbuffer_sizes = 16777216
processor_wait_strategy = blocking
ring_size = 65536
inputbuffer_ring_size = 65536
inputbuffer_processors = 6
inputbuffer_wait_strategy = blocking
message_journal_enabled = true
message_journal_dir = /home1/graylog-server/journal
message_journal_max_size = 360gb
message_journal_max_age = 1h
lb_recognition_period_seconds = 3
lb_throttle_threshold_percentage = 90
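To quantify the degradation, the per-node journal backlog can be polled over the REST API. The sketch below is only illustrative: the host list and access token are placeholders, and the /api/system/journal field names are assumptions based on what the node overview page reports, so adjust them to your deployment.

```python
# Minimal sketch: poll each Graylog node's journal status to see whether the
# on-disk journal keeps growing (i.e. output is slower than input) after the
# upgrade. Hosts, token, and field names are placeholders/assumptions.
import requests

NODES = ["http://graylog-node-01:9000", "http://graylog-node-02:9000"]  # placeholder hosts
TOKEN = "REPLACE_WITH_ACCESS_TOKEN"  # Graylog API access token

for node in NODES:
    resp = requests.get(
        f"{node}/api/system/journal",
        auth=(TOKEN, "token"),  # token-based basic auth
        headers={"Accept": "application/json"},
        timeout=10,
    )
    resp.raise_for_status()
    j = resp.json()
    print(
        node,
        "uncommitted:", j.get("uncommitted_journal_entries"),
        "append/s:", j.get("append_events_per_second"),
        "read/s:", j.get("read_events_per_second"),
    )
```

If the uncommitted entry count keeps climbing while the read rate stays below the append rate, the bottleneck is on the output side rather than on the inputs.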
3. What steps have you already taken to try and solve the problem?
This tuning configuration had been working without issues on 5.x; nothing was changed other than the Graylog version upgrade.
4. How can the community help?
We are currently processing an input rate of 1.5 to 2 million messages per second.
Our setup includes:
- Graylog: 100 physical machines, each with 48 cores.
- Elasticsearch (ES) Pool: A cluster of approximately 200 nodes.
Since our log input is expected to increase further, we need to scale up the output throughput. What would be the best approach to achieve this?
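For context, here is a rough back-of-envelope of what the current settings imply per node, assuming messages are distributed evenly across the 100 Graylog nodes; all figures are taken from the numbers above and are purely illustrative.

```python
# Back-of-envelope: per-node load implied by the figures above, assuming an
# even distribution across the cluster. Purely illustrative arithmetic.
cluster_rate = 2_000_000      # messages/second at peak (from above)
graylog_nodes = 100
output_batch_size = 100       # from server.conf above
outputbuffer_processors = 64  # from server.conf above

per_node_rate = cluster_rate / graylog_nodes                   # ~20,000 msg/s per node
bulk_requests_per_sec = per_node_rate / output_batch_size      # ~200 bulk requests/s per node
per_processor_rate = per_node_rate / outputbuffer_processors   # ~313 msg/s per output thread

print(f"per node:      {per_node_rate:,.0f} msg/s")
print(f"bulk requests: {bulk_requests_per_sec:,.0f}/s per node (batch size {output_batch_size})")
print(f"per processor: {per_processor_rate:,.0f} msg/s across {outputbuffer_processors} output threads")
```

These per-node figures are what we need to push higher as the input rate grows.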