Graylog Disk Journal and Throttling

Hello,

I would like to ask for advice on the wait strategy settings and whether changing them would benefit our use case.
Specifically, I mean the processor_wait_strategy setting in https://github.com/Graylog2/graylog2-server/blob/2.1.1/misc/graylog.conf#L352

Our current setup:

Kafka Broker --> Graylog --> Elasticsearch
We have Kafka inputs in Graylog with throttling enabled, and the journal is also enabled on the Graylog nodes.

Previously, we ran into an issue where our Graylog processors were not fast enough to keep up with the incoming logs, which caused the disk journal to fill up.
During the incident, we observed that Graylog deleted messages from the disk journal once it was full, and the journal kept filling up even though the Graylog nodes showed the “THROTTLED” status. I believe this let the Kafka input's offset keep advancing even though the messages were not being indexed, which caused data loss.

We were hoping that throttling would actually slow down the inputs instead of letting messages be deleted from the journal.
Our Kafka brokers are set up to buffer a huge amount of messages, and we value “no data loss” over log latency.
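
To make the behavior described above concrete, here is a toy model of how we understand it. This is only a sketch of our assumption (a size-bounded journal that drops its oldest segments once it is full, whether or not they have been processed), not Graylog's actual journal code. The function name, the per-tick rates and the tick count are made up for illustration; the 100-segment limit comes from our message_journal_max_size of 10gb divided by our message_journal_segment_size of 100mb.

```python
from collections import deque


def simulate_journal(max_segments, write_per_tick, read_per_tick, ticks):
    """Toy model of a size-bounded journal.

    Returns (unread_segments, lost_segments) after the given number of ticks.
    """
    journal = deque()  # unread segment ids, oldest first
    next_id = 0
    lost = 0
    for _ in range(ticks):
        # The input keeps appending, regardless of processing speed.
        for _ in range(write_per_tick):
            journal.append(next_id)
            next_id += 1
        # The processors consume the oldest segments first.
        for _ in range(min(read_per_tick, len(journal))):
            journal.popleft()
        # Size-based retention: drop the oldest segments, processed or not.
        while len(journal) > max_segments:
            journal.popleft()
            lost += 1
    return len(journal), lost


# 10gb / 100mb = 100 segments; the input is slightly faster than the processors.
unread, lost = simulate_journal(max_segments=100, write_per_tick=3,
                                read_per_tick=2, ticks=500)
print(unread, lost)  # prints: 100 400
```

In this model, the journal sits at its size limit once it has filled up, and every further tick loses the excess of writes over reads. That matches what we saw: the only way to avoid loss would be for the input itself to stop reading from Kafka.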

As a side note, we fixed the issue above by increasing processbuffer_processors, outputbuffer_processors and output_batch_size, which I guess more or less shows that our ES cluster is fast enough.

Here’s an excerpt of our settings:

# Elasticsearch
elasticsearch_node_master = false
elasticsearch_node_data = false
elasticsearch_http_enabled = false
elasticsearch_config_file = /etc/graylog/server/graylog-elasticsearch.yml
elasticsearch_shards = 4
elasticsearch_replicas = 1
elasticsearch_index_prefix = dcslogs-prod
elasticsearch_cluster_name = dcslogs-prod-muc1
elasticsearch_transport_tcp_port = 9350
elasticsearch_discovery_zen_ping_unicast_hosts = ["10.36.20.143:9300"]
elasticsearch_network_host = 0.0.0.0
elasticsearch_analyzer = standard
output_batch_size = 3000
output_flush_interval = 1
output_fault_count_threshold = 5
output_fault_penalty_seconds = 30

# Processors
processbuffer_processors = 8
outputbuffer_processors = 5
async_eventbus_processors = 2
outputbuffer_processor_keep_alive_time = 5000
outputbuffer_processor_threads_core_pool_size = 3
outputbuffer_processor_threads_max_pool_size = 30
processor_wait_strategy = blocking
udp_recvbuffer_sizes = 1048576
inputbuffer_ring_size = 65536
inputbuffer_processors = 2
inputbuffer_wait_strategy = blocking

# Message journal
message_journal_enabled = true
message_journal_dir = /var/lib/graylog-server/journal
message_journal_max_age = 12h
message_journal_max_size = 10gb
message_journal_flush_age = 1m
message_journal_flush_interval = 1000000
message_journal_segment_age = 1h
message_journal_segment_size = 100mb

Thanks a lot!
Jan
