Unprocessed messages

Hello,
We have:
1 Graylog server, 4 Elasticsearch nodes (1 balancer and 3 data), and ~20,000 msg/second.

Everything works fine, but the process buffer is full all the time,
and there are ~3,000,000 unprocessed messages.

{
  "cluster_name" : "gl2",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 4,
  "number_of_data_nodes" : 3,
  "active_primary_shards" : 80,
  "active_shards" : 80,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
}

is_master = true
node_id_file = /etc/graylog/server/node-id
plugin_dir = /usr/share/graylog-server/plugin
rest_listen_uri =
rest_transport_uri =
web_enable = true
web_listen_uri =
elasticsearch_hosts =
elasticsearch_connect_timeout = 2s
elasticsearch_socket_timeout = 60s
elasticsearch_max_total_connections = 2000
elasticsearch_max_total_connections_per_route = 2000
elasticsearch_max_retries = 2
rotation_strategy = count
elasticsearch_max_docs_per_index = 50000000
elasticsearch_max_number_of_indices = 5
retention_strategy = delete
elasticsearch_shards = 24
elasticsearch_replicas = 1
elasticsearch_index_prefix = graylog
allow_leading_wildcard_searches = false
allow_highlighting = false
elasticsearch_analyzer = standard
output_batch_size = 10000
output_flush_interval = 1
output_fault_count_threshold = 5
output_fault_penalty_seconds = 30
processbuffer_processors = 16
outputbuffer_processors = 8
processor_wait_strategy = blocking
ring_size = 262144
inputbuffer_ring_size = 65536
inputbuffer_processors = 2
inputbuffer_wait_strategy = blocking
message_journal_enabled = true
message_journal_dir =nal
lb_recognition_period_seconds = 3
mongodb_uri = mongodb:
mongodb_max_connections = 1000
mongodb_threads_allowed_to_block_multiplier = 5
content_packs_dir =
content_packs_auto_load = grok-patterns.json
proxied_requests_thread_pool_size = 32

Is this normal, and if not, how do I fix it?

You are missing the following information:

  • Graylog version
  • Elasticsearch version

Log files from Graylog and Elasticsearch might help too; without them it is just wild guessing. But I think that your Elasticsearch is not able to handle the load.

Jan, thanks for the reply!

Graylog 2.3.2+3df951e
Elasticsearch 5.6.5-1

13267 messages in output buffer, 5.06% utilized.

I think that your Elasticsearch is not able to handle the load.

On all 4 Elasticsearch nodes the load average is 3-4 (with 8 cores),
and memory usage is at 50-60 percent.

What else could be wrong?

Did you check the health of your cluster? Did you check the log files? Did you check your Elasticsearch metrics?

All of the above would give you information about what is wrong.
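For example (a minimal sketch; the host name and port are assumptions, point them at one of your Elasticsearch nodes), the relevant numbers can be pulled straight from the Elasticsearch HTTP API:

GET http://es-node:9200/_cluster/health?pretty
GET http://es-node:9200/_nodes/stats/jvm,thread_pool?pretty
GET http://es-node:9200/_cat/thread_pool?v

Rejections in the bulk thread pool or long GC pauses in the JVM stats would point at Elasticsearch as the bottleneck; if those look clean, the problem is more likely on the Graylog side.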

Do you have >90% CPU load on the Graylog server when the process buffer is full? If not, you can still increase the number of processbuffer processors and see if that helps. Also, you have quite a lot of outputbuffer processors (8) for the output buffer size. You could get to 30,000 msgs/s with just 3 outputbuffer processors, so you could move 5 of those to the processbuffer processors.
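A minimal sketch of that rebalance in server.conf (keeping the total of 24 processor threads from the posted config; adjust to your actual core count):

processbuffer_processors = 21
outputbuffer_processors = 3
inputbuffer_processors = 2

graylog-server has to be restarted for these settings to take effect.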

Hi,

I have >90% CPU load on the Graylog server, load average 11 (on an 8-core processor).

outputbuffer_processors is set to 3 now.

Maybe I need to add more graylog-server nodes?

The Elasticsearch nodes are not overloaded, but the number of unprocessed messages is still high.

If you have checked that you don’t have problematic regexes in extractors or pipelines, then I’d say it is time to add more Graylog nodes or processors to the existing node. But it would be good to check the regexes first.

See e.g. https://www.regular-expressions.info/catastrophic.html
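As an illustration (the pattern is hypothetical, not taken from your extractors), a regex with nested quantifiers such as

(x+x+)+y

takes exponentially long to fail on input that is a long run of "x" characters with no "y", because the engine retries every possible split between the two x+ groups. A single extractor like that can pin a processbuffer processor at 100% CPU on a handful of bad messages. Graylog exposes per-extractor timing metrics in the web interface, which helps to spot such patterns.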
