Journal keeps growing


Our Graylog journal is growing by about 0.1% per second, and I don’t understand why. Should I add another node for Graylog or Elasticsearch? Or is there anything else that could be tweaked?

Graylog 2.4.3 on one server:
8 vCPU
6 GB RAM (Graylog’s heap size is 2 GB)

Elasticsearch 2.4.6 on one server:
4 vCPU
16 GB RAM (heap size is 8 GB)
(flash disks)

The Graylog logs start complaining when journal utilization goes over 95%. The Elasticsearch logs show nothing relevant.
There is no iowait on either server. The Elasticsearch CPU is mostly idle.
Graylog uses about one core every now and then; most of the time, CPU idle (as reported by glances) is around 90%.

If I stop Graylog and drop the journal, input and output messages keep flowing continuously after the restart, but the journal still starts growing again. After several hours, the process buffer goes to 100%.
The message rate varies from 100 to 400 messages per second; usually it is around 200.
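To put the growth rate in perspective, here is a rough back-of-the-envelope sketch. It assumes the journal keeps filling at a constant rate, which real traffic will not do exactly; the function name is made up for illustration and is not part of Graylog.

```python
def seconds_until_threshold(current_pct: float,
                            growth_pct_per_sec: float,
                            threshold_pct: float = 95.0) -> float:
    """Estimate how long until journal utilization reaches threshold_pct.

    Assumes constant growth (a simplification). Returns 0.0 if the
    threshold has already been reached.
    """
    if growth_pct_per_sec <= 0:
        raise ValueError("journal is not growing")
    remaining = max(threshold_pct - current_pct, 0.0)
    return remaining / growth_pct_per_sec


if __name__ == "__main__":
    # Empty journal, 0.1% per second, 95% warning threshold from the post.
    secs = seconds_until_threshold(0.0, 0.1)
    print(f"about {secs / 60:.0f} minutes until the 95% warning")
```

At 0.1% per second an empty journal would reach the 95% warning in roughly 950 seconds, so the backlog is building quickly relative to the incoming message rate.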

Graylog conf looks like this:
allow_highlighting = false
allow_leading_wildcard_searches = false
content_packs_auto_load = grok-patterns.json
content_packs_dir = /usr/share/graylog-server/contentpacks
elasticsearch_analyzer = standard
elasticsearch_max_docs_per_index = 20000000
elasticsearch_max_number_of_indices = 7
elasticsearch_max_size_per_index = 10240000000
elasticsearch_replicas = 0
elasticsearch_shards = 4
http_connect_timeout = 15s
http_read_timeout = 15s
http_write_timeout = 15s
inputbuffer_processors = 4
inputbuffer_ring_size = 65536
inputbuffer_wait_strategy = blocking
is_master = true
lb_recognition_period_seconds = 3
message_journal_dir = /var/lib/graylog-server/journal
message_journal_enabled = true
message_journal_flush_age = 30s
message_journal_max_age = 72h
mongodb_max_connections = 1000
mongodb_threads_allowed_to_block_multiplier = 5
mongodb_uri = mongodb://localhost/graylog
node_id_file = /etc/graylog/server/node-id
output_batch_size = 2500 #was 1000, didn’t change anything
outputbuffer_processors = 8
output_fault_count_threshold = 5
output_fault_penalty_seconds = 15 #too high?
output_flush_interval = 1
plugin_dir = /usr/share/graylog-server/plugin
processbuffer_processors = 8
processor_wait_strategy = blocking
retention_strategy = delete
ring_size = 262144
rotation_strategy = count
rotation_strategy = size
stale_master_timeout = 5000

Any ideas? Thanks!

You might need more Elasticsearch resources.


Thank you for your reply.
Just to make sure: do you mean adding another Elasticsearch node, or just adding more memory?

It could be both. Without knowing any metrics of your setup, it is hard to tell.

Or you might need to tweak Elasticsearch parameters. How many shards do you have? If the count is about the same as in the Graylog config, your shards might be too small.
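One way to see actual shard sizes is the Elasticsearch `_cat/shards` endpoint. The sketch below is only an illustration: it assumes Elasticsearch listens on `http://localhost:9200` without authentication, and the helper names are made up. It requests the `index`, `shard`, `prirep` and `store` columns with sizes in bytes and groups the primary shard sizes per index.

```python
from urllib.request import urlopen


def fetch_cat_shards(base_url: str = "http://localhost:9200") -> str:
    """Fetch plain-text shard info (base_url is an assumption)."""
    url = base_url + "/_cat/shards?h=index,shard,prirep,store&bytes=b"
    with urlopen(url) as resp:
        return resp.read().decode("utf-8")


def parse_cat_shards(text: str):
    """Turn 'index shard prirep store' lines into tuples."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 4:
            continue  # e.g. unassigned shards report no store size
        index, shard, prirep, store = parts[:4]
        rows.append((index, int(shard), prirep, int(store)))
    return rows


def primary_sizes_gb(rows):
    """Map each index to the sizes (in GB) of its primary shards."""
    sizes = {}
    for index, _shard, prirep, store in rows:
        if prirep == "p":
            sizes.setdefault(index, []).append(store / 1e9)
    return sizes


if __name__ == "__main__":
    for index, gbs in sorted(primary_sizes_gb(parse_cat_shards(fetch_cat_shards())).items()):
        print(index, [round(g, 2) for g in gbs])
```

Comparing these per-shard sizes with whatever sizing guideline you follow shows whether the shard count fits the data volume.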

They are at their defaults in the Indices view:
Shards: 4
Replicas: 0

And there is only this one Elasticsearch server.

Are there recommended values for shards in this kind of setup? In another post I found by googling around, someone changed the shard count to one.
Thanks, again!



OK, that blog post wasn’t quite straightforward, but it did help me work out a more appropriate shard count.

The one specific index where most of the data ends up was using the default settings (Shards: 4). With the suggested value of around 50 GB per shard, I worked it out to 18, and now journal usage stays between 1% and 2%.
Thanks for the link!
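The sizing rule mentioned above (around 50 GB per shard) can be sketched as a simple ceiling division. The function name and the input sizes are hypothetical; the thread does not state the actual index size, so 900 GB is used here only because it yields the 18 quoted above.

```python
import math


def shards_for(index_size_gb: float, target_shard_gb: float = 50.0) -> int:
    """Number of primary shards so that each holds at most target_shard_gb.

    Always returns at least 1, since an index needs one primary shard.
    """
    if index_size_gb < 0 or target_shard_gb <= 0:
        raise ValueError("sizes must be positive")
    return max(1, math.ceil(index_size_gb / target_shard_gb))


if __name__ == "__main__":
    print(shards_for(900))    # hypothetical 900 GB index
    print(shards_for(10.24))  # hypothetical index of about 10 GB
```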

I also removed rDNS lookups from the inputs and removed all three pipelines that were in use. I had used the pipelines to drop some noisy messages.
