Disk journal speed


(Jari A) #1

Hello,

I have searched this forum, GitHub, and everywhere else I could think of, but had no luck. I’m also not trying to hijack older threads on this topic. If I should have posted in an older, similar thread instead, please let me know.

I had a strange day with Graylog 2.3 (more details about the environment below). One device sent a lot of logs for some time, somewhere between 1000 and 1500 msg/s. It has now stopped, but there are still over 4 million messages in the disk journal (it didn’t fill up). My question is this:
How fast (or slow) should messages drain from the disk journal? It depends on many things, I suppose, but on what? How can I adjust that? Or can I? Should I? I’m confused.
The problem is that we need to read (and search) those messages, but they seem to be stuck in the journal.

Currently the incoming rate is 20 - 150 msg/s. The outgoing rate shows 0 - 1200 msg/s, but it tends to stay at 0 for some reason.
Normally in and out have the same values.
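
For a rough sense of scale, the drain time of a backlog depends on the net rate, that is, the outgoing rate minus the incoming rate. The sketch below is only back-of-the-envelope arithmetic using the figures from this post; the function name `drain_seconds` is made up for illustration, and it assumes both rates stay constant, which they clearly do not here.

```python
def drain_seconds(backlog, in_rate, out_rate):
    """Seconds until the journal is empty, or None if it never drains.

    Assumes constant rates (a simplification; real rates fluctuate).
    """
    net = out_rate - in_rate  # messages per second actually leaving the backlog
    if net <= 0:
        return None  # output is not keeping up, so the backlog only grows
    return backlog / net


backlog = 4_000_000  # "over 4 million messages in disk journal"

# Best case from the post: out at 1200 msg/s while in is only 20 msg/s.
best = drain_seconds(backlog, in_rate=20, out_rate=1200)

# Worst case from the post: out stuck at 0 while in is 150 msg/s.
stuck = drain_seconds(backlog, in_rate=150, out_rate=0)

print(f"best case: about {best / 60:.0f} minutes")
print(f"out stuck at 0: {stuck}")
```

So at a sustained 1200 msg/s the backlog would be gone in under an hour, while any period with out at 0 adds to it instead.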

There is not much load, wait, or any other obvious bottleneck that I could find. Earlier posts suggest that the problem may be on the Elasticsearch (2.4.6) end. Maybe? There aren’t any errors or throttling messages in the logs.

The process buffer is constantly 100% in use with 65536 messages in it. I checked all extractors to see whether any of them takes a long time to process. None of them stands out.

Server1: CentOS 7 (8 vCPU, 6 GB RAM)

  • Graylog 2.3 with 2GB heap size (from RPM)
  • MongoDB 3.2.16 (from RPM)
  • Apache httpd as reverse proxy (+ TLS offload)

Server2: CentOS 7 (4 vCPU, 4 GB RAM, all-flash storage behind virtualization)

  • Elasticsearch 2.4.6 (from RPM)

Server1 and Server2 are in the same subnet.


server.conf:

allow_highlighting = false
allow_leading_wildcard_searches = false
content_packs_auto_load = grok-patterns.json
content_packs_dir = /usr/share/graylog-server/contentpacks
elasticsearch_analyzer = standard
elasticsearch_hosts = http://elastic_server_ip:9200
elasticsearch_index_prefix = graylog
elasticsearch_max_docs_per_index = 20000000
elasticsearch_max_number_of_indices = 7
elasticsearch_max_size_per_index = 10240000000
elasticsearch_replicas = 0
elasticsearch_shards = 4
http_connect_timeout = 15s
http_read_timeout = 15s
http_write_timeout = 15s
inputbuffer_processors = 4
inputbuffer_ring_size = 65536
inputbuffer_wait_strategy = blocking
is_master = true
lb_recognition_period_seconds = 3
message_journal_dir = /var/lib/graylog-server/journal
message_journal_enabled = true
mongodb_max_connections = 1000
mongodb_threads_allowed_to_block_multiplier = 5
mongodb_uri = mongodb://localhost/graylog
node_id_file = /etc/graylog/server/node-id
output_batch_size = 1000
outputbuffer_processors = 7
output_fault_count_threshold = 5
output_fault_penalty_seconds = 15
output_flush_interval = 1
password_secret = ...
plugin_dir = /usr/share/graylog-server/plugin
processbuffer_processors = 19
processor_wait_strategy = blocking
rest_listen_uri = http://gl2_ip:9000/api/
rest_transport_uri = http://gl2_ip:9000/api/
retention_strategy = delete
ring_size = 65536
root_password_sha2 = ...
rotation_strategy = count
rotation_strategy = size
transport_email_enabled = true
transport_email_from_email = graylog@domain.tld
transport_email_hostname = mail.domain.tld
transport_email_port = 25
transport_email_subject_prefix = [graylog]
transport_email_use_auth = false
transport_email_use_ssl = false
transport_email_use_tls = false
web_listen_uri = http://gl2_ip:9000/

elasticsearch.yml:

cluster.name: graylog2
node.name: gl-elanode-1
network.host: elasticserver_ip
script.inline: false
script.indexed: false
script.file: false
index.refresh_interval: 30s
indices.store.throttle.max_bytes_per_sec: 150mb

Any suggestions?
Thanks!


#2

First, I would increase the ES node's RAM to at least 16 GB and set the Elasticsearch JVM heap to half of that. ES needs a lot of RAM, much more than Graylog (it caches heavily). The number of processors is less important; you could probably drop a couple and get away with 2 vCPUs.

In my experience, even millions of messages are read from the disk journal quickly; it is more a question of ES speed.

I recommend reducing the outputbuffer processors to 2 and increasing the outputbuffer size to 10000 once you have increased the ES memory.
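
To make the suggested change concrete, here is a small sketch that rewrites `key = value` lines in a server.conf-style text. It is only an illustration: `apply_overrides` is a hypothetical helper, and mapping "outputbuffer size" to the `output_batch_size` key is an assumption on my part, since that is the only output-related size setting in the posted config.

```python
import re


def apply_overrides(conf_text, overrides):
    """Return conf_text with `key = value` lines replaced for keys in overrides.

    Keys that do not appear in conf_text are appended at the end.
    Commented-out lines (starting with '#') are left untouched.
    """
    remaining = dict(overrides)
    out = []
    for line in conf_text.splitlines():
        m = re.match(r"\s*([A-Za-z0-9_.]+)\s*=", line)
        if m and m.group(1) in remaining:
            key = m.group(1)
            out.append(f"{key} = {remaining.pop(key)}")
        else:
            out.append(line)
    # Append any keys that were not present in the original text.
    out.extend(f"{k} = {v}" for k, v in remaining.items())
    return "\n".join(out) + "\n"


# The two relevant lines from the posted server.conf.
current = "output_batch_size = 1000\noutputbuffer_processors = 7\n"

# Assumption: "outputbuffer size" means output_batch_size.
suggested = {"outputbuffer_processors": 2, "output_batch_size": 10000}

updated = apply_overrides(current, suggested)
print(updated)
```

Graylog needs a restart to pick up changes to server.conf, so it makes sense to apply these together with the ES memory change.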


#3

Hmm… I noticed you said the process buffer is always at 100%. That does not sound like an ES problem (although I still recommend adding memory there). Are your output buffers empty? Are you sure you don’t have extractors that take more than 100 microseconds (95th percentile)?
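
If you have per-extractor timings at hand, the 100 microsecond check can be sketched like this. The extractor names and sample values below are invented for illustration, and the helper names `percentile_95` and `slow_extractors` are hypothetical; the percentile uses the simple nearest-rank method, which may differ slightly from how Graylog's own metrics compute it.

```python
def percentile_95(samples_us):
    """Nearest-rank 95th percentile of timings given in microseconds."""
    if not samples_us:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_us)
    # Integer ceil(0.95 * n) gives the 1-based nearest-rank position.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def slow_extractors(timings_us, threshold_us=100):
    """Map extractor name -> p95 for extractors whose p95 exceeds the threshold."""
    result = {}
    for name, samples in timings_us.items():
        p95 = percentile_95(samples)
        if p95 > threshold_us:
            result[name] = p95
    return result


# Hypothetical timings in microseconds, not taken from this thread.
timings = {
    "regex_example": [20] * 18 + [900, 950],  # mostly fast, with a slow tail
    "json_example": [30] * 20,                # consistently fast
}

slow = slow_extractors(timings)
print(slow)
```

An extractor with a low average but a slow tail, like the first one above, is easy to miss if you only look at mean times.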


(Jari A) #4

Hi,

Thank you for the replies!
The output buffer showed 0% utilization then (and still does now).

I changed the Elasticsearch server to 16 GB of RAM, and Java now uses 50% of it. While Elasticsearch was down for this change, the disk journal collected messages, and those have now drained pretty fast.

I will change the other settings soon, too.

Thank you!

Br,
Jari


(Nimol) #5

Hi,
I changed ring_size to 131072 and set inputbuffer_ring_size to the same value, and after that everything worked fine. Stop your inputs until all messages are processed, and then start them again.
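
A note on picking the value: as far as I know, the comments in Graylog's sample server.conf say ring sizes must be a power of 2, which is why going from 65536 to 131072 (a doubling) is a natural step. A quick sketch to sanity-check a value before editing the config; the helper names are made up for illustration.

```python
def is_power_of_two(n):
    """True if n is a positive power of two (1, 2, 4, 8, ...)."""
    return n > 0 and (n & (n - 1)) == 0


def next_ring_size(current):
    """Double a ring size, refusing values that are not a power of two."""
    if not is_power_of_two(current):
        raise ValueError(f"{current} is not a power of two")
    return current * 2


old_size = 65536  # ring_size and inputbuffer_ring_size in the posted server.conf
new_size = next_ring_size(old_size)

print(new_size, is_power_of_two(new_size))
```

Keep in mind that larger ring buffers hold more messages in memory, so the Graylog heap (2 GB in the original post) may need watching after such a change.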

