We have a 3-node Graylog cluster running Graylog server and MongoDB on all three nodes (VMs), plus one nginx load balancer (VM) that balances both log traffic and
web UI access. Elasticsearch is installed on a separate 4-node Elasticsearch cluster.
We are seeing high CPU usage from the java (Graylog) process, sometimes more than 500%, as you can see below in an extract of the top command output from one node.
top - 08:00:21 up 20:24, 2 users, load average: 6.53, 5.83, 6.65
Tasks: 221 total, 1 running, 220 sleeping, 0 stopped, 0 zombie
%Cpu(s): 59.2 us, 0.2 sy, 0.0 ni, 30.0 id, 10.4 wa, 0.0 hi, 0.1 si, 0.0 st
KiB Mem : 32773516 total, 570936 free, 8929288 used, 23273292 buff/cache
KiB Swap: 2097148 total, 2097148 free, 0 used. 23390284 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
20457 root 20 0 13.844g 7.981g 21704 S 508.2 25.5 5585:04 java
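To narrow down which Graylog threads are using the CPU, the per-thread IDs shown by top -H -p 20457 can be matched against a jstack dump, where the JVM prints each native thread ID in hex as nid=0x.... Below is a minimal Python sketch of that lookup. The thread ID 20501 and the two sample dump lines are made up for illustration and are not taken from our nodes.

```python
import re


def find_jstack_threads(dump_text, tid):
    """Return the jstack header lines whose nid matches a decimal thread ID."""
    # top -H shows thread IDs in decimal; jstack shows them in hex as nid=0x...
    nid = "nid=" + hex(tid)
    pattern = re.compile(re.escape(nid) + r"\b")
    return [line for line in dump_text.splitlines() if pattern.search(line)]


# Hypothetical jstack excerpt (illustrative thread names and addresses only).
SAMPLE_DUMP = (
    '"processbufferprocessor-3" #71 daemon prio=5 os_prio=0 '
    "tid=0x00007f3a1c2d4000 nid=0x5015 runnable [0x00007f39e0bfa000]\n"
    '"outputbufferprocessor-1" #85 daemon prio=5 os_prio=0 '
    "tid=0x00007f3a1c2f1800 nid=0x5023 waiting on condition [0x00007f39dfdec000]"
)

if __name__ == "__main__":
    # 20501 is an assumed thread ID, as if read from the PID column of top -H.
    for line in find_jstack_threads(SAMPLE_DUMP, 20501):
        print(line)
```

In practice the dump text would come from running jstack against PID 20457 as the user that owns the process.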
Below is the configuration of the Graylog infrastructure:
- 3 x node VMs running on VMware vSphere ESXi 6.x
Hardware specification of each node:
4 vCPUs, 2.3 GHz
32GB RAM
Flash disk: 500GB
Software Specification:
OS: Red Hat Enterprise Linux Server release 7.4 (Maipo)
Graylog Server: Graylog 2.4.3+2c41897
MongoDB: db version v3.6.2
git version: 489d177dbd0f0420a8ca04d39fd78d0a2c539420
OpenSSL version: OpenSSL 1.0.1e-fips 11 Feb 2013
Java: java version "1.8.0_161"
Java(TM) SE Runtime Environment (build 1.8.0_161-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.161-b12, mixed mode)
- Elasticsearch is running version 5.7.x on a separate 4-node cluster
Below is the /etc/graylog/server/server.conf for the master node. The configuration for the secondary/slave nodes is the same except for is_master = false.
is_master = true
node_id_file = /etc/graylog/server/node-id
password_secret = SbDz5kYSqpZ2jj18Nqvn80mflIkIbPMSggsNAK4UgDeNF73k8AgVnXrKBUWrFAxxiwfFf360cMegeqNEupFTtfs61PCux460
root_password_sha2 = 91aa480056871283357058827b45a528942cf2ada69b312575fa1898d9589f6c
plugin_dir = /usr/share/graylog-server/plugin
rest_listen_uri = http://log01.kz.local:9000/api/
rest_transport_uri = http://log01.kz.local:9000/api/
rest_enable_cors = false
rest_tls_cert_file = /etc/graylog/server/certificates/graylog.crt
rest_tls_key_file = /etc/graylog/server/certificates/graylog.key
trusted_proxies = 127.0.0.1/32, 0:0:0:0:0:0:0:1/128,10.237.95.0/32
web_listen_uri = http://log01.kz.local:9000/
web_enable_cors = false
elasticsearch_hosts = https://elastic:HVR8exrqVa1Qqkq6OAa5ykNP@4df4b80500ff4e1eab3b7e2e4e783564.kz.local
elasticsearch_connect_timeout = 30s
rotation_strategy = count
elasticsearch_max_docs_per_index = 20000000
elasticsearch_max_number_of_indices = 20
retention_strategy = delete
elasticsearch_shards = 4
elasticsearch_replicas = 0
elasticsearch_index_prefix = graylog
allow_leading_wildcard_searches = false
allow_highlighting = false
elasticsearch_analyzer = standard
output_batch_size = 2000
output_flush_interval = 1
output_fault_count_threshold = 5
output_fault_penalty_seconds = 30
processbuffer_processors = 10
outputbuffer_processors = 10
processor_wait_strategy = blocking
ring_size = 65536
inputbuffer_ring_size = 65536
inputbuffer_processors = 2
inputbuffer_wait_strategy = blocking
message_journal_enabled = true
message_journal_dir = /app1/graylog-server/journal
lb_recognition_period_seconds = 3
mongodb_uri = mongodb://log01:27017,log02:27017,log03:27017/graylog?replicaSet=rs-db01
mongodb_max_connections = 1000
mongodb_threads_allowed_to_block_multiplier = 5
transport_email_web_interface_url = https://logexplorer.kz.local
content_packs_dir = /usr/share/graylog-server/contentpacks
content_packs_auto_load = grok-patterns.json
proxied_requests_thread_pool_size = 32
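One thing that may be worth checking in the file above is how the buffer processor counts compare with the 4 vCPUs on each node. The Python sketch below just totals the three *_processors settings. The helper name count_buffer_processors is made up for this post, and this is only a sanity check, not a confirmed cause of the CPU load.

```python
def count_buffer_processors(conf_text):
    """Collect the three buffer processor settings from server.conf text."""
    keys = (
        "processbuffer_processors",
        "outputbuffer_processors",
        "inputbuffer_processors",
    )
    counts = {}
    for raw in conf_text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not key = value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if key in keys:
            counts[key] = int(value)
    return counts


# Values copied from the server.conf shown above.
CONF = """
processbuffer_processors = 10
outputbuffer_processors = 10
inputbuffer_processors = 2
"""

VCPUS = 4  # vCPUs per node, from the hardware specification

if __name__ == "__main__":
    counts = count_buffer_processors(CONF)
    total = sum(counts.values())
    print(f"{total} buffer processor threads configured for {VCPUS} vCPUs")
```

With the values in this post the total is 10 + 10 + 2 = 22 processor threads on a 4-vCPU node.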
JVM settings for Graylog in /etc/sysconfig/graylog-server:
GRAYLOG_SERVER_JAVA_OPTS="-Xms8g -Xmx8g -XX:NewRatio=1 -server -XX:+ResizeTLAB -XX:+UseConcMarkSweepGC -XX:+CMSConcurrentMTEnabled -XX:+CMSClassUnloadingEnabled -XX:+UseParNewGC -XX:-OmitStackTraceInFastThrow"