Hello Sirs,
I know this issue has been reported several times, and I have tried every suggestion I could find in the forum, without success.
Basically, I have 3 nodes, each with a processing capacity of about 1,600 messages/s. Intermittently, one of them stops processing messages but keeps writing them to the journal, and the only way to resume processing is to restart the Graylog service. A few minutes after the restart, another node stops processing messages, and again I have to restart the Graylog service. The problem occurs on all nodes, one at a time, at irregular intervals. I believed it was caused by malformed Fortigate / Fortinet log messages, but I corrected that with help from this forum by treating the messages as RAW. Neither the Graylog nor the Elasticsearch logs, even in debug mode, give me any clue about what is going on.
3 nodes, each with: 16 vCPUs, 24 GB RAM, FC disk storage (3PAR), CentOS 7 (updated)
CONFS
conf_graylog_node_1
is_master = true
node_id_file = /etc/graylog/server/node-id
password_secret = xxxxxxxxxxxxxxx
root_username = admin
root_password_sha2 = xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
root_timezone = America/Sao_Paulo
plugin_dir = /usr/share/graylog-server/plugin
rest_listen_uri = http://192.168.0.195:9000/api/
web_listen_uri = http://192.168.0.195:9000/
rotation_strategy = count
elasticsearch_max_docs_per_index = 20000000
elasticsearch_max_number_of_indices = 20
retention_strategy = delete
elasticsearch_shards = 4
elasticsearch_replicas = 1
elasticsearch_index_prefix = graylog
allow_leading_wildcard_searches = false
allow_highlighting = false
elasticsearch_cluster_name = graylog
elasticsearch_discovery_zen_ping_unicast_hosts = 192.168.0.195:9300, 192.168.0.196:9300, 192.168.1.187
elasticsearch_cluster_discovery_timeout = 15000
elasticsearch_network_host = 192.168.0.195
elasticsearch_discovery_initial_state_timeout = 10s
elasticsearch_analyzer = standard
output_batch_size = 500
output_flush_interval = 1
output_fault_count_threshold = 5
output_fault_penalty_seconds = 30
processbuffer_processors = 16
outputbuffer_processors = 3
processor_wait_strategy = blocking
ring_size = 65536
inputbuffer_ring_size = 65536
inputbuffer_processors = 2
inputbuffer_wait_strategy = blocking
message_journal_enabled = true
message_journal_dir = /var/lib/graylog-server/journal
lb_recognition_period_seconds = 3
mongodb_uri = mongodb://graylog1:27017,graylog2:27017,graylog3:27017/graylog
mongodb_max_connections = 1000
mongodb_threads_allowed_to_block_multiplier = 5
content_packs_dir = /usr/share/graylog-server/contentpacks
content_packs_auto_load = grok-patterns.json
proxied_requests_thread_pool_size = 32
conf_graylog_node_2
is_master = false
node_id_file = /etc/graylog/server/node-id
password_secret = xxxxxxxxxxxxxxx
root_username = admin
root_password_sha2 = xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
root_timezone = America/Sao_Paulo
plugin_dir = /usr/share/graylog-server/plugin
rest_listen_uri = http://192.168.0.196:9000/api/
web_listen_uri = http://192.168.0.196:9000/
rotation_strategy = count
elasticsearch_max_docs_per_index = 20000000
elasticsearch_max_number_of_indices = 20
retention_strategy = delete
elasticsearch_shards = 4
elasticsearch_replicas = 1
elasticsearch_index_prefix = graylog
allow_leading_wildcard_searches = false
allow_highlighting = false
elasticsearch_cluster_name = graylog
elasticsearch_discovery_zen_ping_unicast_hosts = 192.168.0.195:9300, 192.168.0.196:9300, 192.168.1.187
elasticsearch_cluster_discovery_timeout = 15000
elasticsearch_network_host = 192.168.0.196
elasticsearch_discovery_initial_state_timeout = 10s
elasticsearch_analyzer = standard
output_batch_size = 500
output_flush_interval = 1
output_fault_count_threshold = 5
output_fault_penalty_seconds = 30
processbuffer_processors = 16
outputbuffer_processors = 3
processor_wait_strategy = blocking
ring_size = 65536
inputbuffer_ring_size = 65536
inputbuffer_processors = 2
inputbuffer_wait_strategy = blocking
message_journal_enabled = true
message_journal_dir = /var/lib/graylog-server/journal
lb_recognition_period_seconds = 3
mongodb_uri = mongodb://graylog1:27017,graylog2:27017,graylog3:27017/graylog
mongodb_max_connections = 1000
mongodb_threads_allowed_to_block_multiplier = 5
content_packs_dir = /usr/share/graylog-server/contentpacks
content_packs_auto_load = grok-patterns.json
proxied_requests_thread_pool_size = 32
conf_graylog_node_3
is_master = false
node_id_file = /etc/graylog/server/node-id
password_secret = xxxxxxxxxxxxxxx
root_username = admin
root_password_sha2 = xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
root_timezone = America/Sao_Paulo
plugin_dir = /usr/share/graylog-server/plugin
rest_listen_uri = http://192.168.1.187:9000/api/
web_listen_uri = http://192.168.1.187:9000/
rotation_strategy = count
elasticsearch_max_docs_per_index = 20000000
elasticsearch_max_number_of_indices = 20
retention_strategy = delete
elasticsearch_shards = 4
elasticsearch_replicas = 1
elasticsearch_index_prefix = graylog
allow_leading_wildcard_searches = false
allow_highlighting = false
elasticsearch_cluster_name = graylog
elasticsearch_discovery_zen_ping_unicast_hosts = 192.168.0.195:9300, 192.168.0.196:9300, 192.168.1.187
elasticsearch_cluster_discovery_timeout = 15000
elasticsearch_network_host = 192.168.1.187
elasticsearch_discovery_initial_state_timeout = 10s
elasticsearch_analyzer = standard
output_batch_size = 500
output_flush_interval = 1
output_fault_count_threshold = 5
output_fault_penalty_seconds = 30
processbuffer_processors = 16
outputbuffer_processors = 3
processor_wait_strategy = blocking
ring_size = 65536
inputbuffer_ring_size = 65536
inputbuffer_processors = 2
inputbuffer_wait_strategy = blocking
message_journal_enabled = true
message_journal_dir = /var/lib/graylog-server/journal
lb_recognition_period_seconds = 3
mongodb_uri = mongodb://graylog1:27017,graylog2:27017,graylog3:27017/graylog
mongodb_max_connections = 1000
mongodb_threads_allowed_to_block_multiplier = 5
content_packs_dir = /usr/share/graylog-server/contentpacks
content_packs_auto_load = grok-patterns.json
proxied_requests_thread_pool_size = 32
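For reference, a minimal sketch that adds up the buffer processor threads configured above and compares the total with the 16 vCPUs per node. The values are copied from the listings; the idea that the total should stay at or below the core count is a commonly cited rule of thumb and is an assumption here, not a confirmed cause of the stall. The function name is hypothetical.

```python
# Values copied from the server.conf listings above (identical on all three nodes).
settings = {
    "processbuffer_processors": 16,
    "outputbuffer_processors": 3,
    "inputbuffer_processors": 2,
}
VCPUS_PER_NODE = 16  # from the hardware description


def buffer_thread_budget(conf, vcpus):
    """Return (total buffer processor threads, number of threads beyond vcpus)."""
    total = sum(conf.values())
    return total, max(0, total - vcpus)


total, excess = buffer_thread_budget(settings, VCPUS_PER_NODE)
print(f"buffer processor threads: {total}, vCPUs: {VCPUS_PER_NODE}, excess: {excess}")
```

With these settings the buffer processors alone add up to more threads than there are vCPUs, before counting Elasticsearch and MongoDB, which appear to run on the same hosts.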
elasticsearch_conf_node_1
cluster.name: graylog
node.name: graylog1.example.com.br
network.host: 0.0.0.0
discovery.zen.ping.unicast.hosts: ["192.168.0.195", "192.168.0.196", "192.168.1.187"]
index.refresh_interval: 30s
index.translog.flush_threshold_ops: 50000
elasticsearch_conf_node_2
cluster.name: graylog
node.name: graylog2.example.com.br
network.host: 0.0.0.0
discovery.zen.ping.unicast.hosts: ["192.168.0.195", "192.168.0.196", "192.168.1.187"]
index.refresh_interval: 30s
index.translog.flush_threshold_ops: 50000
elasticsearch_conf_node_3
cluster.name: graylog
node.name: graylog3.example.com.br
network.host: 0.0.0.0
discovery.zen.ping.unicast.hosts: ["192.168.0.195", "192.168.0.196", "192.168.1.187"]
index.refresh_interval: 30s
index.translog.flush_threshold_ops: 50000
etc_sysconfig_elasticsearch ALL NODES
ES_HEAP_SIZE=12g
ES_STARTUP_SLEEP_TIME=5
etc_sysconfig_graylog-server ALL NODES
JAVA=/usr/bin/java
GRAYLOG_SERVER_JAVA_OPTS="-Xms6g -Xmx6g -XX:NewRatio=1 -server -XX:+ResizeTLAB -XX:+UseConcMarkSweepGC -XX:+CMSConcurrentMTEnabled -XX:+CMSClassUnloadingEnabled -XX:+UseParNewGC -XX:-OmitStackTraceInFastThrow"
GRAYLOG_SERVER_ARGS=""
GRAYLOG_COMMAND_WRAPPER=""
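For reference, a minimal sketch of the memory budget implied by the heap settings above, assuming Elasticsearch and Graylog share each 24 GB node as the configuration suggests. JVM overhead beyond the heap is not counted, and the function name is hypothetical.

```python
TOTAL_RAM_GB = 24  # from the hardware description

# Heap sizes from ES_HEAP_SIZE and the -Xms/-Xmx options above.
heaps_gb = {
    "elasticsearch": 12,
    "graylog-server": 6,
}


def remaining_ram_gb(total_gb, heaps):
    """RAM left after the configured JVM heaps (off-heap usage not included)."""
    return total_gb - sum(heaps.values())


left = remaining_ram_gb(TOTAL_RAM_GB, heaps_gb)
print(f"RAM left for OS, MongoDB and filesystem cache: {left} GB")
```

Whatever remains has to cover the operating system, MongoDB (if it runs on the same hosts), and the filesystem cache that Elasticsearch and the journal rely on.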
Thread dump of node 558b0d80 / graylog2.example.com.br NOT PROCESSING NOW