First, let me introduce my environment:

| OS | CPUs | Mem | Disk |
|---|---|---|---|
| Rocky 8.8 | 16 | 32G | 500G |

There are 5 Graylog nodes.
Here is a snippet of the Graylog configuration (docker-compose.yml):
environment:
- GRAYLOG_SERVER_JAVA_OPTS=-Xms16g -Xmx16g -XX:NewRatio=1 -XX:MaxMetaspaceSize=8G -server -XX:+ResizeTLAB -XX:-OmitStackTraceInFastThrow
- GRAYLOG_SKIP_PREFLIGHT_CHECKS=true
- GRAYLOG_PASSWORD_SECRET=${PASSWORD_SECRET}
- GRAYLOG_ROOT_PASSWORD_SHA2=${ROOT_PASSWORD_SHA2}
- GRAYLOG_IS_MASTER=${IS_MASTER}
- GRAYLOG_TRUSTED_PROXIES=${GRAYLOG_domain}/32
- GRAYLOG_NGINX_HOST=${GRAYLOG_domain}
- GRAYLOG_HTTP_EXTERNAL_URI=http://${node_ip}:9000/
- GRAYLOG_HTTP_PUBLISH_URI=http://${node_ip}:9000/
- GRAYLOG_WEB_ENDPOINT_URI=http://${node_ip}:9000/api
- GRAYLOG_WEB_ENABLE=true
- GRAYLOG_REST_TRANSPORT_URI=https://${GRAYLOG_domain}:9000/api/
- GRAYLOG_ELASTICSEARCH_VERSION=7
- GRAYLOG_MONGODB_URI=mongodb://${mg_graylog_user}:${mg_graylog_pass}@mongodb_01:27017,mongodb_02:27017,mongodb_03:27017/graylog?replicaSet=messpush0
- GRAYLOG_ELASTICSEARCH_HOSTS=http://${es_graylog_user}:${es_graylog_pass}@es01:9200,http://${es_graylog_user}:${es_graylog_pass}@es02:9200,http://${es_graylog_user}:${es_graylog_pass}@es03:9200,http://${es_graylog_user}:${es_graylog_pass}@es04:9200,http://${es_graylog_user}:${es_graylog_pass}@es05:9200
- GRAYLOG_ELASTICSEARCH_DISCOVERY_ENABLED=false
- GRAYLOG_ELASTICSEARCH_REQUEST_TIMEOUT=2m
- GRAYLOG_ELASTICSEARCH_INDEX_OPTIMIZATION_JOBS=50
- GRAYLOG_HTTP_ENABLE_GZIP=true
- GRAYLOG_ELASTICSEARCH_USE_EXPECT_CONTINUE=true
- GRAYLOG_ELASTICSEARCH_DISABLE_VERSION_CHECK=false
- GRAYLOG_ALLOW_HIGHLIGHTING=false
- GRAYLOG_ELASTICSEARCH_INDEX_OPTIMIZATION_TIMEOUT=1h
- GRAYLOG_OUTPUT_BATCH_SIZE=10000
- GRAYLOG_OUTPUT_FLUSH_INTERVAL=15
- GRAYLOG_OUTPUTBUFFER_PROCESSORS=6
- GRAYLOG_PROCESSBUFFER_PROCESSORS=8
- GRAYLOG_OUTPUTBUFFER_PROCESSOR_KEEP_ALIVE_TIME=3000
- GRAYLOG_OUTPUTBUFFER_PROCESSOR_THREADS_CORE_POOL_SIZE=2
- GRAYLOG_OUTPUTBUFFER_PROCESSOR_THREADS_MAX_POOL_SIZE=10
- GRAYLOG_RING_SIZE=524288 # 2^18=262144,2^19=524288; 2^20=1048576
- GRAYLOG_INPUTBUFFER_RING_SIZE=262144
- GRAYLOG_INPUTBUFFER_PROCESSORS=2
- GRAYLOG_INPUTBUFFER_WAIT_STRATEGY=yielding
- GRAYLOG_PROCESSOR_WAIT_STRATEGY=blocking
- GRAYLOG_OUTPUT_FAULT_COUNT_THRESHOLD=5
- GRAYLOG_OUTPUT_FAULT_PENALTY_SECONDS=15
- GRAYLOG_MESSAGE_JOURNAL_ENABLED=true
- GRAYLOG_MESSAGE_JOURNAL_MAX_AGE=8h
- GRAYLOG_MESSAGE_JOURNAL_MAX_SIZE=300gb
- GRAYLOG_MESSAGE_JOURNAL_FLUSH_INTERVAL=250000
- GRAYLOG_LB_RECOGNITION_PERIOD_SECONDS=0
- GRAYLOG_LB_THROTTLE_THRESHOLD_PERCENTAGE=90
- prometheus_exporter_enabled=true
- prometheus_exporter_bind_address=0.0.0.0:9833
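
For reference, the buffer settings above can be sanity-checked with a short script. This is only a sketch: the values are copied from the snippet, the helper names (`is_power_of_two`, `total_processors`) are made up for illustration, and comparing the processor total against the CPU count assumes the common rule of thumb that the three `*_PROCESSORS` settings together should not exceed the available cores.

```python
# Values copied from the docker-compose snippet above.
CPUS = 16  # from the environment table

processors = {
    "GRAYLOG_INPUTBUFFER_PROCESSORS": 2,
    "GRAYLOG_PROCESSBUFFER_PROCESSORS": 8,
    "GRAYLOG_OUTPUTBUFFER_PROCESSORS": 6,
}

ring_sizes = {
    "GRAYLOG_RING_SIZE": 524288,
    "GRAYLOG_INPUTBUFFER_RING_SIZE": 262144,
}


def is_power_of_two(n: int) -> bool:
    """Return True if n is a positive power of two (ring sizes must be)."""
    return n > 0 and (n & (n - 1)) == 0


def total_processors(settings: dict) -> int:
    """Sum the configured buffer processor threads."""
    return sum(settings.values())


if __name__ == "__main__":
    total = total_processors(processors)
    print(f"buffer processors: {total} of {CPUS} CPUs")
    for name, size in ring_sizes.items():
        print(f"{name}={size} power of two: {is_power_of_two(size)}")
```

With these values the three processor settings add up to exactly the 16 CPUs, and both ring sizes are powers of two, so neither check flags the configuration by itself.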
My log volume is very large, and congestion happens often: logs are not written to ES and the node shows `In 0/Out 0 msg/s`. When this happens, the only thing I can do is restart the Graylog container.

Occasionally there are hundreds of millions of unprocessed messages, but the output rate stays at `0 msg/s`. Occasionally I also see `Load balancer indication: DEAD` and `THROTTLED`.

What is the reason for this? Is there a better solution?