Log volume is too large and causes blockage; no output is written

First, let me introduce my environment:

| System info | CPUs | Mem | Disk |
| --- | --- | --- | --- |
| Rocky 8.8 | 16 | 32G | 500G |

5 Graylog nodes

Here is a snippet of the Graylog configuration file (docker-compose.yml):

    environment:
      - GRAYLOG_SERVER_JAVA_OPTS=-Xms16g -Xmx16g -XX:NewRatio=1 -XX:MaxMetaspaceSize=8G -server -XX:+ResizeTLAB  -XX:-OmitStackTraceInFastThrow
      - GRAYLOG_SKIP_PREFLIGHT_CHECKS=true
      - GRAYLOG_PASSWORD_SECRET=${PASSWORD_SECRET}
      - GRAYLOG_ROOT_PASSWORD_SHA2=${ROOT_PASSWORD_SHA2}
      - GRAYLOG_IS_MASTER=${IS_MASTER}
      - GRAYLOG_TRUSTED_PROXIES=${GRAYLOG_domain}/32
      - GRAYLOG_NGINX_HOST=${GRAYLOG_domain}
      - GRAYLOG_HTTP_EXTERNAL_URI=http://${node_ip}:9000/
      - GRAYLOG_HTTP_PUBLISH_URI=http://${node_ip}:9000/
      - GRAYLOG_WEB_ENDPOINT_URI=http://${node_ip}:9000/api
      - GRAYLOG_WEB_ENABLE=true      
      - GRAYLOG_REST_TRANSPORT_URI=https://${GRAYLOG_domain}:9000/api/
      - GRAYLOG_ELASTICSEARCH_VERSION=7
      - GRAYLOG_MONGODB_URI=mongodb://${mg_graylog_user}:${mg_graylog_pass}@mongodb_01:27017,mongodb_02:27017,mongodb_03:27017/graylog?replicaSet=messpush0
      - GRAYLOG_ELASTICSEARCH_HOSTS=http://${es_graylog_user}:${es_graylog_pass}@es01:9200,http://${es_graylog_user}:${es_graylog_pass}@es02:9200,http://${es_graylog_user}:${es_graylog_pass}@es03:9200,http://${es_graylog_user}:${es_graylog_pass}@es04:9200,http://${es_graylog_user}:${es_graylog_pass}@es05:9200
      - GRAYLOG_ELASTICSEARCH_DISCOVERY_ENABLED=false
      - GRAYLOG_ELASTICSEARCH_REQUEST_TIMEOUT=2m
      - GRAYLOG_ELASTICSEARCH_INDEX_OPTIMIZATION_JOBS=50
      - GRAYLOG_HTTP_ENABLE_GZIP=true
      - GRAYLOG_ELASTICSEARCH_USE_EXPECT_CONTINUE=true
      - GRAYLOG_ELASTICSEARCH_DISABLE_VERSION_CHECK=false
      - GRAYLOG_ALLOW_HIGHLIGHTING=false
      - GRAYLOG_ELASTICSEARCH_INDEX_OPTIMIZATION_TIMEOUT=1h
      - GRAYLOG_OUTPUT_BATCH_SIZE=10000
      - GRAYLOG_OUTPUT_FLUSH_INTERVAL=15
      - GRAYLOG_OUTPUTBUFFER_PROCESSORS=6
      - GRAYLOG_PROCESSBUFFER_PROCESSORS=8
      - GRAYLOG_OUTPUTBUFFER_PROCESSOR_KEEP_ALIVE_TIME=3000
      - GRAYLOG_OUTPUTBUFFER_PROCESSOR_THREADS_CORE_POOL_SIZE=2
      - GRAYLOG_OUTPUTBUFFER_PROCESSOR_THREADS_MAX_POOL_SIZE=10
      - GRAYLOG_RING_SIZE=524288 # 2^18=262144,2^19=524288; 2^20=1048576
      - GRAYLOG_INPUTBUFFER_RING_SIZE=262144
      - GRAYLOG_INPUTBUFFER_PROCESSORS=2
      - GRAYLOG_INPUTBUFFER_WAIT_STRATEGY=yielding
      - GRAYLOG_PROCESSOR_WAIT_STRATEGY=blocking
      - GRAYLOG_OUTPUT_FAULT_COUNT_THRESHOLD=5
      - GRAYLOG_OUTPUT_FAULT_PENALTY_SECONDS=15
      - GRAYLOG_MESSAGE_JOURNAL_ENABLED=true
      - GRAYLOG_MESSAGE_JOURNAL_MAX_AGE=8h 
      - GRAYLOG_MESSAGE_JOURNAL_MAX_SIZE=300gb
      - GRAYLOG_MESSAGE_JOURNAL_FLUSH_INTERVAL=250000
      - GRAYLOG_LB_RECOGNITION_PERIOD_SECONDS=0
      - GRAYLOG_LB_THROTTLE_THRESHOLD_PERCENTAGE=90
      - prometheus_exporter_enabled=true
      - prometheus_exporter_bind_address=0.0.0.0:9833

My log volume is very large and congestion happens frequently (logs stop being written to ES; In 0 / Out 0 msg/s). When that happens, the only thing I can do is restart the Graylog container.
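
Before restarting, it helps to confirm from the REST API whether the journal is growing and which buffers are saturated. Below is a minimal sketch, assuming one node's API is reachable on port 9000 and that the /system/journal and /system/metrics endpoints and the metric names match your Graylog version (the node address and credentials are placeholders):

    # Minimal sketch: check one node's journal and buffer utilization before
    # restarting. Endpoint paths and metric names are assumptions -- verify
    # them in your version's API browser (http://<node_ip>:9000/api/api-browser).
    import requests

    NODE_API = "http://192.0.2.10:9000/api"   # placeholder node address
    AUTH = ("admin", "CHANGE_ME")             # placeholder credentials
    HEADERS = {"Accept": "application/json", "X-Requested-By": "buffer-check"}

    # Journal status: how many messages are queued on disk waiting to be processed.
    journal = requests.get(f"{NODE_API}/system/journal", auth=AUTH, headers=HEADERS).json()
    print("uncommitted journal entries:", journal.get("uncommitted_journal_entries"))

    # Buffer utilization gauges (metric names assumed; list them via /system/metrics/names).
    for metric in (
        "org.graylog2.buffers.input.usage",
        "org.graylog2.buffers.process.usage",
        "org.graylog2.buffers.output.usage",
    ):
        resp = requests.get(f"{NODE_API}/system/metrics/{metric}", auth=AUTH, headers=HEADERS)
        print(metric, "->", resp.json().get("value") if resp.ok else f"HTTP {resp.status_code}")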

[screenshot of the Graylog nodes overview]

Occasionally there are hundreds of millions of unprocessed messages, but Out is always 0 msg/s. Sometimes the load balancer indication shows DEAD or THROTTLED.

I wonder what the reason is for this happening. Is there a better solution?

Please click on the details for one node. Which buffers are filled up? I guess input is fine and processing is full. If output is full as well, I think your Elastic/OpenSearch has the problem. If the output is empty, something is blocking your processing - it could, for example, be a looping regex.
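
If the process buffer is the one that is stuck (for example on a looping regex, as suggested above), you can ask each node what its process-buffer threads are currently working on. A hedged sketch: the /cluster/&lt;node_id&gt;/processbufferdump endpoint and the shape of its response are assumptions, so confirm both in your version's API browser; the node address and credentials are placeholders.

    # Sketch: dump what each node's process-buffer threads are working on.
    # The processbufferdump endpoint and its response format are assumptions --
    # check the API browser of your Graylog version before relying on this.
    import requests

    API = "http://192.0.2.10:9000/api"    # placeholder: any reachable node
    AUTH = ("admin", "CHANGE_ME")         # placeholder credentials
    HEADERS = {"Accept": "application/json", "X-Requested-By": "stuck-check"}

    nodes = requests.get(f"{API}/cluster", auth=AUTH, headers=HEADERS).json()
    for node_id, info in nodes.items():
        print(node_id, info.get("hostname"))
        dump = requests.get(f"{API}/cluster/{node_id}/processbufferdump",
                            auth=AUTH, headers=HEADERS)
        if not dump.ok:
            print("  processbufferdump not available: HTTP", dump.status_code)
            continue
        # Each entry shows the message a processor thread is currently holding;
        # the same extractor/pipeline rule showing up repeatedly is the suspect.
        for thread, message in dump.json().get("processbuffer_dump", {}).items():
            print(" ", thread, "->", str(message)[:120])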

Thanks @ihe.

Input is always fine; the pressure is on the process and output buffers.

I guess the picture above doesn't fully illustrate the problem I'm having, but it's still a problem: both the process and output buffers are full.

If the output buffer is full, your OpenSearch/Elasticsearch is in trouble. You gave quite a lot of details about your Graylog setup; can you also provide some details about your OpenSearch/Elasticsearch environment?
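
For the Elasticsearch/OpenSearch side, the quickest things to share are cluster health and the write thread-pool rejection counters; rejected bulk writes on the ES side are what show up in Graylog as a full output buffer. A minimal sketch against the standard _cluster/health and _cat APIs (host and credentials are placeholders):

    # Sketch: collect the Elasticsearch/OpenSearch details asked for above,
    # using the standard _cluster/health and _cat APIs. Host and credentials
    # are placeholders.
    import requests

    ES = "http://es01:9200"                    # placeholder: any ES/OpenSearch node
    AUTH = ("es_graylog_user", "CHANGE_ME")    # placeholder credentials

    # Cluster state: red/yellow status or unassigned shards often explain
    # why Graylog's output buffer backs up.
    health = requests.get(f"{ES}/_cluster/health", auth=AUTH).json()
    print({k: health.get(k) for k in ("status", "number_of_nodes",
                                      "active_shards", "unassigned_shards")})

    # Write thread pool: a growing "rejected" count means ES is pushing back
    # on Graylog's bulk indexing requests.
    print(requests.get(f"{ES}/_cat/thread_pool/write?v&h=node_name,active,queue,rejected",
                       auth=AUTH).text)

    # Per-node resource pressure.
    print(requests.get(f"{ES}/_cat/nodes?v&h=name,heap.percent,ram.percent,cpu,load_1m",
                       auth=AUTH).text)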


This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.