1. Describe your incident:
There is enough disk space available but messages are not flowing out. (Out = 0)
I'm running 3 instances of graylog-server 2.4.7 behind a load balancer and 4 instances of Elasticsearch. This setup is running on AWS EC2.
Disk utilisation on one of the Elasticsearch instances reached 100% and that instance stopped. The other 3 Elasticsearch instances are still running with enough disk space available on each, but logs are still not flowing out.
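For reference, this is roughly how I check the cluster state from one of the Graylog nodes (the hostname below is a placeholder for one of my Elasticsearch nodes, assuming HTTP is reachable on port 9200):

# Overall cluster status (green/yellow/red) and count of unassigned shards
curl -s 'http://es-node-1:9200/_cluster/health?pretty'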
2. Describe your environment:
- OS Information:
ubuntu-xenial-16.04-amd64-server
RAM: 32GB
Disk space for each ElasticSearch Instance = 900GB
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Thread(s) per core: 2
Core(s) per socket: 2
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 79
Model name: Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
Stepping: 1
CPU MHz: 2699.355
CPU max MHz: 3000.0000
CPU min MHz: 1200.0000
BogoMIPS: 4600.11
Hypervisor vendor: Xen
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 46080K
NUMA node0 CPU(s): 0-3
- Package Version:
graylog-server 2.4.7
elasticsearch 5.3.0
- Service logs, configurations, and environment variables:
Graylog server.conf
allow_highlighting = true
allow_leading_wildcard_searches = true
elasticsearch_max_number_of_indices = 20
elasticsearch_max_time_per_index = 1d
elasticsearch_replicas = 1
is_master = true
message_journal_dir = /var/lib/graylog-server/journal_0
message_journal_max_age = 48h
message_journal_max_size = 25gb
processbuffer_processors = 18
outputbuffer_processors = 14
output_batch_size = 2000
elasticsearch_index_prefix = graylog2
ElasticSearch elasticsearch.yml
cluster.name: graylog2
cluster.routing.allocation.disk.threshold_enabled: true
cluster.routing.allocation.disk.watermark.high: "0.95"
cluster.routing.allocation.disk.watermark.low: "0.9"
discovery.zen.minimum_master_nodes: 3
discovery.zen.ping.unicast.hosts:
http.cors.allow-origin: "*"
http.cors.enabled: true
http.enabled: true
network.host: "0.0.0.0"
node.data: true
node.ingest: true
node.master: true
node.name: orch-gl-elasticsearch-prod-1-graylog2
path.data: /var/elasticsearch/log/data
path.logs: /var/log/elasticsearch/graylog2
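For reference, this is roughly how I check per-node disk usage against the watermarks above (the hostname is a placeholder for one of my Elasticsearch nodes):

# Disk used/available and shard count per node
curl -s 'http://es-node-1:9200/_cat/allocation?v'
# Any transient/persistent cluster-level setting overrides
curl -s 'http://es-node-1:9200/_cluster/settings?pretty'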
3. What steps have you already taken to try and solve the problem?
Tried restarting the graylog-server and Elasticsearch instances; messages started flowing out for some time, then the process buffer and output buffer filled up again and messages stopped flowing out.
Manually deleted some old Elasticsearch indices from the System → Indices tab to free up some disk space. This had no effect. (See the shard check below.)
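For reference, this is roughly how I check whether shards from the stopped node are stuck unassigned after the restarts (the hostname is again a placeholder for one of my Elasticsearch nodes):

# List shards that are not allocated to any node
curl -s 'http://es-node-1:9200/_cat/shards?v' | grep UNASSIGNED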
4. How can the community help?
Q1. Why did one of the Elasticsearch instances get utilised so much more than the other 3? Shouldn't the load be divided equally among them? Could you please look at my Elasticsearch config and tell me if there is any mistake?
Q2. Please also take a look at server.conf, especially the processbuffer_processors and outputbuffer_processors values. These values were apparently set assuming 32 GB of RAM. Is this correct? Do these settings distribute RAM or the number of CPU cores (which is 4)?
Q3. What is the effect of setting "is_master = true" on all 3 graylog-server instances?