Process and output buffers are full; no messages are flowing out

1. Describe your incident:
There is enough disk space available, but messages are not flowing out (Out = 0).
I’m running 3 instances of graylog-server 2.4.7 behind a load balancer and 4 instances of Elasticsearch. This setup runs on AWS EC2.
The disk utilisation of one of the Elasticsearch instances reached 100% and that instance stopped. The other 3 Elasticsearch instances are still running with enough disk space available on each, but logs are still not flowing out.

2. Describe your environment:

  • OS Information:
    ubuntu-xenial-16.04-amd64-server
    RAM: 32GB
    Disk space for each ElasticSearch Instance = 900GB

Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Thread(s) per core: 2
Core(s) per socket: 2
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 79
Model name: Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
Stepping: 1
CPU MHz: 2699.355
CPU max MHz: 3000.0000
CPU min MHz: 1200.0000
BogoMIPS: 4600.11
Hypervisor vendor: Xen
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 46080K
NUMA node0 CPU(s): 0-3

  • Package Version:
    graylog-server 2.4.7
    elastic-search v5.3.0

  • Service logs, configurations, and environment variables:
    Graylog server.conf
    allow_highlighting = true
    allow_leading_wildcard_searches = true
    elasticsearch_max_number_of_indices = 20
    elasticsearch_max_time_per_index = 1d
    elasticsearch_replicas = 1
    is_master = true
    message_journal_dir = /var/lib/graylog-server/journal_0
    message_journal_max_age = 48h
    message_journal_max_size = 25gb
    processbuffer_processors = 18
    outputbuffer_processors = 14
    output_batch_size = 2000
    elasticsearch_index_prefix = graylog2

ElasticSearch elasticsearch.yml
cluster.name: graylog2
cluster.routing.allocation.disk.threshold_enabled: true
cluster.routing.allocation.disk.watermark.high: "0.95"
cluster.routing.allocation.disk.watermark.low: "0.9"
discovery.zen.minimum_master_nodes: 3
discovery.zen.ping.unicast.hosts:
http.cors.allow-origin: "*"
http.cors.enabled: true
http.enabled: true
network.host: "0.0.0.0"
node.data: true
node.ingest: true
node.master: true
node.name: orch-gl-elasticsearch-prod-1-graylog2
path.data: /var/elasticsearch/log/data
path.logs: /var/log/elasticsearch/graylog2

3. What steps have you already taken to try and solve the problem?
Tried restarting the graylog-server and Elasticsearch instances; messages started flowing out for a while. Then the process buffer and output buffer filled up again and messages stopped flowing out.
Manually deleted some old Elasticsearch indices from the System → Indices tab to free up some disk space. This had no effect.

4. How can the community help?
Q1. Why did one of the Elasticsearch instances get used more than the other 3? Shouldn’t the load be divided equally among them? Could you please look at my Elasticsearch config and check for mistakes?

Q2. Please also take a look at server.conf, especially the processbuffer_processors and outputbuffer_processors values. Apparently these values were set assuming 32GB of RAM. Is that correct? Should these values be based on RAM or on the number of cores (which is 4)?

Q3. What is the effect of setting “is_master = true” for all 3 graylog-server instances?

Hello && Welcome

After looking over your configuration, the cluster setup seems incomplete.
I see you’re using a very old version of GL (2.4.7), so I assume you’re also using a very old version of ES, probably below 6.0. Is this correct?
Question: Was this running before, or is this a new installation?

I’ll try to answer your questions in order. I will also point out some concerns in your Graylog configuration.

If you are running a three-node ES cluster, Graylog should be configured something like this, depending on the version of ES.

# List of Elasticsearch hosts Graylog should connect to.
# Need to be specified as a comma-separated list of valid URIs for the http ports of your elasticsearch nodes.
# If one or more of your elasticsearch hosts require authentication, include the credentials in each node URI that
# requires authentication.
elasticsearch_hosts = http://node1:9200,http://node2:9200,http://node3:9200
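To see which of the listed nodes Graylog can actually fall back on, a small Python sketch can split an elasticsearch_hosts value and ask each node for its cluster health. The helper names are hypothetical; `/_cluster/health` is Elasticsearch's standard cluster health endpoint. This sketch does not handle credentials embedded in the URI.

```python
import json
import urllib.request
from urllib.parse import urlparse


def parse_es_hosts(value: str) -> list[str]:
    """Split an elasticsearch_hosts value into clean http(s) URIs."""
    hosts = []
    for raw in value.split(","):
        uri = raw.strip().rstrip("/")
        if not uri:
            continue  # skip empty entries, e.g. from a stray trailing comma
        parsed = urlparse(uri)
        if parsed.scheme not in ("http", "https") or not parsed.hostname:
            raise ValueError(f"not a valid http(s) URI: {uri!r}")
        hosts.append(uri)
    return hosts


def cluster_status(host: str, timeout: float = 5.0) -> str:
    """Return the "status" field (green/yellow/red) of GET <host>/_cluster/health."""
    with urllib.request.urlopen(f"{host}/_cluster/health", timeout=timeout) as resp:
        return json.load(resp)["status"]
```

Calling cluster_status on each host returned by parse_es_hosts, and catching OSError (urllib's URLError is a subclass of it), shows which nodes respond and how healthy each one reports the cluster to be.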

If one of your Elasticsearch nodes goes down or has issues, Graylog can use the other ES node(s). This ensures redundancy. For Elasticsearch itself, use something like this in each node’s configuration file:

cluster.name: graylog
network.host: 102.200.6.95
http.port: 9200
node.name: lab-elastic-001.domain.com
node.master: true
node.data: true
discovery.zen.ping.unicast.hosts: ["102.200.6.95","102.200.6.96","102.200.6.97"]
discovery.zen.minimum_master_nodes: 2  # 3 master-eligible nodes: use 2. With only 2, setting 1 keeps the cluster up if one fails but risks split brain.
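For ES 5.x/6.x, the Elasticsearch documentation recommends setting discovery.zen.minimum_master_nodes to a quorum of the master-eligible nodes, (master_eligible_nodes / 2) + 1, rounded down. A tiny sketch of that arithmetic (the function name is just illustrative): with the 4 master-eligible nodes in the original post it gives 3, which matches the posted setting, and the three remaining nodes still meet that quorum. With 3 nodes it gives 2, as in the example above. ES 7 no longer uses this setting.

```python
def minimum_master_nodes(master_eligible: int) -> int:
    """Quorum for discovery.zen.minimum_master_nodes (ES 5.x/6.x):
    a strict majority of the master-eligible nodes."""
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible // 2 + 1


for n in (1, 2, 3, 4):
    print(n, "master-eligible ->", minimum_master_nodes(n))
```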

From what you showed above, it looks like you have 4 CPU cores, but your Graylog buffer configuration is set up for far more.

Also, I do not see inputbuffer_processors. I assume you left it at the default, which I think is 2.
For example, if my Graylog server had 14 CPU cores, my configuration should look like this:

processbuffer_processors = 7
outputbuffer_processors = 3
inputbuffer_processors = 2

Those add up to 12, leaving 2 CPU cores for the OS. As a rule of thumb, there shouldn’t be more buffer threads than CPU cores. By that rule, your configuration (18 + 14 + 2 = 34 threads) would need 34 CPU cores per node.
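The rule of thumb above (buffer threads plus a couple of cores for the OS should not exceed the CPU cores) can be checked in a few lines. This is a sketch only: the helper names are hypothetical, the default of 2 for inputbuffer_processors is the value assumed in the text, and Graylog itself does not enforce this rule.

```python
# Assumption: default inputbuffer_processors when the key is absent from server.conf.
DEFAULT_INPUTBUFFER_PROCESSORS = 2


def buffer_threads(conf: dict[str, int]) -> int:
    """Total buffer processor threads configured in server.conf."""
    return (
        conf["processbuffer_processors"]
        + conf["outputbuffer_processors"]
        + conf.get("inputbuffer_processors", DEFAULT_INPUTBUFFER_PROCESSORS)
    )


def fits_cores(conf: dict[str, int], cores: int, reserved_for_os: int = 2) -> bool:
    """Rule of thumb: buffer threads plus OS headroom should not exceed the cores."""
    return buffer_threads(conf) + reserved_for_os <= cores


example_14_cores = {"processbuffer_processors": 7, "outputbuffer_processors": 3, "inputbuffer_processors": 2}
posted_config = {"processbuffer_processors": 18, "outputbuffer_processors": 14}

print(buffer_threads(example_14_cores), fits_cores(example_14_cores, cores=14))  # 12 True
print(buffer_threads(posted_config), fits_cores(posted_config, cores=4))  # 34 False
```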

As for Q3: in version 4.3 this setting is called is_leader; in anything older than 4.3 it is called is_master.

The meaning of the setting is the same.
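The comments in Graylog’s default server.conf say that when you run more than one Graylog server, you have to select one of these instances as master; the master performs some periodic tasks that the other nodes do not. With is_master = true on all three nodes, all three are configured to take that role. The sketch below lists which nodes claim it. It is illustrative only: the helper names are hypothetical, it assumes plain `key = value` lines, and it treats a missing key as true. The `key` parameter covers the is_leader name.

```python
def parse_server_conf(text: str) -> dict[str, str]:
    """Parse simple `key = value` lines, skipping blanks and # comments."""
    conf: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        conf[key.strip()] = value.strip()
    return conf


def master_nodes(confs: dict[str, str], key: str = "is_master") -> list[str]:
    """Names of nodes whose server.conf text sets `key` to true.
    Assumption: a missing key counts as true."""
    masters = []
    for name, text in confs.items():
        if parse_server_conf(text).get(key, "true").lower() == "true":
            masters.append(name)
    return sorted(masters)
```

If master_nodes returns more than one name for your three server.conf files, more than one node is set up as master.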

NOTES:

When you enable this setting, be aware that the node will use more resources, so ensure these cluster nodes have enough RAM and CPU.

Ensure that all of the Elasticsearch and Graylog configuration is correct. When dealing with multiple nodes, try to avoid using 0.0.0.0; give each node a static IP address and use that assigned address in its configuration file. Ensure the FQDN or IP address is reachable over the network, and check your /etc/hosts file and hostname.
Check that no firewalls or security features are blocking the ports this cluster needs in order to function.

Hope that helps

Thanks for helping out,

Yes, Elasticsearch v5.3.0.

Yes, it was working fine for a couple of months. There was one problem: the data retention policy was not being followed. ES was not deleting the indices that should have been deleted under the defined retention policy, so eventually the disks would run out of space. I had to manually delete the older indices from the UI’s System → Indices tab. Is this an issue with Graylog or with ES?
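On retention: with time-based rotation each index holds roughly elasticsearch_max_time_per_index of data, and, assuming the delete retention strategy, at most elasticsearch_max_number_of_indices indices are kept. The posted values (20 indices of 1d each, 1 replica) therefore imply about 20 days of data stored twice. The sketch below turns that into a rough per-node disk estimate. The daily ingest figure is made up, the sketch assumes shards spread evenly across data nodes and ignores index overhead, and the values actually in effect are the ones shown for the index set under System → Indices, which may differ from server.conf.

```python
def retention_days(max_indices: int, days_per_index: float) -> float:
    """Approximate days of data kept when rotating by time and deleting the oldest index."""
    return max_indices * days_per_index


def disk_per_node_gb(daily_gb: float, replicas: int, max_indices: int,
                     days_per_index: float, data_nodes: int) -> float:
    """Primary + replica data at full retention, assuming an even spread across data nodes."""
    total_gb = daily_gb * (1 + replicas) * retention_days(max_indices, days_per_index)
    return total_gb / data_nodes


# Rotation/retention values from the posted server.conf; daily_gb is a hypothetical ingest rate.
need = disk_per_node_gb(daily_gb=100, replicas=1, max_indices=20, days_per_index=1, data_nodes=4)
usable = 900 * 0.9  # 900 GB disks, low watermark 0.9 from the posted elasticsearch.yml
print(f"need ~{need:.0f} GB per node, ~{usable:.0f} GB usable below the low watermark")
```

Above the low watermark, Elasticsearch stops allocating new shards to that node, so a per-node estimate near or above the usable figure means the disks will fill before the oldest index is deleted.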

Q. For graylog-server and ES, how many cores, how much disk space and how much RAM would you recommend?
My current config:
ES instances: RAM = 32GB, Disk = 900GB, Cores = 4
Graylog: RAM = 32GB, Cores = 4
There are 3 Graylog instances and 4 ES instances.
How much load do you think this setup can handle?

Not sure; we would need more information and statistics to find out.

The same thing happened to one of our customers running GL 2.4, ES 5 and MongoDB 3.x on CentOS 7.
They have over 150 nodes sending Syslog over UDP, about 3-4 GB a day. We also had to delete indices.
The GL server has 4 cores, 4 GB RAM and a 500 GB HDD. It does not use extractors or pipelines.

Since this was an ES issue, I reset the buffer settings back to the defaults: someone had configured too many threads for a server with 4 cores, and we were not pushing the resources on this node anyway.

processbuffer_processors = 5
outputbuffer_processors = 3
inputbuffer_processors = 2

As for your setup, I’m not sure. I would look into the Elasticsearch logs; perhaps you can find a clue about what’s going on.

To sum up:
If GL is ingesting a couple of GB a day, this setup is capable of doing what you need. Be aware that your version of ES does have bugs (I already looked this up). If you want to keep your old setup, I would suggest upgrading to GL 2.5 so you can use ES 6.0. That may help, and/or increasing the CPU cores, which I think is one of a couple of issues.

Elasticsearch nodes where one node is full and the others are empty look to me like an incomplete cluster setup.
On the Elasticsearch nodes, open /etc/elasticsearch/elasticsearch.yml and check that:

  • cluster.name is the same on every node
  • cluster.initial_master_nodes contains the list of master nodes (this setting only exists in ES 7 and later)
  • discovery.zen.ping.unicast.hosts contains the list of Elasticsearch hosts
  • at least one node is a master node (node.roles including master in ES 7.9 and later; node.master: true in older versions)
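Part of the checklist above can be automated. A rough Python sketch, under these assumptions: the elasticsearch.yml files use flat `key: value` lines (a block-style YAML list under discovery.zen.ping.unicast.hosts would be misread as empty, so use a real YAML parser for production files); only the ES 5.x/6.x settings from the original post are checked (not cluster.initial_master_nodes or node.roles); and the 0.0.0.0 check follows the earlier advice to bind each node to a static address. The function names are hypothetical.

```python
def parse_flat_yml(text: str) -> dict[str, str]:
    """Parse flat `key: value` lines only; block lists and nesting are NOT handled."""
    settings: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, _, value = line.partition(":")
        settings[key.strip()] = value.strip().strip('"')
    return settings


def check_cluster(nodes: dict[str, str]) -> list[str]:
    """Return human-readable problems found across the nodes' elasticsearch.yml texts."""
    parsed = {name: parse_flat_yml(text) for name, text in nodes.items()}
    problems = []

    names = {s.get("cluster.name") for s in parsed.values()}
    if len(names) != 1:
        problems.append(f"cluster.name differs between nodes: {sorted(map(str, names))}")

    masters = 0
    for name, s in parsed.items():
        if not s.get("discovery.zen.ping.unicast.hosts"):
            problems.append(f"{name}: discovery.zen.ping.unicast.hosts is empty")
        if s.get("network.host") == "0.0.0.0":
            problems.append(f"{name}: network.host is 0.0.0.0, bind a static address instead")
        # ES 5.x/6.x: node.master defaults to true when not set
        if s.get("node.master", "true").lower() == "true":
            masters += 1
    if masters == 0:
        problems.append("no master-eligible node")
    return problems
```

Fed the elasticsearch.yml from the original post, this flags the empty unicast host list and the 0.0.0.0 binding on every node.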