Process and Output buffer are Full. None of the messages are flowing out

ashu99099 · June 16, 2022, 12:28pm


1. Describe your incident:
There is enough disk space available, but messages are not flowing out (Out = 0).
I’m running 3 instances of graylog-server 2.4.7 behind a load balancer and 4 instances of Elasticsearch. This setup is running on AWS EC2.
On one of the Elasticsearch instances, disk utilisation reached 100% and that instance stopped. The other 3 Elasticsearch instances are still running with enough disk space available on each, but logs are still not flowing out.

2. Describe your environment:

OS Information:
ubuntu-xenial-16.04-amd64-server
RAM: 32GB
Disk space for each ElasticSearch Instance = 900GB

Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Thread(s) per core: 2
Core(s) per socket: 2
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 79
Model name: Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
Stepping: 1
CPU MHz: 2699.355
CPU max MHz: 3000.0000
CPU min MHz: 1200.0000
BogoMIPS: 4600.11
Hypervisor vendor: Xen
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 46080K
NUMA node0 CPU(s): 0-3

Package Version:
graylog-server 2.4.7
elastic-search v5.3.0
Service logs, configurations, and environment variables:
Graylog server.conf
allow_highlighting = true
allow_leading_wildcard_searches = true
elasticsearch_max_number_of_indices = 20
elasticsearch_max_time_per_index = 1d
elasticsearch_replicas = 1
is_master = true
message_journal_dir = /var/lib/graylog-server/journal_0
message_journal_max_age = 48h
message_journal_max_size = 25gb
processbuffer_processors = 18
outputbuffer_processors = 14
output_batch_size = 2000
elasticsearch_index_prefix = graylog2

ElasticSearch elasticsearch.yml
cluster.name: graylog2
cluster.routing.allocation.disk.threshold_enabled: true
cluster.routing.allocation.disk.watermark.high: "0.95"
cluster.routing.allocation.disk.watermark.low: "0.9"
discovery.zen.minimum_master_nodes: 3
discovery.zen.ping.unicast.hosts:
http.cors.allow-origin: "*"
http.cors.enabled: true
http.enabled: true
network.host: "0.0.0.0"
node.data: true
node.ingest: true
node.master: true
node.name: orch-gl-elasticsearch-prod-1-graylog2
path.data: /var/elasticsearch/log/data
path.logs: /var/log/elasticsearch/graylog2

3. What steps have you already taken to try and solve the problem?
I tried restarting the graylog-server and Elasticsearch instances, and messages started flowing out for some time. Then the process buffer and output buffer filled up again and messages stopped flowing out.
I manually deleted some old Elasticsearch indices from the System → Indices tab to free up some disk space. This had no effect.

4. How can the community help?
Q1. I wonder why one of the Elasticsearch instances was utilised more than the other 3. Shouldn’t the load be divided among them equally? Can you please look at my Elasticsearch config and check whether there is any mistake?

Q2. Please also take a look at server.conf, especially the processbuffer_processors and outputbuffer_processors values. Apparently these values were set assuming a RAM size of 32GB. Is this correct? Are we distributing RAM here, or the number of cores (which is 4)?

Q3. What is the effect of setting “is_master = true” for all 3 graylog-server instances?


gsmith · June 16, 2022, 9:59pm

Hello && Welcome

After looking over your configuration for a cluster setup, it seems incomplete.
I see you’re using a very old version of GL, 2.4.7, so I assume you’re also using a very old version of ES, probably below 6.0. Is this correct?
Question: Was this running before, or is this a new installation?

I’ll try to answer your questions in order. I will also post some concerns about the Graylog configuration.

If you are running a three-node ES cluster, Graylog should be configured something like this, depending on the version of ES.

# List of Elasticsearch hosts Graylog should connect to.
# Need to be specified as a comma-separated list of valid URIs for the http ports of your elasticsearch nodes.
# If one or more of your elasticsearch hosts require authentication, include the credentials in each node URI that
# requires authentication.
elasticsearch_hosts = http://node1:9200,http://node2:9200,http://node3:9200

If one of your Elasticsearch nodes goes down or has issues, Graylog can use the other ES node(s). This will ensure redundancy. As for the ES side, use something like this in your ES configuration file.

cluster.name: graylog
network.host: 102.200.6.95
http.port: 9200
node.name: lab-elastic-001.domain.com
node.master: true
node.data: true
discovery.zen.ping.unicast.hosts: ["102.200.6.95","102.200.6.96","102.200.6.97"]
# With 3 master nodes use 2, or just use 1 for 2 master nodes.
discovery.zen.minimum_master_nodes: 2
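
The value for discovery.zen.minimum_master_nodes is usually derived as a quorum of the master-eligible nodes: (master_eligible_nodes / 2) + 1, rounded down. Here is a minimal Python sketch of that arithmetic. The function name is made up for illustration. Note that for 2 master-eligible nodes the quorum formula gives 2, so setting 1 there trades split-brain protection for availability.

```python
def minimum_master_nodes(master_eligible: int) -> int:
    """Return the quorum size for a number of master-eligible nodes.

    Hypothetical helper: it only implements the usual
    (master_eligible // 2) + 1 rule of thumb for Zen discovery.
    """
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible // 2 + 1


# Print the quorum for a few small cluster sizes.
for n in (2, 3, 4):
    print(n, "master-eligible nodes ->", minimum_master_nodes(n))
```

With 4 master-eligible nodes, as in the original post where every node has node.master: true, this gives 3, which matches the posted setting.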

From what was shown above, it looks like you have 4 CPU cores, but your Graylog configuration allocates far more buffer processors than that.

Also, I do not see inputbuffer_processors. I’m assuming you left that as the default, which I think is 2.
For example, if my Graylog server has 14 CPU cores, my configuration should look like this.

processbuffer_processors = 7
outputbuffer_processors = 3
inputbuffer_processors = 2

Those add up to 12, which leaves 2 CPU cores for my OS. The rule is that there shouldn’t be more threads than CPU cores. With the configuration you have made, this GL node should have 34 CPUs per node.
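
The rule above can be checked with a few lines of Python. This is only a sketch: the function name and the reserved_for_os parameter are made up for illustration, and the input buffer default of 2 is the value assumed in this thread, not something read from a running server.

```python
def thread_budget(cpu_cores, processbuffer, outputbuffer,
                  inputbuffer=2, reserved_for_os=0):
    """Compare the configured buffer processors with the available cores.

    Returns (total_threads, cores_available, fits), where fits is True
    when the processors do not exceed the cores left after the OS reserve.
    """
    total_threads = processbuffer + outputbuffer + inputbuffer
    cores_available = cpu_cores - reserved_for_os
    return total_threads, cores_available, total_threads <= cores_available


# Values from the original post: 4 cores, 18 + 14 processors, input buffer assumed 2.
print(thread_budget(4, 18, 14))                      # (34, 4, False)

# The 14-core example: 7 + 3 + 2 with 2 cores kept for the OS.
print(thread_budget(14, 7, 3, 2, reserved_for_os=2))  # (12, 12, True)
```

The first call shows where the figure of 34 comes from: 18 + 14 + 2 threads on a 4-core machine.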

In version 4.3 it is now called is_leader; in anything under 4.3 it is called is_master.

The definition of this setting is the same.

NOTES:

When you enable this configuration, be aware that it will use more resources, so ensure these cluster nodes have enough RAM and CPU.

Ensure all the configuration on Elasticsearch and Graylog is correct. When dealing with multiple nodes, try to refrain from using 0.0.0.0; give each node a static IP address and use that assigned IP address in the configuration file. Ensure the FQDN or IP address is reachable over the network, and check your /etc/hosts file and hostname.
Check that firewalls and security features are not blocking the ports required for this cluster to function.

You may want to look over this.

Multi-node Setup

As for the older version you may want to look at this documentation

Welcome to the Graylog documentation — Graylog 2.4.6 documentation

Hope that helps

ashu99099 · June 17, 2022, 8:28am

Thanks for helping out,

yes, elastic-search v5.3.0

Yes, it was working fine for a couple of months. There was one problem: the data retention policy was not being followed. ES was not deleting the indices that should have been deleted based on the defined retention policy, so eventually the disk would run out of space. I had to manually delete the older indices from the UI’s System → Indices tab. Is this an issue with Graylog or ES?

Q. For graylog-server and ES, how many cores and how much disk space and RAM would you recommend?
My current config:
ES instances: RAM = 32GB, Disk = 900GB, Cores = 4
Graylog: RAM = 32GB, Cores = 4
There are 3 Graylog instances and 4 ES instances.
How much load do you think this setup can handle?

gsmith · June 17, 2022, 9:25pm

Not sure. We would need more information and statistics to find out.
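
One number that can be estimated from the figures already posted is how much data per day fits on disk. The Python sketch below is a rough upper bound only. It assumes one index per day (elasticsearch_max_time_per_index = 1d), that the 20-index limit and 1 replica from server.conf are what is actually applied, that shards are spread evenly over the 4 nodes, and that nothing else uses the disks. The function name is made up for illustration.

```python
def max_daily_primary_gb(nodes, disk_gb_per_node, low_watermark,
                         indices_kept, replicas):
    """Rough upper bound on primary index size per day, in GB.

    usable space = nodes * disk * low watermark
    each daily index is stored (replicas + 1) times
    """
    usable_gb = nodes * disk_gb_per_node * low_watermark
    copies = replicas + 1
    return usable_gb / (indices_kept * copies)


# Figures from the thread: 4 ES nodes, 900GB each, low watermark 0.9,
# 20 indices kept, 1 replica.
estimate = max_daily_primary_gb(4, 900, 0.9, 20, 1)
print(f"about {estimate:.0f} GB of primary index data per day")
```

If the daily indices are larger than this estimate, the disks fill up before the oldest index is rotated out, regardless of how the buffers are tuned.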

This situation happened to one of our customers running GL 2.4, ES 5, MongoDB 3.x on CentOS 7.
Over 150 nodes using Syslog UDP, about 3-4 GB a day. We also had to delete indices.
The GL server has 4 cores, 4 GB RAM and a 500 GB HDD. It does not have extractors or pipelines.

Since this was an ES issue, I reset my buffer settings back to default, because someone had created too many threads for a server with 4 cores. We also were not pushing the resources on this node.

processbuffer_processors = 5
outputbuffer_processors = 3
inputbuffer_processors = 2

As for your setup, I’m not sure. I would look into the Elasticsearch logs; perhaps you can find a clue about what’s going on.
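
Besides the logs, the per-node disk usage can be read from the Elasticsearch _cat/allocation API. Below is a minimal Python sketch that flags nodes at or above a watermark. The function names and the sample node names are made up for illustration, and it assumes the cluster answers _cat/allocation?format=json&bytes=b with the disk.used, disk.total and node columns as strings.

```python
import json
import urllib.request


def fetch_allocation(base_url):
    """Fetch per-node shard and disk allocation as a list of dicts."""
    url = base_url.rstrip("/") + "/_cat/allocation?format=json&bytes=b"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def nodes_over_watermark(rows, watermark=0.90):
    """Return (node, used_ratio) for nodes at or above the watermark."""
    flagged = []
    for row in rows:
        used = row.get("disk.used")
        total = row.get("disk.total")
        if not used or not total:
            # The UNASSIGNED row carries no disk figures; skip it.
            continue
        ratio = int(used) / int(total)
        if ratio >= watermark:
            flagged.append((row["node"], round(ratio, 3)))
    return flagged


# Hypothetical sample rows, shaped like the assumed API response.
sample = [
    {"node": "es-1", "disk.used": "900", "disk.total": "900"},
    {"node": "es-2", "disk.used": "450", "disk.total": "900"},
    {"node": "UNASSIGNED", "disk.used": None, "disk.total": None},
]
print(nodes_over_watermark(sample))
```

Against a live cluster the call would be nodes_over_watermark(fetch_allocation("http://node1:9200")), where node1 is the placeholder host name used earlier in this reply.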

To sum it up:
If GL is ingesting a couple of GB a day, this setup is capable of doing what you need it to. Be aware that the version of ES you have does have bugs in it (I looked this up already). If you want to keep your old setup, then I would suggest upgrading to GL 2.5 so you can use ES 6.0. This may help, and/or increasing CPU cores; I think that is one of a couple of issues.

Find out more here.

Upgrading Graylog — Graylog 2.5.0 documentation

ihe · June 22, 2022, 12:49pm

Elasticsearch nodes with one node full and the others empty look to me like an incomplete cluster setup.
On the Elasticsearch nodes, go to /etc/elasticsearch/elasticsearch.yml and check:

cluster.name is the same
cluster.initial_master_nodes contains the list of the master nodes
discovery.zen.ping.unicast.hosts contains the list of Elasticsearch hosts
make sure that you have at least one master node, as in the line where node.roles is specified

system · July 6, 2022, 12:50pm

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.
