Messages getting queued, high load on ES cluster

I am running Graylog 3.1.2 and an Elasticsearch 6.8.3 cluster on Kubernetes in AWS. The data is stored on SC1 EBS volumes.
The specs are as follows:
Master only (3x): 2 GB RAM, 1 CPU, 1 GB heap
Data and ingest (2x): 24 GB RAM, 6 CPU, 12 GB heap
Graylog (1x): 12 GB RAM, 6 CPU, 6 GB heap

On a good day, Graylog pushes 7-12k messages/s.
However, after a few days it drops to around 1k/s for stretches of a few seconds to a minute, and my messages start to queue up.

On the Elasticsearch side (using Cerebro for stats), the cluster is green, but the load on one of the nodes is really high. Heap is not maxed out, and CPU and RAM usage are barely past 5%. Restarting the pod doesn't do anything; however, if I shut the node down and a brand new EC2 instance comes up in its place so the data node starts fresh, I hit 12k/s again.
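
In case it helps, this is roughly how I check per-node load and hot threads outside of Cerebro (a minimal sketch; the `elasticsearch` service name, port and lack of auth are assumptions about my setup):

```python
# Minimal sketch: pull per-node load and hot threads straight from the
# Elasticsearch REST API to see which data node is struggling.
# ES_URL is an assumption (in-cluster service name, no auth) -- adjust as needed.
import urllib.request

ES_URL = "http://elasticsearch:9200"

def fetch(path: str) -> str:
    """Return the plain-text response of an Elasticsearch GET request."""
    with urllib.request.urlopen(f"{ES_URL}{path}", timeout=10) as resp:
        return resp.read().decode("utf-8")

# Per-node load average, heap and CPU usage (standard _cat/nodes columns).
print(fetch("/_cat/nodes?v&h=name,load_1m,heap.percent,cpu"))

# Hot threads on the busy node often point at merges, GC or slow I/O (e.g. SC1 EBS).
print(fetch("/_nodes/hot_threads?threads=3"))
```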

Any ideas?

I can’t tell exactly what is wrong, but here are some things to check:

  • Make sure Elasticsearch has roughly 2-3% of your raw data volume available as memory.
  • If you have a lot of messages in the queue, check the Graylog node’s processor and output buffers (see the sketch after this list):
    if the processor buffer is full but the output buffer is not, the problem is with processing (Graylog needs more resources, or fewer tasks); if the output buffer is full and that backs up the processor buffer as well, the problem is with your ES cluster.
  • I have seen a single misbehaving ES node cause problems for the whole cluster. Does restarting that node solve it?
  • I’m not an ES specialist, but I don’t think you need more master nodes than data nodes. Maybe try adding one more data node.
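
Here is a rough sketch of that buffer check via the Graylog REST API. The endpoint paths (/api/system/buffers, /api/system/journal), host name and credentials are assumptions about a typical 3.x setup, so verify them against your API browser first:

```python
# Minimal sketch: read process/output buffer and journal utilization from the
# Graylog REST API. Endpoint paths, host and credentials are assumptions --
# check them in your Graylog 3.1 API browser before relying on this.
import base64
import json
import urllib.request

GRAYLOG_URL = "http://graylog:9000"
AUTH = base64.b64encode(b"admin:password").decode()  # replace with real credentials/token

def get(path: str) -> dict:
    """Fetch a Graylog API endpoint and return the decoded JSON body."""
    req = urllib.request.Request(
        f"{GRAYLOG_URL}{path}",
        headers={"Authorization": f"Basic {AUTH}", "Accept": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.loads(resp.read())

# A full output buffer with an idle process buffer usually points at ES;
# a growing journal means messages are queuing up on disk.
print(json.dumps(get("/api/system/buffers"), indent=2))
print(json.dumps(get("/api/system/journal"), indent=2))
```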
