Process and output buffers are full

I’m new to the Graylog family. I run a test server, and the message rate increased a couple of days ago. As a consequence, the process and output buffers are constantly full.

I have searched around for my issue and found some changes that could help. I applied them, but the buffers are still full.

Here are my specs:

  • VM with 4 vCPUs
  • 8GB RAM
  • 150GB disk

I changed some values:

Elasticsearch conf:

  • max heap size: 2GB

Graylog conf:

  • max heap size: 2GB (it never uses more than 1GB)
  • output_batch_size = 2000
  • outputbuffer_processors = 6
  • processbuffer_processors = 6
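
For reference, here is a sketch of where those settings live; the file paths assume a standard Debian/Ubuntu package install and may differ on your system, and the default JVM flags are trimmed:

```
# /etc/graylog/server/server.conf
output_batch_size = 2000
outputbuffer_processors = 6
processbuffer_processors = 6

# /etc/default/graylog-server  (Graylog heap)
GRAYLOG_SERVER_JAVA_OPTS="-Xms2g -Xmx2g"

# /etc/elasticsearch/jvm.options  (Elasticsearch heap)
-Xms2g
-Xmx2g
```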

But this is not helping.

Getting the heap values from Elasticsearch:

root@graylog:~# curl -sS -XGET "localhost:9200/_cat/nodes?h=heap*&v"
heap.current heap.percent heap.max
563.7mb 27 1.9gb

Input message rate is 600/s.

  • All my extractors are GROK patterns.
  • IO stats do not seem to be the problem.
  • I get “Allocation Failure” entries in the Elasticsearch logs.

Here are the logs:
Graylog (since the last reboot): 2020-11-06T11:11:12.834+01:00 INFO [CmdLineTool] Loaded plugin: AWS plugins 3.3 -
Elasticsearch: [2020-11-06T10:22:37.085+0000][496848][gc,age ] GC(511) - age 4: 584 -

Hope I have given you all the information you need.
Thanks in advance for your help!


Since this morning, even with all the changes above, I have around 350-450k unprocessed messages.

Hope this helps,
Thanks !

What are your extractor metrics? Check whether your extractors take 100 seconds per message.
Performance tuning with Graylog and Elastic is always full of magic.
Check CPU utilization, check that all indices are green, and try to reduce the batch size.

Thanks for your quick reply.

Where can I find the extractor metrics?
CPU utilization is around 70-80%.
All indices are green.

While searching for the extractor metrics, I just saw this:

For the record, I rebooted the services several times (conf changes).

All the remaining messages show the same error.


Go to Inputs and check the specific extractors; make sure none of them is too heavy.
From the screenshot, Elasticsearch had some issues (most probably because of performance, which is classic for this Java thing), but it has now recovered.
About this part:

outputbuffer_processors = 6
processbuffer_processors = 6

Graylog folks usually say these parameters must not exceed the number of cores (even though on this forum we can find that it's not always true, lol).
You have 12 buffer processors but just 4 vCPUs, and part of those are occupied by Elastic.
Damn, that looks bad!
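
As a rough sketch of the sizing logic above (a rule of thumb from this thread, not an official Graylog formula; the two cores reserved for Elasticsearch are an assumption):

```python
# Rule of thumb: process + output buffer threads should fit in the
# cores left over after Elasticsearch takes its share.
def max_buffer_processors(vcpus: int, reserved_for_es: int) -> int:
    """Cores realistically available for Graylog's buffer processor threads."""
    return max(1, vcpus - reserved_for_es)

# The setup in this thread: 4 vCPUs shared with Elasticsearch,
# yet 6 + 6 = 12 buffer processor threads configured.
configured = 6 + 6
available = max_buffer_processors(4, 2)  # assume ES occupies ~2 cores
print(f"{configured} threads competing for ~{available} cores")
```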

Here are some metrics from the extractors.
These are for firewall logs.

The default configuration for these processors is 3 for the output buffer and 5 for the process buffer.
Should I test like this?

outputbuffer_processors = 2
processbuffer_processors = 2


In general your metrics aren't bad, but the maximum values (like 22,000) seem a little suspicious.
I would try to optimize the patterns.
For example, try to modify

And if you only use IPv4, you can replace the IP grok pattern, which also includes IPv6.
Always check the metrics after modifications.
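
Grok patterns compile down to regular expressions, so the cost of the combined `%{IP}` pattern can be illustrated in plain Python; the pattern strings below are simplified stand-ins for the real grok definitions, not the actual ones:

```python
import re

# Simplified stand-ins: grok's IPV4 vs the combined IP (IPv6-or-IPv4) pattern.
IPV4 = r"(?:\d{1,3}\.){3}\d{1,3}"
IP_COMBINED = r"(?:(?:[0-9A-Fa-f]{1,4}:){1,7}[0-9A-Fa-f]{0,4}|(?:\d{1,3}\.){3}\d{1,3})"

line = "action=accept src=192.168.1.10 dst=10.0.0.5"

# Both find the IPv4 address...
assert re.search(IPV4, line).group() == "192.168.1.10"
assert re.search(IP_COMBINED, line).group() == "192.168.1.10"

# ...but the combined pattern tries (and fails) the IPv6 branch at every
# scan position first, which is wasted work on IPv4-only logs.
```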

Yes, 2x2 seems more reasonable (if we decide to believe the Graylog team members); maybe you can even test 1x1, 2x1, or 1x2.

I think this problem is very common with Graylog. Today I have the same problem; I have read a lot of topics here and I can't resolve this issue.
I’m following this topic.


Thanks for your input.

I tried

I did it on all of my GROK patterns, and it worked.

I will test the processor modifications later and come back with the results.

Thanks :slight_smile:

Hi @zouljan

I did test the processors with 2x2, 1x2, and 2x1, but that is not working either. It's actually worse.
I went back to 6x6, but right now I can see that I have over 1.5 million messages.

Does anyone have experience with how many messages per second a single Graylog VM with my specs can ingest?
It seems that this is too much.

EDIT: no index failures for two days :slight_smile:

Thanks !

To ingest more data you need more CPU. With 4 CPUs I would not expect to ingest more than 400 logs per second (of course it depends on the logs, extractors, pipelines, alerts…).
What is the CPU load average? Give us the 1-minute, 5-minute, and 15-minute averages.

Load average (4 cores):

1 min: 10.66
5 min: 10.95
15 min: 10.66
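
For context, a quick way to read those numbers (assuming standard Unix load-average semantics, where load counts runnable plus uninterruptible tasks):

```python
# On a 4-core box, a 1-minute load of ~10.7 means roughly 2.7x more
# runnable work queued than there are CPUs to run it.
vcpus = 4
load_1min = 10.66
saturation = load_1min / vcpus
print(f"~{saturation:.1f}x oversubscribed")
```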

Well, I’ll try to increase CPUs.
Thanks for your input !

You may need this information then:
I also have about 15 GROK extractors, each with one field, for my received syslog messages.
I'm guessing each message passes through all the extractors.
I cannot merge them because I receive different types of messages (firewall).

For example, a few of my extractors :




All extractors process one field in my message: the fields I'm interested in.

Maybe there are too many extractors and I need to find a way to merge them all?
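
One way the merge could work (the patterns below are made-up, simplified stand-ins for the real firewall extractors) is to collapse several per-type extractors into a single alternation, so each message is scanned once instead of ~15 times:

```python
import re

# Hypothetical simplified versions of three separate firewall extractors.
ACCEPT = r"action=accept src=(?P<src>\S+)"
DROP = r"action=drop src=(?P<src>\S+)"
REJECT = r"action=reject src=(?P<src>\S+)"

# Merged: one scan per message; the action becomes a captured field
# instead of being baked into which extractor matched.
MERGED = r"action=(?P<action>accept|drop|reject) src=(?P<src>\S+)"

msg = "action=drop src=203.0.113.7 dst=10.0.0.1"
m = re.search(MERGED, msg)
print(m.group("action"), m.group("src"))  # drop 203.0.113.7
```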


Yes, you get the issue: the processes require 10 CPUs but you have only 4.

I was in this situation before.
One careless regex and your Graylog server is wrecked, and you watch your journal grow.
Client-side parsing is my weapon of choice now.
Filebeat can replace 99.9% of the extractor logic in my case.
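
For anyone who can use it, a minimal sketch of what that client-side parsing looks like in `filebeat.yml` with the dissect processor (the tokenizer and field names are illustrative, not taken from this thread):

```yaml
# filebeat.yml (fragment) -- parse on the client before shipping to Graylog
processors:
  - dissect:
      tokenizer: "action=%{action} src=%{src} dst=%{dst}"
      field: "message"
      target_prefix: "fw"
```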

Unfortunately, I can't play with Filebeat.

I just cleared this huge backlog of unprocessed messages by setting the Message Processors Configuration back to its default… I had changed the order of the Pipeline Processor and the Message Filter Chain in order to start filtering with streams, but it seems that is too heavy…
Now I need to find a solution with pipelines for filtering… but that will be another thread eventually :slight_smile:

Thanks for helping people !


Looks like this change only solved the issue temporarily… (2M messages unprocessed)
If anyone has suggestions, I'll take them!

I can see that my message rate per second is 327.

And also some iowait… between 30 and 40%.

For the record, the VM is now 8 cores with 8GB RAM.


In this case you should always start with the Graylog and Elasticsearch logs.
Are there any errors there?
Usually a growing journal means Graylog can't write messages into ES.

Indeed, I checked.
Same output from Elasticsearch since this thread began: [2020-11-06T10:22:37.085+0000][496848][gc,age ] GC(511) - age 4: 584 -
It's the same pattern, the same kind.
No errors so far in the Graylog log.

Thanks for your help.

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.