Hi,
I’m new to the Graylog family. I run a test server. The message rate increased a couple of days ago, and as a consequence the process and output buffers are constantly full.
I have searched for my issue and found some changes that could be made. I applied them, but the buffers are still full.
What do your extractor metrics look like? Check whether any of your extractors is taking something absurd like 100 seconds per message.
Performance tuning with Graylog and Elastic is always full of magic.
Check CPU utilization, check that all indices are green, and try to reduce the batch size.
Go to Inputs and check the individual Extractors; make sure none of them is too heavy.
From the screenshot, Elasticsearch had some issue (most probably performance-related, which is par for the course with this Java thing), but it has recovered now.
About this part
Graylog folks usually say these parameters must not exceed the number of cores (although even on this forum you can find that it’s not always true, lol).
You have 12 buffer processors but just 4 vCPUs, and part of those are already occupied by Elasticsearch.
Damn, that looks bad!
In general your metrics aren’t so bad, but the maximum values (like 22,000) seem a little suspicious.
I would try to optimize the patterns.
For example, try to modify
srcip=%{IP:srcip}
to
^srcip=%{IP:srcip}$
And if you only use IPv4, you can replace the IP grok pattern, which also matches IPv6.
Always check metrics after modifications.
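A minimal before/after sketch combining both suggestions (the field name srcip comes from the examples in this thread):

Before: srcip=%{IP:srcip}
After:  ^srcip=%{IPV4:srcip}$

The anchors let the regex engine fail fast on lines that don’t match, and %{IPV4} skips the IPv6 alternatives that %{IP} also tries.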
Yes, 2x2 seems more reasonable (if we decide to believe the Graylog team members); maybe you can even test 1x1, 2x1, or 1x2.
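For reference, these are set in Graylog’s server.conf and need a graylog-server restart to take effect; a sketch for the 2x2 case (adjust the numbers to whichever split you test):

processbuffer_processors = 2
outputbuffer_processors = 2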
I think this problem is very common with Graylog. Today I have the same problem; I have read a lot of topics here and I still can’t resolve this issue.
I’m following this topic.
I did test the processors with 2x2, 1x2, and 2x1, but that is not working either. It’s actually worse.
I went back to 6x6. But I can see right now that I have over 1.5 million messages.
Does anyone have experience with a single Graylog VM, i.e. how many messages per second one instance can ingest with my specs?
It seems that this is too much.
To ingest more data you need more CPU. With 4 CPUs I would not expect to ingest more than 400 logs per second (of course it depends on the logs, extractors, pipelines, alerts…).
What is the CPU load average? Give us the 1-minute, 5-minute, and 15-minute averages.
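If you’re not sure where to read them, uptime prints all three on one line (the numbers below are just an illustration):

$ uptime
 10:15:03 up 12 days,  2:41,  1 user,  load average: 3.85, 3.42, 3.10

On a 4 vCPU box, sustained values well above 4 mean the CPUs are saturated.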
You may need this information then:
I also have about 15 GROK extractors, each extracting a single field from my received syslog messages.
I’m guessing each message passes through all of the extractors.
I cannot merge them because I receive different types of messages (firewall).
For example, a few of my extractors:
rule=%{DATA:rule} %{GREEDYDATA:UNWANTED}
srcip=%{IPV4:srcip}
destport=%{NUMBER:destport}
Each extractor processes one field of my message; these are the fields I’m interested in.
Maybe there are too many extractors and I need to find a way to merge them all?
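If several fields always appear in the same message type and in the same order, they could in principle be captured by a single extractor. A purely illustrative sketch built from the patterns above (the order and separators are assumptions; they depend on your firewall’s log format):

rule=%{DATA:rule} srcip=%{IPV4:srcip} destport=%{NUMBER:destport} %{GREEDYDATA:UNWANTED}

This only helps for messages that actually contain all of those fields in that order; message types with different layouts would still need separate extractors (or the parsing can move off the server, as discussed below).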
I was in this situation before.
One careless regex and your Graylog server is toast, and you can watch your journal grow.
Client-side parsing is my weapon of choice now.
Filebeat can replace 99.9% of the extractor logic in my case.
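As a sketch of what that can look like (the tokenizer below is a made-up example for the fields mentioned earlier, not your actual firewall format), Filebeat’s dissect processor can split a line into fields before it ever reaches Graylog:

processors:
  - dissect:
      tokenizer: "rule=%{rule} srcip=%{srcip} destport=%{destport}"
      field: "message"
      target_prefix: ""

With target_prefix set to an empty string the extracted keys land at the root of the event, so the corresponding server-side extractors can be removed.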
I just dropped all this huge load of unprocessed messages by setting the Message Processors Configuration back to its default… I had changed the order of the Pipeline Processor and the Message Filter Chain in order to start filtering with streams, but it seems that was too heavy…
Now I need to find a solution with pipelines for filtering… but that will eventually be another thread
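For when that thread happens, a minimal sketch of a pipeline rule that routes messages into a stream (the field check and the stream name "Firewall" are placeholders, not taken from this thread):

rule "route firewall messages"
when
  has_field("srcip")
then
  route_to_stream(name: "Firewall");
end

The rule still has to be placed in a pipeline stage, and the pipeline connected to the stream the messages currently arrive in.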
In this case you should always start with the Graylog and Elasticsearch logs.
Are there any errors in there?
Usually a growing journal means Graylog can’t write messages into ES.
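On a typical package install these are the places to look (paths may differ on your system), plus a quick Elasticsearch health check:

tail -f /var/log/graylog-server/server.log
tail -f /var/log/elasticsearch/*.log
curl -s 'http://localhost:9200/_cluster/health?pretty'

A red cluster status or repeated indexing failures in server.log would explain why the journal keeps growing.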