We are running Graylog 3.3.15 via docker-compose. Since last month, every Tuesday at 5:30 AM we see high CPU usage on the server, which recovers on its own by the evening. This happens only on Tuesdays.
During the high CPU usage we also see the following two notifications in the Graylog UI:
Journal utilization is too high and may go over the limit soon. Please verify that your Elasticsearch cluster is healthy and fast enough. You may also want to review your Graylog journal settings and set a higher limit
[Journal utilization is too high - Uncommitted messages deleted from journal]
We have both Graylog and Elasticsearch running via docker-compose on the same node.
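For context, a single-node layout like that typically looks something like the sketch below. The image tags, ports, and heap size here are illustrative assumptions, not taken from the poster's actual compose file (Graylog 3.3 pairs with MongoDB and Elasticsearch 6.8):

```yaml
# Illustrative single-node docker-compose layout -- versions and values
# are assumptions for the sketch, not the poster's real configuration.
version: "3"
services:
  mongo:
    image: mongo:3.6
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch-oss:6.8.10
    environment:
      - discovery.type=single-node
      - "ES_JAVA_OPTS=-Xms1g -Xmx1g"   # ES heap competes with Graylog for the same node's RAM/CPU
  graylog:
    image: graylog/graylog:3.3.15
    environment:
      - GRAYLOG_HTTP_EXTERNAL_URI=http://127.0.0.1:9000/
    depends_on:
      - mongo
      - elasticsearch
    ports:
      - "9000:9000"        # web UI / REST API
      - "12201:12201/udp"  # example GELF input
```

The relevant point for this thread: everything above shares one node's CPU cores, so a Tuesday-morning ingest spike hits Graylog's processing threads and Elasticsearch's indexing threads at the same time.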
These notifications commonly indicate that every Tuesday at 5:30 AM this node is receiving more logs than normal and Graylog cannot process them quickly enough, so the journal fills up, which can cause further issues. This may point to a resource problem.
Most members increase the process buffers to resolve this, but the Graylog documentation suggests that the buffer processors (process, input, output) combined should not exceed the physical CPU cores on that node. This means that if you have the following
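(The original settings were not included in the post; as a hypothetical illustration, buffer processor counts that sum to 10 would look like this in `graylog.conf` — these happen to be the shipped defaults:)

```ini
# Hypothetical graylog.conf buffer settings (the poster's actual values
# were not shown). Each setting spawns that many processing threads,
# so together these expect 10 CPU cores:
processbuffer_processors = 5
inputbuffer_processors = 2
outputbuffer_processors = 3
```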
Then you should have at least 10 CPU cores. Each processor setting creates that number of threads. I have also seen these settings on nodes with 4 CPUs, but when there is more data than Graylog can handle, issues arise from this.
If the journal gets too full, you could increase the journal size to prevent messages from being dropped while Elasticsearch catches up.
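The journal is controlled by these settings in `graylog.conf`. The values below are examples, not recommendations — size it against your actual Tuesday backlog and available disk:

```ini
# Journal settings in graylog.conf (example values, not recommendations):
message_journal_enabled = true
message_journal_max_size = 10gb   # default is 5gb
message_journal_max_age = 12h     # default; oldest segments are deleted past this
```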
This also may be an issue with the types of messages and/or how you are processing them during the peak times. Consider that it may be a regex or GROK statement that is inefficient: fine normally, but when a complicated or oddly formatted log comes in, it spikes the system trying to deal with it.
If you are applying regex or GROK to a message in an extractor or in the Processing Pipeline, you can check those for efficiency. If you post a sample message that you are processing through regex/GROK and the associated regex/GROK, someone here can examine it and make suggestions.
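To show what "fine normally, spikes on odd input" looks like, here is a small Python sketch with a hypothetical pattern (not from the poster's setup). Nested quantifiers like `(\w+\s*)+` backtrack exponentially when a line almost matches but fails at the end, which is exactly how a weekly batch of oddly formatted logs can peg a CPU core:

```python
import re
import time

# Hypothetical backtracking-prone pattern vs. a safer equivalent.
# Both accept the same strings, but the nested quantifier retries every
# way of splitting the input into groups before failing.
bad_pattern = re.compile(r"^(\w+\s*)+$")
good_pattern = re.compile(r"^[\w\s]+$")  # no nesting, fails fast

normal_log = "user logged in from host01"
odd_log = "a" * 20 + "!"  # almost matches, then fails on the last char

# Both patterns handle a normal line instantly.
assert bad_pattern.match(normal_log)
assert good_pattern.match(normal_log)

# The safe pattern rejects the malformed line immediately...
start = time.perf_counter()
assert good_pattern.match(odd_log) is None
fast = time.perf_counter() - start

# ...while the nested-quantifier pattern churns through roughly 2^19
# split attempts before giving up.
start = time.perf_counter()
assert bad_pattern.match(odd_log) is None
slow = time.perf_counter() - start

print(f"safe pattern: {fast:.6f}s, backtracking pattern: {slow:.6f}s")
```

Add one or two more characters to `odd_log` and the slow time doubles each step — a short spike like this on every bad line is enough to fill the journal during an ingest burst.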
If you can, I would suggest adding 2 more CPU cores. The total would be 10, which matches those settings, plus you need some CPU cores for the OS. Just a thought.
This would depend on how many logs are being ingested. If you think about it, all 8 cores are shared by Graylog and Elasticsearch, so if there is an ingest spike at 5:30 AM this server might not have the resources it needs, and it struggles.
Increasing the journal size is a safety configuration, but if there are not enough resources to index those messages/logs, you will still see high CPU usage.