Another thing that I have noticed is that when searching or going to a dashboard, my out traffic drops to 0 a lot. It makes me think that Elasticsearch is dedicating its processing to the query rather than splitting it with the traffic coming in as well.
Hello,
I’m back from vacation
I noticed these statements, which can be a direct result of your resources and configuration.
This can come from the configuration made in the Graylog config file.
The INPUT buffer configuration doesn't need a lot of CPU, but if you see it climb, just add another processor:
inputbuffer_processors = 2
Normally the process buffer gets the greatest number of CPUs.
processbuffer_processors = 12
The trick with this is that once you increase the number of process buffer processors AND restart the Graylog service, it takes a couple of minutes to kick in. Start with a small number and increase it gradually. Sometimes you won't see results right away.
This depends on how many logs are being received while old messages are still waiting to be indexed.
If the output buffer fills up, then you see an increase in the process buffer. I would increase the number of CPU cores for the output buffer, then restart the Graylog service and wait a few minutes. If the issue continues, I would repeat the process, but you shouldn't have to increase the outputbuffer number too much. As a reminder, try not to increase those numbers beyond the number of physical CPU cores.
outputbuffer_processors = 5
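As a rough sketch, assuming a package install where the config lives in /etc/graylog/server/server.conf, you can check how many cores you actually have before dividing them up, and restart Graylog after editing so the change takes effect:
nproc    # reports logical cores; keep the three *_processors values at or below this
sudo systemctl restart graylog-server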
As for this statement
This could be from a date/time mismatch, from Elasticsearch not processing quickly enough, OR from reconfiguring Graylog and making adjustments while restarting services.
Since you're stating that the buffers are filling up, I'm leaning more toward your configuration and what you're allowing Elasticsearch to use.
At first, try removing or commenting out this one; it could prohibit Elasticsearch from starting up if you did not make some other adjustments to your system.
This is mine on a server running Graylog and Elasticsearch on the same node:
cluster.name: graylogsrktst
node.name: montst.log.srk
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch
network.host: <ipadres>
http.port: 9200
discovery.seed_hosts: ["<ipadres>"]
cluster.initial_master_nodes: ["<ipadres>"]
action.auto_create_index: false
It could be that you have to change some parameters in server.conf:
elasticsearch_hosts = http://<ipadres>:9200
Thank you for your reply.
I have tested increasing the number of processors allocated to both the output and the process buffer, but neither seems to have helped. I have also tried changing the output batch size from 200 all the way up to 2000, and there was no change.
Currently I'm sitting at 6 processors for processing and another 6 for output, and the same thing seems to happen. I have noticed that decreasing the amount of traffic coming in seems to help, but then I'm not logging all my traffic.
I think it's important to state that this did not appear to be a problem before the log4j update. Since then the output buffer appears to have become a problem, and from what I can tell it is also causing the issues that we had with the gap in time when searching. NOTE: if I disable most of my inputs so that the output buffer can catch up, that gap appears to recede, if not go away completely.
At this point, I'm really thinking that it has something to do with Elasticsearch not being able to process the data from the output buffer fast enough.
You are saying to comment out the bootstrap.memory_lock:true?
What type of things in the server.conf file are you suggesting that we change?
Comment out the bootstrap.memory_lock: true and see what happens, and check your Elasticsearch logs for any indication that could give some clues.
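For example, something roughly like this on a package install (paths are assumptions, adjust to your setup):
sudo sed -i 's/^bootstrap.memory_lock: true/#bootstrap.memory_lock: true/' /etc/elasticsearch/elasticsearch.yml    # comment out the setting
sudo systemctl restart elasticsearch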
Made the change, and it doesn't appear to have any effect.
Hello,
I completely agree,
Since your buffers are showing 100%, or they're not processing messages fast enough, you will see gaps in your graphs. This problem can be resolved in the Graylog config file as I stated above; the only way I know to fix it is by increasing those settings I showed above.
Have you searched the forum for Processors, Buffers full, etc?
Most everything that I have found said they needed to give more resources to Elasticsearch. And while that can make sense in certain circumstances, in this case it doesn't, as it was working for us up until the upgrade.
At this point, I'm kind of at a loss and wondering if I should just reinstall, as we only have a couple of months' worth of data and have no obligation to keep it. However, I'm not looking forward to rebuilding, and I wish there were some way to troubleshoot Elasticsearch better.
Any other thoughts as to other options that I can do? Any specific utilities that I could get or look at that would help to determine if and where the problem would be in elasticsearch?
The best possible thing for ES is to lower the refresh rate by raising index.refresh_interval.
If you have 6,000 messages coming in on a single node that all have to be indexed immediately, that can be a lot for Elasticsearch to handle. Try setting it to 5 or even 30 seconds.
I do this with Cerebro and made a template with it so that any new index picks it up too.
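If you'd rather do it without Cerebro, a rough sketch of the same idea with the legacy template API (the template name here is just an example; any index matching graylog_* created afterwards should pick the setting up):
curl -X PUT "localhost:9200/_template/graylog-custom-refresh" -H 'Content-Type: application/json' -d'
{
  "index_patterns": ["graylog_*"],
  "order": 10,
  "settings": {
    "index.refresh_interval": "30s"
  }
}
'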
Are you saying that you make this change on elastic or in the graylog conf file?
Hello,
What do the logs show in Elasticsearch and Graylog? Have you tried tail'ing them? Especially the Elasticsearch logs. You mention an upgrade; what exactly did you do?
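For example (default locations on a package install; the Elasticsearch log is named after your cluster.name):
sudo tail -f /var/log/graylog-server/server.log
sudo tail -f /var/log/elasticsearch/<cluster_name>.log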
Check this out.
There was an updated version of Graylog that came out about 3 weeks ago that patched the log4j vulnerability. We installed that, but haven't seen any problems listed in the logs.
I ended up reinstalling. This fixed the problem, and now the output is way up. However, I’m getting a new error. Not sure if I should create a new ticket or continue on with this. The new error is:
“Nodes with too long GC pauses”
I have noticed that when this problem happens, it also appears to make the process buffer go up to 100 percent, in spite of the fact that I gave the process buffer 7 cores and the output buffer (which isn't having a problem anymore) 6 cores.
Any thoughts on how to resolve this, or where to look?
Thanks,
Hello
EDIT: After looking more into this error, it looks like a direct result of what @Arie was stating above.
Change index.refresh_interval: 5s to index.refresh_interval: 30s:
curl -X PUT "localhost:9200/graylog_*/_settings" -H 'Content-Type: application/json' -d'
{
  "index" : {
    "refresh_interval" : "30s"
  }
}
'
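To confirm the change took, you can read the setting back (it only shows up for indices where it is explicitly set):
curl -X GET "localhost:9200/graylog_*/_settings/index.refresh_interval?pretty"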
You may need to read this post; it has some good ideas/information:
https://graylog2.narkive.com/CRI3N2ju/how-to-fix-nodes-with-too-long-gc-pauses-issues-in-my-cluster
You may need to adjust the Process buffer a little more.
As @Arie stated above, you're getting around 6,000 messages coming in on a single node. You have tripled what I do in my lab. That's a lot of messages for one single Graylog server, so you need to adjust your resources to accommodate Elasticsearch and Graylog.
Hello,
@Arie After testing this out in my lab I was able to set refresh_interval to 30s, but after rotating the indices the setting didn't take on the newly created indices. I used graylog_*, hoping that any indices created after I reconfigured the setting would pick it up. Any idea?
Could this be set in the template? What does changing the refresh interval do?
When running the status command I get the following
graylog-admin@graylog01:~$ sudo systemctl status elasticsearch.service
● elasticsearch.service - Elasticsearch
Loaded: loaded (/lib/systemd/system/elasticsearch.service; enabled; vendor preset: enabled)
Active: active (running) since Fri 2022-01-07 16:08:35 UTC; 36s ago
Docs: https://www.elastic.co
Main PID: 5604 (java)
Tasks: 189 (limit: 57775)
Memory: 17.1G
CGroup: /system.slice/elasticsearch.service
└─5604 /usr/share/elasticsearch/jdk/bin/java -Xshare:auto -Des.networkaddress.cache.ttl=60 -Des.networkaddress.cache.negative.ttl=10 -XX:+AlwaysPreTouch -Xss1m -Djava.awt.headless=true -Dfile.encoding=UTF-8 -Djna.nosys=true>
I currently have the heap in my jvm.options set to 16g. I've noticed that the Memory value when running the command has gone as high as 24g, which would be the max my system has. Should that Memory value be the same as what is set in jvm.options?
Rather than every 30s, could changing it to 5 seconds still help with performance? If so, is this how I would change it?
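Presumably the same _settings call as above, just with the smaller value; something like:
curl -X PUT "localhost:9200/graylog_*/_settings" -H 'Content-Type: application/json' -d'
{
  "index" : {
    "refresh_interval" : "5s"
  }
}
'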