So, after a lot of hiccups, everything is go! Graylog is running smoothly, receiving output from the testing Graylog VM while we (I mean, I) set up things for production, like getting the PowerShell installer script ready (hint: https://github.com/ion-storm/Graylog_Sysmon/pull/6)
But, yeah, not everything is great. I notice the process buffer is almost constantly full at its 65536-message limit, and the journal sometimes reaches 10-20% utilization…
Meanwhile, the highest CPU usage any of my 3 Graylog VMs reached was 16.15%, and memory never goes above 1.8 GB of the 4 GB of RAM they each have.
Admittedly, I seem to be, once again, lost in the documentation. I remember having read something about optimizing, but I can't seem to find it, and my Google Fu is failing me. From what I remember, all settings are still at their defaults, so should I start tweaking?
Indeed. I wonder what your settings are for the number of processbuffer processors. You could probably increase that number. For example, if you have 10 cores in your VMs, you could have 8 processbuffer processors and 2 outputbuffer processors…
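For illustration, that split is just two lines in server.conf (the counts below are only the example numbers for a hypothetical 10-core VM, not a recommendation for your setup), and graylog-server needs a restart to pick them up:

```
# /etc/graylog/server/server.conf (default path on package installs)
# Hypothetical split for a 10-core VM, as in the example above
processbuffer_processors = 8
outputbuffer_processors = 2
```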
Other threads here about a 100% full process buffer say it may be the fault of a misbehaving extractor, so maybe check your extractor metrics; processing could also be stuck on some slow lookup tables or pipelines.
Indeed. The number of processbuffer processors limits the total CPU load of the Graylog node, so increasing it makes it possible to put a higher load on the node. Then, once you have found processor counts that fully utilize the VM's CPUs at peak load, the next step would be to look at the extractors and optimize their regexes.
So… I upped processbuffer_processors from 5 to 75… CPU peaked at 100% and fell back to 15%. We decided to up the game and configured a Sidecar on our main system, with history.
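(For reference, that's a single line in server.conf on the master, followed by a restart of graylog-server; 75 was simply the value I tried, not a recommendation:)

```
# /etc/graylog/server/server.conf on 131 (master)
# was: processbuffer_processors = 5
processbuffer_processors = 75
```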
131 (the master) hit peak CPU but seemed to be digesting nicely.
I decided to try other values on the 132 and 133 slaves: 65 on 132 and 55 on 133. Both are also peaking on CPU, but things seem to have stabilized. At first, incoming was about 3k msg/s (which seems to have been the history), but now it's down to 10-20 msg/s, while output peaked at 7.5k msg/s. This is on 4-CPU VMs. We're still deciding whether to up the CPU count; it seems to have been just the peak from digesting the history and nothing more…
-edit
Half an hour later and we have 0% journal usage, and the process and output buffers don't even hit 1%. It must really have been the history =)
BUT considering this thread opened with the process buffer at 100% and now it's not even hitting 1%, I'd say upping the processor count did the trick! Thanks @jtkarvo and @maniel! =D
Oh yeah, BTW, I didn't even touch Elasticsearch, neither the service itself nor its VMs. 137 (Graylog + MongoDB use 131-133, Elasticsearch uses 134-138) hit 100% CPU load for less than 10 minutes after the processor count increase, and returned to normal after that.
That is weird. If you have 4 CPUs (cores?), then the original 5 processors dedicated to processing messages should already be enough to reach 100% CPU utilization… Or do your VM CPUs have several cores each? In that case, the total number of processbuffer and outputbuffer processors probably should not exceed the total number of cores.
And 3k messages per second per node sounds like fairly low throughput; if you hit capacity problems, you should look at your GROK and regex usage and optimize it.
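As a purely hypothetical illustration of what I mean (the log format and field names below are made up, not taken from your setup): an unanchored pattern full of GREEDYDATA forces a lot of backtracking on every non-matching message, while an anchored pattern with specific field types is far cheaper:

```
# Hypothetical GROK patterns, for illustration only
# Expensive: unanchored, two greedy captures that backtrack heavily
%{GREEDYDATA:prefix} failed login from %{GREEDYDATA:client}

# Cheaper: anchored, specific patterns, no greedy captures
^%{TIMESTAMP_ISO8601:timestamp} sshd\[%{POSINT:pid}\]: failed login from %{IPORHOST:client}$
```

The same idea applies to plain regex extractors: anchor them and avoid `.*` where a more specific pattern will do.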
I asked around here: we have 4 CPUs on each VM, each CPU with 2 cores. But I still failed to see a significant difference between processbuffer_processors = 5; outputbuffer_processors = 3 and processbuffer_processors = 7; outputbuffer_processors = 1.
server.conf states "# The number of parallel running processors. # Raise this number if your buffers are filling up.", so I thought this referred to the number of parallel processes running, not to actual cores, and that's what made me test higher numbers to drive up CPU load.
Heap sizes have already been tweaked to match the RAM assigned to each VM: 4 GB for each Graylog VM, 8 GB for each Elasticsearch VM.
In the previous scenario, before tweaking processbuffer_processors, the ES VMs never got past 10% CPU load; after the change, and after dealing with that log history, they haven't gone past 40% CPU load. Active memory maxed out at 2.3 GB of the 8 GB total RAM, with 4 GB configured as Java heap… so I see no reason to add a sixth and seventh node.
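In case anyone wonders where those heaps get set: roughly the two places below (exact paths and syntax depend on how Graylog and Elasticsearch were installed and on their versions, and the Graylog -Xms/-Xmx values here are only placeholders, not our exact ones):

```
# Graylog JVM heap, DEB/RPM package installs: /etc/default/graylog-server
GRAYLOG_SERVER_JAVA_OPTS="-Xms2g -Xmx2g"

# Elasticsearch heap, ES 5.x and later: /etc/elasticsearch/jvm.options
# 4g matches the heap I mentioned above on the 8 GB ES VMs
-Xms4g
-Xmx4g
```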
The 8-hour graph for today has, so far, shown peaks of 30k messages/minute.
In general, switching between tasks also consumes resources, so for processing-intensive tasks, having a lot of threads on a single core would be counterproductive. For tasks that wait for I/O, having several threads per core can help. It's hard to tell the right amount; 75 processors just sounded high.
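As a rough illustration only (not an official formula), a conservative starting point is to keep the sum of the buffer processors at or below the core count. On an 8-core VM like yours (4 CPUs with 2 cores each), that could look something like this, with the exact split tuned from there:

```
# Example split for an 8-core VM; keep the sum at or below the core count
processbuffer_processors = 5
outputbuffer_processors = 2
inputbuffer_processors = 1
```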