Method to identify a processing issue

urban_moniker · April 3, 2019, 11:09am

Hi.

Fairly generic question. We have a multi-node GL (v3) setup that we are currently testing, and we have a strange issue where one of the nodes sometimes stops processing: the process buffer maxes out and it just sits there. Restarting GL on that node fixes the issue, but it can then reappear randomly in the future.

As we are still testing and trying different things, I can't say for certain that it is only this node (our log load balancers spread the load, though they tend to prefer specific nodes), but it seems to be.

The question is: what is the best method to narrow down the cause? We monitor the high-level metrics (using Telegraf/Grafana etc.): CPU, memory, JVM, input/process/output buffers, journal size, etc. Everything seems to be in order until this out-of-character issue appears. Interestingly, CPU on the node does not max out when it occurs, whereas normally CPU is the bottleneck under high processing load.

Because it's a generic question I'm not looking for specific advice, but any pointers to sub-metrics worth monitoring that could help identify which processes are causing this would be helpful.

Note that we saw this sort of thing quite a lot when we were using the DNS resolver lookup feature, but we have since turned it off (we process circa 4k logs/sec at the 95th percentile).
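
The symptom described in this post (process buffer full, nothing leaving the node, CPU not maxed) can be flagged automatically from samples you already collect, so you get exact timestamps to correlate with logs or thread dumps. A minimal sketch, assuming each polling interval gives you process buffer usage in percent, node CPU in percent, and a cumulative outgoing-message counter; the `Sample` fields, thresholds, and `find_stalls` are hypothetical names, not part of Graylog or Telegraf.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Sample:
    """One polling interval of node metrics (hypothetical field names)."""
    timestamp: float           # seconds since epoch
    process_buffer_pct: float  # process buffer usage, 0-100
    cpu_pct: float             # node CPU usage, 0-100
    messages_out: int          # cumulative count of messages written to outputs


def find_stalls(samples: List[Sample], buffer_threshold: float = 95.0,
                cpu_ceiling: float = 80.0,
                min_samples: int = 3) -> List[Tuple[float, float]]:
    """Return (start, end) timestamps of stall windows.

    A sample counts as stalled when the process buffer is nearly full, no new
    messages were written since the previous sample, and CPU is below
    cpu_ceiling (i.e. the node is not simply CPU-bound). Only runs of at least
    min_samples consecutive stalled samples are reported.
    """
    stalls: List[Tuple[float, float]] = []
    run_start: Optional[float] = None
    run_end = 0.0
    run_len = 0
    prev: Optional[Sample] = None
    for s in samples:
        stalled = (
            prev is not None
            and s.process_buffer_pct >= buffer_threshold
            and s.messages_out == prev.messages_out
            and s.cpu_pct < cpu_ceiling
        )
        if stalled:
            if run_start is None:
                run_start = s.timestamp
            run_len += 1
            run_end = s.timestamp
        else:
            # Close any open run and keep it if it was long enough.
            if run_start is not None and run_len >= min_samples:
                stalls.append((run_start, run_end))
            run_start = None
            run_len = 0
        prev = s
    # A stall may still be in progress at the end of the data.
    if run_start is not None and run_len >= min_samples:
        stalls.append((run_start, run_end))
    return stalls


samples = [
    Sample(0, 40, 70, 1000),
    Sample(10, 60, 75, 5000),
    Sample(20, 99, 20, 5000),   # buffer full, no output, low CPU
    Sample(30, 100, 15, 5000),
    Sample(40, 100, 15, 5000),
    Sample(50, 30, 60, 9000),   # recovered (e.g. after a restart)
]
print(find_stalls(samples))  # [(20, 40)]
```

The CPU condition is what separates this case from ordinary overload: a full buffer with high CPU is the usual throughput bottleneck, while a full buffer with idle CPU and a flat output counter points more towards processor threads being blocked on something.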

Many Thanks

benvanstaveren · April 3, 2019, 11:23am

As far as I remember, the metrics you want to look at aren't exportable via the metrics endpoint, but you can go to the Nodes screen in Graylog, click the Metrics button, and look for any metrics relating to pipelineprocessor.

Wild guess though, so your mileage may vary.
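
If you save a metrics dump from a healthy node and one from the stuck node, ranking the pipelineprocessor timers makes a slow rule stand out more easily than scrolling the Metrics page. A minimal sketch, assuming the dump follows the Dropwizard Metrics JSON layout (a top-level `timers` object whose entries carry statistics such as `p95`); `top_pipeline_timers` and the metric names in the example are hypothetical.

```python
from typing import Dict, List, Tuple


def top_pipeline_timers(dump: Dict, key: str = "p95", n: int = 5,
                        match: str = "pipelineprocessor") -> List[Tuple[str, float]]:
    """Rank timers whose name contains `match` by the statistic `key`.

    Assumes `dump` has a top-level "timers" object mapping metric names to
    dicts of statistics (Dropwizard Metrics JSON layout). Timers lacking
    `key` are skipped.
    """
    timers = dump.get("timers", {})
    ranked = [
        (name, float(stats[key]))
        for name, stats in timers.items()
        if match in name and key in stats
    ]
    # Slowest first.
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked[:n]


# Hypothetical dump; real metric names will differ.
example = {
    "timers": {
        "example.pipelineprocessor.rule_a": {"p95": 1.5},
        "example.pipelineprocessor.rule_b": {"p95": 12.0},
        "example.other.timer": {"p95": 99.0},
    }
}
print(top_pipeline_timers(example))
# [('example.pipelineprocessor.rule_b', 12.0), ('example.pipelineprocessor.rule_a', 1.5)]
```

In practice you would `json.load` the saved dump instead of using the inline dictionary, and compare the ranking between the two nodes rather than reading absolute values.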

system · April 17, 2019, 11:37am

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.
