Graylog stops processing logs at the same time every day

I have a single combined Graylog 3.0.0-12 / Elasticsearch 6.6.0 instance (4 vCPU / 8 GB RAM, virtualized on Proxmox). Every day at 10am EST (1500 UTC), Graylog stops processing messages with no error message.

Both Graylog and Elasticsearch have been given 2 GB of heap; monitoring shows no memory or heap exhaustion.

At 10am EST a single Graylog thread starts spinning at 100% CPU and the process buffer starts filling up. Some messages are still processed, but once the buffer is full no new messages are written to Elasticsearch, and a second Graylog thread also starts spinning at 100% CPU.
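(One way to see what that 100% thread is actually doing is to match its native thread id from `top -H` against a JVM thread dump. A rough sketch of that, assuming `jstack` from the JDK is installed and that the Graylog PID can be found with pgrep; the pgrep pattern is just a guess for a typical install:)

```python
#!/usr/bin/env python3
"""Rough sketch: find out what the spinning Graylog JVM thread is doing.

Assumptions to adjust: `jstack` (from the JDK) and `top` are available,
and the Graylog server PID can be found with pgrep.
"""
import subprocess

# Hypothetical way to find the Graylog PID -- adjust the pattern to your install.
pid = subprocess.check_output(["pgrep", "-f", "graylog.jar"], text=True).split()[0]

# Per-thread CPU usage; with -H the PID column holds native thread ids.
top_out = subprocess.check_output(
    ["top", "-b", "-H", "-n", "1", "-p", pid], text=True
).splitlines()

threads = []
seen_header = False
for line in top_out:
    if line.lstrip().startswith("PID"):
        seen_header = True
        continue
    cols = line.split()
    if seen_header and len(cols) > 8:
        try:
            threads.append((float(cols[8]), int(cols[0])))  # (%CPU, thread id)
        except ValueError:
            pass

cpu, tid = max(threads)
print(f"Hottest thread: tid={tid} at {cpu}% CPU")

# jstack prints each thread's native id as nid=0x<hex>; show the matching stack.
dump = subprocess.check_output(["jstack", pid], text=True)
for block in dump.split("\n\n"):
    if f"nid=0x{tid:x}" in block:
        print(block)
```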

There are no log entries (even at debug level) in Graylog's server.log, and nothing of note in the Elasticsearch logs either.

The data is a few servers' audit logs in one ingest, with some Grok matching rules.

Restarting Graylog allows the messages to be processed again (with no apparent dropped messages).

Any insight into further troubleshooting steps or resolving this issue?

Thanks.

Check your Graylog log (around the time the daily tasks run).
Also check your GL server's clock and time settings,
and the state of the buffers,
and the number of incoming messages (see the sketch below).
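A minimal sketch for watching the buffer and throughput numbers over time via the REST API (the URL, credentials and exact metric names are assumptions; GET /api/system/metrics lists what your node really exposes):

```python
#!/usr/bin/env python3
"""Sketch: poll a few Graylog metrics to watch buffer usage and in/out rates.

Adjust the API URL, credentials and metric names for your setup; the names
below are what Graylog 3.x commonly exposes, but verify them against
GET /api/system/metrics before trusting the output.
"""
import time
import requests

API = "http://127.0.0.1:9000/api"      # assumed Graylog API address
AUTH = ("admin", "password")           # assumed credentials / API token
HEADERS = {"Accept": "application/json", "X-Requested-By": "buffer-check"}

METRICS = [
    "org.graylog2.buffers.input.usage",
    "org.graylog2.buffers.process.usage",
    "org.graylog2.buffers.output.usage",
    "org.graylog2.throughput.input.1-sec-rate",
    "org.graylog2.throughput.output.1-sec-rate",
]

while True:
    readings = []
    for name in METRICS:
        r = requests.get(f"{API}/system/metrics/{name}", auth=AUTH, headers=HEADERS)
        r.raise_for_status()
        readings.append(f"{name.replace('org.graylog2.', '')}={r.json().get('value')}")
    print(time.strftime("%H:%M:%S"), "  ".join(readings))
    time.sleep(10)
```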

GL does some maintenance tasks at night, e.g. rotating indices, merging the old ones, etc., and during that time it stops processing for a little while. On my systems GL does this at 00:00 UTC, which doesn't match your 10am EST.
Depending on your config it could be some other problem, or the maintenance could take more time.
Or the merge of the old index slows down your ES so that it can only accept some of the messages from Graylog.
Or one of your clients sends a burst of messages that GL can't handle.
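A quick way to check the merge theory is to look at ES's merge stats around the time of the stall; a minimal sketch, assuming ES is reachable on localhost:9200 without authentication:

```python
#!/usr/bin/env python3
"""Sketch: check whether Elasticsearch is busy merging (or throttling merges)
when the stall happens. Assumes ES on localhost:9200 without authentication.
"""
import requests

ES = "http://127.0.0.1:9200"   # assumed Elasticsearch address

# Overall cluster state first.
print(requests.get(f"{ES}/_cluster/health?pretty").text)

# Per-node merge activity: current merges and how long merging has been throttled.
stats = requests.get(f"{ES}/_nodes/stats/indices").json()
for node in stats["nodes"].values():
    merges = node["indices"]["merges"]
    print(node.get("name"),
          "current merges:", merges["current"],
          "throttled ms:", merges["total_throttled_time_in_millis"])
```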

So it happened again today. There is nothing in /var/log/graylog-server/server.log; the last log line is from yesterday.

I do notice something: the traffic IS being processed. My in and out rates seem to be equal, or at least close enough that I don't notice a difference. Output stops when the process buffer fills up (likely because nothing new makes it through to the output).

Here is a graph showing messages in/out (they overlap) and the rise in the process buffer usage:

So unless a few messages are getting stuck, is there any way to see what exactly is in the process buffer?
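(One option, if your version has it: Graylog 3.0 appears to ship a process-buffer dump, also reachable from the node's detail page in the web UI, that shows which message each processor thread is currently working on. A minimal sketch, assuming the node-local /api/system/processbufferdump endpoint and local credentials; check the API browser if the path differs:)

```python
#!/usr/bin/env python3
"""Sketch: dump what the process-buffer processor threads are currently holding.

Assumes the node-local /api/system/processbufferdump endpoint that recent
Graylog versions expose; adjust the URL and credentials for your setup.
"""
import json
import requests

API = "http://127.0.0.1:9000/api"   # assumed Graylog API address
AUTH = ("admin", "password")        # assumed credentials / API token

r = requests.get(
    f"{API}/system/processbufferdump",
    auth=AUTH,
    headers={"Accept": "application/json", "X-Requested-By": "buffer-dump"},
)
r.raise_for_status()

# Pretty-print whatever comes back: one entry per processor thread, showing
# the message it is currently stuck on (useful for spotting one pathological
# message, e.g. one that a Grok rule chokes on).
print(json.dumps(r.json(), indent=2))
```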

To turn the question around - what does your Elasticsearch do when Graylog can't process the messages?

Check whether it still has threads available or prints anything in its log. It is more likely that Graylog can't ingest into Elasticsearch and that is why it shows this behaviour.
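A sketch of that check (again assuming ES on localhost:9200 without auth); the rejected counter on the write/bulk pool is the interesting one:

```python
#!/usr/bin/env python3
"""Sketch: see whether Elasticsearch still has indexing threads available.
Assumes ES on localhost:9200 without authentication.
"""
import requests

ES = "http://127.0.0.1:9200"   # assumed Elasticsearch address

# Active / queued / rejected counts per thread pool. A growing "rejected"
# count on the write (bulk) pool means ES is refusing Graylog's bulk requests.
print(requests.get(
    f"{ES}/_cat/thread_pool?v&h=node_name,name,active,queue,rejected"
).text)

# Cluster-level tasks waiting to run can also hold up indexing.
print(requests.get(f"{ES}/_cat/pending_tasks?v").text)
```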

Jan

What I can't understand:
On the graph, in under 6 minutes your process buffer goes from 0 to 18-19k.
But (if in and out really are equal - on the graph I can only see the out's colour) in those 6 minutes you only got about 6 × 250 = 1.5k messages. Filling 18k at ~200 msg/min would take about an hour and a half.
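Spelled out with the numbers as read off the graph (both values are assumptions about what the graph really shows):

```python
# Numbers as read off the graph (assumptions): roughly 250 msg/min in ≈ out,
# and a process-buffer jump of ~18k within ~6 minutes.
rate_per_min = 250
buffer_jump = 18_000

print(6 * rate_per_min)        # ~1,500 messages passed through in 6 minutes
print(buffer_jump / 200)       # ~90 minutes to accumulate 18k at 200 msg/min
```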

I think if ES can't ingest enough messages, the in should be higher than the out, and the out would maybe be a flat, constant line. (But of course an ES check can't hurt.)

First I suggest you "certify" your graph and double-check that it really contains valid numbers. (Compare it with GL's web UI at several different times.)
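One way to do that cross-check independently of the graph is to measure how many messages Graylog actually counts over a minute; a minimal sketch, assuming the GET /api/count/total endpoint and local credentials:

```python
#!/usr/bin/env python3
"""Sketch: cross-check the graph by measuring how many messages are actually
counted by Graylog over one minute. Adjust the API URL and credentials.
"""
import time
import requests

API = "http://127.0.0.1:9000/api"   # assumed Graylog API address
AUTH = ("admin", "password")        # assumed credentials / API token
HEADERS = {"Accept": "application/json", "X-Requested-By": "graph-check"}

def total_messages():
    r = requests.get(f"{API}/count/total", auth=AUTH, headers=HEADERS)
    r.raise_for_status()
    # The response holds a single counter; grab it without assuming the key name.
    return next(iter(r.json().values()))

before = total_messages()
time.sleep(60)
after = total_messages()
print(f"indexed in the last minute: {after - before} messages "
      f"(~{(after - before) / 60:.0f} msg/s)")
```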

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.