Hourly Backlog Surge and 100% Buffer Maxed Issue in Graylog

oluseyeo · November 7, 2023, 1:38am

Hello everyone,

I’m writing to provide an important update on this ongoing issue. Over the weekend the issue recurred: our message backlog spiked to over 48 million messages, with zero messages being ingested. My first attempt to address it, as always, was a service reload, but this provided only temporary relief. Only a few hundred thousand messages were ingested, and the issue persisted.

The eureka moment came when I decided to take a second look at the extractors. I deleted all of the extractors and then reloaded the Graylog service. This brought a dramatic improvement: the backlog was cleared in under 10 minutes, and message processing and ingestion peaked at almost 54k msg/s, as indicated in the screenshot provided.

Now that it has been established that the extractors are the main culprit, I’d like to resolve this issue permanently. To do that, I’m interested in metrics that can help us assess the performance of each individual extractor. We have approximately 15 extractors in place, and understanding their impact on message processing will be crucial. Your insights and suggestions are highly valued.
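
To make the question concrete, here is a rough sketch of the comparison I have in mind. It assumes a mean execution time and an invocation count can be collected for each extractor. The input layout, the field names `mean_ms` and `calls`, and the extractor names are all hypothetical, not Graylog's actual metric format. The idea is simply to rank extractors by estimated total time spent (mean time multiplied by number of calls), so that a slow extractor that runs rarely is not confused with a moderately slow one that runs on every message.

```python
from typing import Dict, List, Tuple


def rank_extractors(stats: Dict[str, Dict[str, float]]) -> List[Tuple[str, float]]:
    """Rank extractors by estimated total time (mean_ms * calls), highest first.

    `stats` maps an extractor name to a dict with two hypothetical fields:
    `mean_ms` (mean execution time in milliseconds) and `calls` (invocations).
    """
    totals = {name: s["mean_ms"] * s["calls"] for name, s in stats.items()}
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Made-up figures for illustration only; not taken from our cluster.
sample = {
    "regex_src_ip": {"mean_ms": 0.8, "calls": 100000},
    "json_payload": {"mean_ms": 0.05, "calls": 100000},
    "grok_syslog": {"mean_ms": 2.5, "calls": 40000},
}

for name, total_ms in rank_extractors(sample):
    print(f"{name}: about {total_ms / 1000:.0f} s total")
```

With figures like these, the extractors at the top of the list would be the first candidates to rewrite or to restrict with a condition.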
