A colleague of mine is subscribed to the Graylog newsletter. These newsletters have a little “Did you know” section at the bottom with technical tips.
This month’s last tip was this:
And here’s where the ‘Get process-buffer dump’ feature lets you know which message is sending your processing into an endless loop.
I’m not saying that I’m upset, but it does feel a little bit like the person who writes these also lurks in this forum and wanted to throw some shade at me.
We had this happen again yesterday. With the “process-buffer dump” I could get enough information to pull the log event in question from the upstream syslog server. And, et voilà, testing it against one of the two grok patterns that work on this line yielded an HTTP 502.
Still, I’m surprised that this takes down the complete chain. I would expect the other 3 remaining threads to continue processing messages normally. Of course, it would ultimately clog up after a much longer time. But we could have a watchdog for this that kills off the thread and gives me a notification/log entry saying that parsing that message failed for that very reason.
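
To make the watchdog idea a bit more concrete, here is a minimal Java sketch of what I have in mind. This is not Graylog code and says nothing about how Graylog's processing threads are actually implemented. The names `ParseWatchdog`, `runWithin` and `InterruptibleInput` are made up for illustration. The sketch runs one parse step on a worker thread, gives up after a deadline, interrupts the worker and logs the offending message. Since `java.util.regex` does not check for interruption on its own, the input is wrapped in a `CharSequence` that does.

```java
import java.util.Optional;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.regex.Pattern;

// Hypothetical sketch of a parse watchdog; not part of Graylog.
final class ParseWatchdog {

    // Wraps the message so that regex matching aborts once the thread is interrupted.
    public static final class InterruptibleInput implements CharSequence {
        private final CharSequence inner;

        public InterruptibleInput(CharSequence inner) { this.inner = inner; }

        @Override public char charAt(int index) {
            if (Thread.currentThread().isInterrupted()) {
                throw new IllegalStateException("matching interrupted by watchdog");
            }
            return inner.charAt(index);
        }
        @Override public int length() { return inner.length(); }
        @Override public CharSequence subSequence(int start, int end) {
            return new InterruptibleInput(inner.subSequence(start, end));
        }
        @Override public String toString() { return inner.toString(); }
    }

    // Runs one parse step; returns Optional.empty() if it misses the deadline.
    public static <T> Optional<T> runWithin(Callable<T> parse, String message, long timeoutMillis) {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        Future<T> result = worker.submit(parse);
        try {
            return Optional.of(result.get(timeoutMillis, TimeUnit.MILLISECONDS));
        } catch (TimeoutException e) {
            result.cancel(true); // interrupts the worker thread
            System.err.println("Watchdog: parsing exceeded " + timeoutMillis
                    + " ms, skipping message: " + message);
            return Optional.empty();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the caller's interrupt status
            result.cancel(true);
            return Optional.empty();
        } catch (ExecutionException e) {
            throw new IllegalStateException("parse step failed", e.getCause());
        } finally {
            worker.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // A pattern with nested quantifiers; whether it really blows up
        // depends on the regex engine and the JVM version.
        Pattern suspicious = Pattern.compile("(x+x+)+y");
        String message = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxz";
        Optional<Boolean> outcome = runWithin(
                () -> suspicious.matcher(new InterruptibleInput(message)).matches(),
                message, 200);
        System.out.println(outcome.isPresent() ? "parsed in time" : "skipped by watchdog");
    }
}
```

The catch is that a thread can only be stopped this way if the work it is doing actually checks for interruption. That is why the wrapper is needed for regex matching, and any other long-running step would need its own check.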