And again, Graylog does not output anything

I have two thoughts on this, but I doubt they will be the solution:

  1. You might be a victim of a regex DoS (Regexploit: DoS-able Regular Expressions · Doyensec's Blog). But I doubt this would happen with just the two greedy * in your regex. Such patterns can basically occupy a processing thread of your workers and slow everything down until there is none left. In that case the process buffer fills up.
  2. Do you use archiving? We had the problem that archiving occupied all available connections from Graylog to Elasticsearch. Then no more messages could be shipped to Elastic and the output buffer filled up.
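For what it's worth, the backtracking problem in point 1 is easy to reproduce outside Graylog. Here's a minimal Python sketch (Graylog itself runs on Java's backtracking regex engine, which behaves the same way; the pattern below is a deliberately pathological example, not the regex from this thread):

```python
import re
import time

# Classic nested-quantifier pattern of the kind Regexploit flags.
# On a non-matching input, each extra 'a' roughly doubles the work,
# because the engine retries every way of splitting the run of 'a's.
EVIL = re.compile(r"^(a+)+$")

for n in (10, 15, 20):
    payload = "a" * n + "b"   # trailing 'b' forces full backtracking
    start = time.perf_counter()
    assert EVIL.match(payload) is None
    print(f"n={n}: {time.perf_counter() - start:.4f}s")
```

A message crafted (or malformed) like this can pin a processing thread for a very long time, which matches the "process buffer fills up" symptom.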

Thing is, it’s worked fine for years like this; it only recently started acting up. The fun thing, of course, is that “nothing has changed” (the irony in that statement is left as an exercise to the reader :stuck_out_tongue: ) - I’m hoping the feature I put in for timeouts on rules etc. will also solve the issue, in the sense that we can then pinpoint exactly what’s cooking - or not cooking, for that matter :smiley:


If nothing changed on the Graylog server, maybe something changed on the sender side? Is something sending malformed messages that break your processing?

I would definitely give the pipeline metrics a shot; I don’t see any other useful approach to your problem except trial and error.
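If you want to pull those metrics programmatically instead of through the UI, the REST API exposes them. A rough sketch - the host, credentials, and exact metric namespace are assumptions here, so verify the names in the API browser on your own instance:

```shell
# Hypothetical host and credentials -- adjust for your setup.
GRAYLOG=http://graylog.example.com:9000

# The pipeline processor publishes its metrics under this namespace;
# check /api/api-browser on your version for the exact metric names.
curl -s -u admin:password \
  -H 'Accept: application/json' \
  "$GRAYLOG/api/system/metrics/namespace/org.graylog.plugins.pipelineprocessor"
```

Polling that periodically and diffing the counters would let you spot rules whose execution counts or timings jump.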


Oh, BTW: I just stumbled over Strange pipeline timestamp behaviour - @gaetan-craft tries to measure log latency there. Maybe that would be an interesting approach? Measure the time a message needs to be processed? That could be the last step in your pipeline workflow and might give a hint towards messages that need more processing time than others.
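Such a last step could look roughly like the rule below, in the pipeline rule language. The field name is made up, and whether the Joda-style `.millis` property access works on timestamps depends on your Graylog version - treat this as a starting point, not a tested rule:

```
rule "measure processing latency"
when
  has_field("timestamp")
then
  // Difference between "now" and the message's own timestamp, in ms.
  // Assumes Joda-style .millis access works in your version.
  let delay_ms = now().millis - to_date($message.timestamp).millis;
  set_field("processing_delay_ms", delay_ms);
end
```

With that field in place you could search or aggregate on `processing_delay_ms` to find the slow messages.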


Awesome, I totally forgot about that post :+1:

Interesting! I’d still love to see actual per-rule timing metrics though, I guess that ties in neatly with the timeout feature as well, but… we’ll see :slight_smile:

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.