Journal grows too much when I enable an extractor

When I enable an extractor to split mod_security logs, the journal grows too large after an hour or two and Graylog can't consume all the logs.
I already have many extractors running for years without problems, but this one brings Graylog down :frowning:

I added RAM to the graylog2 process (2 GB of Xmx right now), but the problem is the same.

Nothing in the logs. I have to restart Graylog and disable the extractor to resume processing.
The extractor is a grok pattern:

```
\[%{HTTPDERROR_DATE:timestamp}\] \[(%{WORD:module}|):%{LOGLEVEL:loglevel}\] \[pid %{POSINT:pid}(:tid %{POSINT:tid}|)\]( (%{POSINT:proxy_errorcode})%{DATA:proxy_errormessage}:)?( \[client %{IPORHOST:client}:%{POSINT:clientport}\])?( \[client %{IPORHOST:client2}\]) %{GREEDYDATA:alerte} \[file "%{PATH:rule_file}"\] \[line "%{POSINT:line}"\] \[id "%{POSINT:id}"\] \[msg "%{GREEDYDATA:message}"\] \[data "%{IP:ip_server}"\] \[severity "%{LOGLEVEL:loglevel2}"\] \[ver "%{DATA:version_crs}"\]( \[tag "%{DATA:tag1}"\]|)( \[tag "%{DATA:tag2}"\]|)( \[tag "%{DATA:tag3}"\]|)( \[tag "%{DATA:tag4}"\]|)( \[tag "%{DATA:tag5}"\]|)( \[tag "%{DATA:tag6}"\]|)( \[tag "%{DATA:tag7}"\]|)( \[tag "%{DATA:tag8}"\]|)( \[tag "%{DATA:tag9}"\]|)( \[tag "%{DATA:tag10}"\]|) \[hostname "%{IP:ip_server2}"\] \[uri "%{DATA:uri}"\] \[unique_id "%{DATA:unique_id}"\]
```

I've tried to send the logs directly as JSON, so the fields arrive already split, but I ship them with Filebeat (with the Logstash output) and I don't know how to configure it.
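If the logs are already written as JSON (mod_security can emit a JSON audit log, depending on how it is built and configured), Filebeat can decode them before shipping. A minimal sketch, assuming a hypothetical log path and Logstash endpoint:

```yaml
filebeat.inputs:
  - type: log
    paths:
      - /var/log/modsec_audit.json   # hypothetical path to the JSON log

processors:
  # Decode the JSON in the "message" field into top-level fields
  - decode_json_fields:
      fields: ["message"]
      target: ""
      overwrite_keys: true

output.logstash:
  hosts: ["graylog.example.com:5044"]   # hypothetical Graylog Beats input
```

With the fields already split at the source, no grok extractor is needed on the Graylog side.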

Here is an earlier post about GROK performance with a solution that seems related:

There are a couple of things you can do to increase performance, such as using NOTSPACE instead of DATA (if possible) and strictly defining the beginning (^) and end ($) of your GROK pattern.
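As a small illustration of that advice (not the full mod_security pattern above), compare an unanchored pattern full of greedy matchers with a tightened one:

```
# Slow: unanchored, DATA forces heavy backtracking on non-matching lines
\[pid %{DATA:pid}\] \[id "%{DATA:id}"\]

# Faster: anchored at both ends, NOTSPACE/POSINT for tokens with no whitespace
^\[pid %{POSINT:pid}\] \[id "%{NOTSPACE:id}"\]$
```

Anchors let the regex engine reject non-matching lines immediately instead of retrying the match at every position.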

I double-checked the journal usage:
One hour after applying your advice (start and end anchors, and NOTSPACE), the process buffer starts to fill (CPU at 100% on the Graylog process).
Once the process buffer reaches 65536 (100%), the journal starts to grow :confused:

In the last 30 minutes, I received only 4,700 messages on this input.
I don't understand why there is a bottleneck :confused:

I have also had GROK hang a processor buffer. On the Node page, choose More Actions, then Get processor-buffer dump. Normally those show as idle, but when I was getting system hangs, the dump would show the message it was stuck on…

OK, I will test this on Thursday. Thanks.

OK, I've found the problematic message. When I try it with the extractor simulator, it hangs.
I'll have to rebuild the extractor.
I hope this will fix it.

That works now, but I had to split it into 3 extractors :(

Thanks for the tips.

You could do the work in a pipeline rather than an extractor. Pipelines let you narrow down where the work is done a little better.
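A pipeline rule only runs grok on the messages its condition selects, instead of on every message hitting the input. A minimal sketch, assuming a hypothetical `source` value and a shortened, illustrative pattern:

```
rule "parse mod_security error log"
when
  // Hypothetical condition: only grok messages from the mod_security host
  has_field("source") && to_string($message.source) == "modsec-host"
then
  let parsed = grok(
    pattern: "^\\[pid %{POSINT:pid}\\] %{GREEDYDATA:alerte}$",
    value: to_string($message.message),
    only_named_captures: true
  );
  set_fields(parsed);
end
```

Attaching the rule to a pipeline connected to a dedicated stream keeps the expensive grok work off the rest of the message flow.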


Indeed, pipelines give us more granularity.