Journal grows too much when I enable an extractor

Le-DOC · February 23, 2021, 12:29pm

Hello,
When I enable an extractor to split mod_security logs, the journal grows too large after 1 or 2 hours, and Graylog can’t consume all the logs.
I have had many extractors running for years without problems, but this one crashes Graylog.

I added RAM to the graylog2 process (Xmx is 2 GB right now), but the problem is the same.

Nothing in the logs. I have to restart Graylog and disable the extractor to recover.
The extractor is a grok pattern:

\[%{HTTPDERROR_DATE:timestamp}\] \[(%{WORD:module}|):%{LOGLEVEL:loglevel}\] \[pid %{POSINT:pid}(:tid %{POSINT:tid}|)\]( \(%{POSINT:proxy_errorcode}\)%{DATA:proxy_errormessage}:)?( \[client %{IPORHOST:client}:%{POSINT:clientport}\])?( \[client %{IPORHOST:client2}\]) %{GREEDYDATA:alerte} \[file "%{PATH:rule_file}"\] \[line "%{POSINT:line}"\] \[id "%{POSINT:id}"\] \[msg "%{GREEDYDATA:message}"\] \[data "%{IP:ip_server}"\] \[severity "%{LOGLEVEL:loglevel2}"\] \[ver "%{DATA:version_crs}"\]( \[tag "%{DATA:tag1}"\]|)( \[tag "%{DATA:tag2}"\]|)( \[tag "%{DATA:tag3}"\]|)( \[tag "%{DATA:tag4}"\]|)( \[tag "%{DATA:tag5}"\]|)( \[tag "%{DATA:tag6}"\]|)( \[tag "%{DATA:tag7}"\]|)( \[tag "%{DATA:tag8}"\]|)( \[tag "%{DATA:tag9}"\]|)( \[tag "%{DATA:tag10}"\]|) \[hostname "%{IP:ip_server2}"\] \[uri "%{DATA:uri}"\] \[unique_id "%{DATA:unique_id}"\]

I’ve tried sending the logs directly as JSON so that the fields arrive already split, but I use Filebeat (with the Logstash output) to ship logs, and I don’t know how to configure it for that.

tmacgbay · February 23, 2021, 1:15pm

Here is an earlier post about GROK performance with a solution that seems related:

There are a couple of things you can do to increase performance, such as using NOTSPACE instead of DATA (where possible) and strictly anchoring the beginning (^) and end ($) of your GROK pattern.
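
To make that advice concrete, here is a minimal Python sketch (not Graylog's grok engine, just plain regex) of what the three grok names expand to. The expansions for DATA, GREEDYDATA and NOTSPACE are the standard grok definitions; the helper grok_to_regex, the shortened patterns and the sample line are made up for illustration.

```python
import re

# Standard grok definitions for the patterns discussed above.
GROK = {
    "DATA": r".*?",        # lazy, may match anything including spaces
    "GREEDYDATA": r".*",   # greedy, may match anything
    "NOTSPACE": r"\S+",    # stops at the first whitespace
}

def grok_to_regex(pattern: str) -> str:
    """Expand %{NAME:field} tokens into named regex groups (hypothetical helper)."""
    def expand(m):
        return f"(?P<{m.group(2)}>{GROK[m.group(1)]})"
    return re.sub(r"%\{(\w+):(\w+)\}", expand, pattern)

# Unanchored, DATA everywhere: on a non-matching line the engine retries
# from every start position and every possible DATA length.
loose = grok_to_regex(r'%{DATA:module}: %{DATA:alerte} \[uri "%{DATA:uri}"\]')

# Anchored, NOTSPACE where the value cannot contain spaces: a non-matching
# line is rejected after a single attempt from the start of the string.
strict = grok_to_regex(r'^%{NOTSPACE:module}: %{DATA:alerte} \[uri "%{NOTSPACE:uri}"\]$')

# Made-up sample line, much shorter than a real ModSecurity entry.
line = 'ModSecurity: Warning. Pattern match [uri "/index.php"]'

print(loose)
print(re.search(loose, line).groupdict())
print(re.search(strict, line).groupdict())
```

Both patterns extract the same fields from a line that matches; the difference only shows up in how much work is done on lines that do not match, which is where a long pattern with many optional groups tends to burn CPU.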

Le-DOC · February 23, 2021, 3:43pm

I double-checked the journal usage:
One hour after applying your advice (start and end anchors, and NOTSPACE), the process buffer starts to rise (the Graylog process sits at 100% CPU).
Once the process buffer reaches 65536 (100%), the journal starts to grow.

In the last 30 minutes, this input has received only 4,700 messages.
I don’t understand why there is a bottleneck.

tmacgbay · February 23, 2021, 3:59pm

I have also had GROK hang a process buffer. On the node page, choose More actions, then Get process-buffer dump. Normally I see the processors as idle, but when I was getting system hangs it would show the message it was hung up on…

Le-DOC · February 23, 2021, 4:04pm

OK, I will test this on Thursday. Thanks.

Le-DOC · February 25, 2021, 10:02am

OK, I’ve found the problematic message. If I try it in the extractor simulation, it breaks.
I need to rebuild the extractor.
I hope that will fix it.

Le-DOC · February 26, 2021, 12:50pm

It works now, but I had to split it into 3 extractors :(.

Thanks for the tips.
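
For anyone landing here later, the idea of splitting one huge pattern into stages can be sketched in Python like this. It is only an illustration under assumptions: the sample line is invented and shortened, the regexes are not the three extractors actually used above, and values containing escaped quotes are not handled.

```python
import re

# Hypothetical, shortened sample; real ModSecurity lines carry more fields.
LINE = (
    '[Tue Feb 23 12:29:01.123456 2021] [:error] [pid 1234:tid 5678] '
    '[client 203.0.113.5:51234] [client 203.0.113.5] ModSecurity: Warning. '
    '[file "/etc/modsecurity/rules.conf"] [line "100"] [id "941100"] '
    '[msg "XSS Attack Detected"] [severity "CRITICAL"] '
    '[tag "attack-xss"] [tag "OWASP_CRS"] [uri "/search"] [unique_id "YDTxabc"]'
)

# Stage 1: anchored header only; everything after it is kept as "rest".
HEADER = re.compile(
    r'^\[(?P<timestamp>[^\]]+)\] \[(?P<module>\w*):(?P<loglevel>\w+)\] '
    r'\[pid (?P<pid>\d+)(?::tid (?P<tid>\d+))?\] (?P<rest>.*)$'
)

# Stage 2: the [client ...] blocks.
CLIENT = re.compile(r'\[client ([^\]]+)\]')

# Stage 3: every [key "value"] pair, whatever their number or order.
PAIR = re.compile(r'\[(\w+) "([^"]*)"\]')

def parse(line):
    """Return a dict of fields, or None if the header does not match."""
    m = HEADER.match(line)
    if m is None:
        return None
    fields = m.groupdict()
    rest = fields.pop("rest")
    fields["clients"] = CLIENT.findall(rest)
    tags = []
    for key, value in PAIR.findall(rest):
        if key == "tag":
            tags.append(value)  # tags repeat, so collect them in a list
        else:
            fields[key] = value
    fields["tags"] = tags
    return fields

print(parse(LINE))
```

Each stage is short and has no chain of optional groups, so a message that does not fit fails quickly instead of making the engine try every combination of ten optional tag groups.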

tmacgbay · February 26, 2021, 1:03pm

You could do the work in a pipeline rather than an extractor. Pipelines would let you narrow down where the work is done a little better.
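
Roughly, the gain is that a rule only runs its expensive part when a cheap condition holds. Here is a small Python sketch of that idea, not actual pipeline rule syntax; the function name rule, the field names rule_id and rule_msg, and the sample messages are all made up for illustration.

```python
import re

# The costly part: only worth running on messages that can match at all.
EXPENSIVE = re.compile(r'\[id "(?P<rule_id>\d+)"\].*?\[msg "(?P<rule_msg>[^"]*)"\]')

def rule(message: dict) -> dict:
    """Mimics a rule with a cheap 'when' test guarding the costly 'then' part."""
    text = message.get("message", "")
    # "when": a plain substring test, far cheaper than the regex.
    if "ModSecurity:" not in text:
        return message
    # "then": run the regex only on the messages that passed the test.
    m = EXPENSIVE.search(text)
    if m:
        message.update(m.groupdict())
    return message

hit = rule({"message": 'ModSecurity: Warning. [id "941100"] [msg "XSS Attack Detected"]'})
miss = rule({"message": "GET /index.html 200"})
print(hit)
print(miss)
```

Messages that are not from ModSecurity skip the regex entirely, so one odd message elsewhere in the stream can no longer tie up a processor.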

Le-DOC · March 4, 2021, 2:30pm

Indeed, with a pipeline we have more granularity.
Thanks.

system · March 18, 2021, 2:30pm

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.
