Extractors Check


(Baudringhien) #1

Hi everybody,
I’m currently trying to customise my extractors in Graylog.
The same extractors are needed by all the inputs on that node.
Because the log formats differ, my regular expression is the following: logid=("[^"]+"|[^\s]+)
As you can see, it contains an “OR” (alternation), so it takes a long time to process my messages.
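To make the two cases concrete, here is a minimal Python sketch of what that pattern does. It is only an illustration: Graylog runs its own regex engine, the sample messages are made up, and the helper name extract_logid is hypothetical.

```python
import re

# Same pattern as above: a quoted value OR a run of non-whitespace characters.
LOGID_RE = re.compile(r'logid=("[^"]+"|[^\s]+)')


def extract_logid(message):
    """Return the logid value without surrounding quotes, or None if absent."""
    match = LOGID_RE.search(message)
    if match is None:
        return None
    # The first alternative keeps the quotes in the capture, so strip them.
    return match.group(1).strip('"')


# Made-up sample messages covering both formats.
print(extract_logid('date=2019-01-01 logid="0001" type=traffic'))
print(extract_logid('date=2019-01-01 logid=0001 type=traffic'))
print(extract_logid('date=2019-01-01 type=traffic'))
```

Both alternatives are anchored on the literal logid= prefix, so the regex engine only has to choose between them at positions where that prefix actually occurs.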

Issue: I always have some unprocessed messages in my journal. When there are too many logs, they are written to /var/lib/graylog-server/journal/*, which then causes a memory issue because my /var fills up.
I don’t think I can optimize my regex because of the different log formats: I can’t know in advance whether an attribute will have “…” around it or not.
So I would like to know how I can check whether all the extractors are useful or not. I can search with “not _exist:xxx” in Graylog, but that is going to take a long time, as I have more than 200 extractors.
Thanks :)


(Jan Doberstein) #2

Your use case looks like something that can be done better with processing pipelines, as you can decide at a very granular level which regex should inspect which message. In addition, multiple stages for information extraction can be used easily.

In your case it looks like all 200 of your extractors run on all incoming messages, or at least on all messages that arrive through the input.

You can always check the metrics and see which extractor takes the longest, but as you have already seen, this will consume a lot of time.


(Baudringhien) #3

Thank you for the answer,

Yeah, my regex is already like that: it tries to extract only if a specific field is detected in the message. But I guess this check itself also takes a bit of time?

I checked Inputs/Manage Extractor/Details and found something like “321 hits, 0 misses” or “0 hits, 321 misses” for particular extractors. So I guess the “0 hits” ones are those that never matched a field and can be deleted, right?
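With more than 200 extractors, going through those counters by eye is tedious. A rough sketch of the filtering step, assuming the hit/miss numbers have already been collected into a list somehow; the data layout, the extractor titles, and the function name never_matched are all hypothetical:

```python
# Hypothetical counters, e.g. copied from the extractor details pages.
extractor_stats = [
    {"title": "logid", "hits": 321, "misses": 0},
    {"title": "unused_example", "hits": 0, "misses": 321},
    {"title": "not_seen_yet", "hits": 0, "misses": 0},
]


def never_matched(stats):
    """Titles of extractors that were evaluated but never produced a hit."""
    return [
        entry["title"]
        for entry in stats
        # Require at least one miss so extractors that never ran are not flagged.
        if entry["hits"] == 0 and entry["misses"] > 0
    ]


print(never_matched(extractor_stats))
```

An extractor listed here is only a candidate for deletion: the counters cover just the period in which they were collected, so a rarely seen log format could still need it.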

Thanks again


(Jan Doberstein) #4

So I guess the “0 hits” ones are those that never matched a field and can be deleted, right?

Maybe. I do not know your setup, your extractors, or the intention behind that processing when it was configured, but it might be, yes.

Yeah, my regex is already like that: it tries to extract only if a specific field is detected in the message. But I guess this check itself also takes a bit of time?

The processing pipeline has even more features, as you can run an extraction based on the content of another field, not only on the same field as with extractors.
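The idea can be sketched in plain Python. This is not Graylog pipeline rule syntax, just the logic: a cheap check on one field decides whether the expensive regex runs on another. The field name device_type and its value "firewall" are assumptions made up for the illustration.

```python
import re

LOGID_RE = re.compile(r'logid=("[^"]+"|[^\s]+)')


def process(message):
    """Add a logid field, but only for messages flagged by another field."""
    # Cheap gate on a different field (hypothetical name and value).
    if message.get("device_type") != "firewall":
        return message
    match = LOGID_RE.search(message.get("message", ""))
    if match is None:
        return message
    # Return a copy with the extracted value, quotes removed.
    return {**message, "logid": match.group(1).strip('"')}


print(process({"device_type": "firewall", "message": 'logid="0001" type=traffic'}))
print(process({"device_type": "switch", "message": "logid=0002 type=traffic"}))
```

Messages that fail the gate never reach the regex at all, which is where the saving comes from when most traffic does not need a given extraction.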

