Extractors vs. Pipelines - What is the preferred way?


Hi Everyone,

While evaluating the Graylog2 server, I found that there are different ways of parsing data:

  1. Input -> Extractor -> Stream -> Pipeline
  2. Input -> Pipeline -> Stream

In the first scenario, I use the JSON extractor to extract data and stream rules to get messages into a stream.
Afterwards I use pipeline steps to set the timestamp to the actual log timestamp and to do some manipulations.
Configuration order is:
Message Filter Chain
GeoIP Resolver

In the second scenario, the pipeline is used to route the messages into a stream and to extract and manipulate the data.
Configuration order is:
Message Filter Chain
GeoIP Resolver

I read a topic from the beginning of 2016 saying that extractors are going to reach end of life. Is that true?
I thought that, of the two examples above, the first solution should be faster, because not all messages have to go all the way through the pipeline.
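
To make that assumption concrete, here is a toy model in Python. It is not Graylog code, and every name in it (`matches_stream_rule`, `scenario_1`, `scenario_2`, the sample sources) is made up for illustration. It only counts how many messages the pipeline stage has to look at in each layout.

```python
# Toy model, not Graylog code: count how many messages reach the pipeline
# stage in each scenario. All names and sample values are hypothetical.

def matches_stream_rule(message):
    # Stand-in for a stream rule such as "source equals app01".
    return message.get("source") == "app01"

def scenario_1(messages):
    # Input -> Extractor -> Stream -> Pipeline:
    # stream rules run first, so the pipeline only sees routed messages.
    routed = [m for m in messages if matches_stream_rule(m)]
    return len(routed)

def scenario_2(messages):
    # Input -> Pipeline -> Stream:
    # the pipeline does the routing itself, so it is handed every message.
    return len(messages)

messages = [
    {"source": "app01"},
    {"source": "db01"},
    {"source": "app01"},
    {"source": "fw01"},
]
pipeline_load = {
    "scenario_1": scenario_1(messages),
    "scenario_2": scenario_2(messages),
}
```

This only counts messages. Whether it turns into a measurable difference probably depends on how cheap the first rule condition in scenario 2 is, since a rule that rejects most messages early may cost very little.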

Are there any recommendations as to which solution is preferred?

Kind regards,

(Jochen) #2

Retiring extractors is the plan, but deprecating and removing them depends on a lot of factors, e.g. making the processing pipeline as fast as the extractors and having a more high-level interface for the processing pipeline rules which doesn’t involve writing code.

So for now, it’s safe to keep using extractors.

Which solution you prefer is completely up to you and depends on your use cases.

The message processing pipeline is more powerful than the extractors, but it’s also more complex and still has a performance overhead compared to the extractors.

You can model any extractor in the processing pipeline, but not the other way round.
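
As a rough illustration of what "modelling an extractor in the pipeline" means, here is a conceptual sketch in Python. It is not Graylog's API or its rule language; in Graylog itself these steps would be pipeline rules. The function names, the `log_time` field, and the date pattern are assumptions made up for this example.

```python
import json
from datetime import datetime

def json_extractor_step(message, source_field="message"):
    """Parse the JSON in source_field and copy its top-level keys onto the message."""
    try:
        parsed = json.loads(message[source_field])
    except (KeyError, TypeError, ValueError):
        return message  # leave unparsable messages untouched
    if isinstance(parsed, dict):
        for key, value in parsed.items():
            # Existing fields are kept; that is a choice made for this sketch.
            message.setdefault(key, value)
    return message

def set_timestamp_step(message, field="log_time", pattern="%Y-%m-%dT%H:%M:%S%z"):
    """Replace the message timestamp with the time found in the log line itself."""
    try:
        message["timestamp"] = datetime.strptime(message[field], pattern)
    except (KeyError, TypeError, ValueError):
        pass  # keep whatever timestamp the message already had
    return message

def run_pipeline(message, steps):
    # Apply each step in order, like stages in a processing pipeline.
    for step in steps:
        message = step(message)
    return message

raw = {"message": '{"level": "ERROR", "log_time": "2016-11-03T10:15:00+0000"}'}
result = run_pipeline(raw, [json_extractor_step, set_timestamp_step])
```

The extraction is just one step among others here, and further steps such as routing to a stream could be appended to the same list. That is the sense in which the pipeline is the more general of the two.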
