1 input with many extractors performance

frantz · November 14, 2017, 3:42pm

Hello,
At the moment I have only 1 syslog TCP input receiving all logs.
Many extractors are attached to this input.
I wonder whether this architecture can handle a heavy load of incoming logs, because every extractor's condition is checked for each log.
Would it be better to have several inputs to spread the load?
For example, are 10 inputs with 10 extractors each better than 1 input with 100 extractors? Is the resource consumption really lower?
If I have only 1 input, is it multithreaded? I ask because I have seen inputbuffer_processors in server.conf.
I need advice on optimizing Graylog.

jochen · November 14, 2017, 3:54pm

Yes, because not every single message has to run through all extractors even if only a fraction would match.
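
As a rough illustration of that point, here is a minimal Python sketch that counts condition checks for the two layouts from the question. It assumes every extractor's condition is evaluated for every message arriving on its input, and that messages are spread evenly across the ten inputs. The function name and the message volume are made up for the example and say nothing about Graylog's internals.

```python
def condition_checks(messages: int, extractors_per_input: int) -> int:
    """Condition evaluations if each message is tested against every extractor on its input."""
    return messages * extractors_per_input


MESSAGES = 1_000_000  # hypothetical total message volume

# Layout A: one input carrying all 100 extractors.
single_input = condition_checks(MESSAGES, 100)

# Layout B: ten inputs with 10 extractors each; every message arrives on exactly one input.
multi_input = sum(condition_checks(MESSAGES // 10, 10) for _ in range(10))

print(single_input, multi_input, single_input // multi_input)
```

Under these assumptions the single-input layout performs ten times as many condition checks. The real saving depends on how cheap each condition is and how the traffic is actually distributed.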

frantz · November 14, 2017, 4:06pm

So theoretically I would need 1 input per device type:
1 input for Linux OS logs, 1 for LDAP logs, 1 for the Palo Alto firewall, 1 for the Forcepoint firewall…
That is strange from a network point of view, because each input is bound to its own port.

jochen · November 14, 2017, 4:43pm

Look at it the other way round: wouldn’t it be strange to apply the same rules/filters/extractors to all messages, no matter whether they came from a Linux machine, from an LDAP directory, from a firewall appliance, or from a toaster?

There’s a reason why there are 65535 ports in the TCP and UDP protocols…

jan · November 14, 2017, 5:13pm

Or you run just one input, but do the processing in processing pipeline rules…

frantz · November 15, 2017, 9:20am

Thank you.
I will run some load tests in my lab to check the difference between 1 input and several inputs.
It just seems strange to me because I often use QRadar or ArcSight and they don’t work like that.
QRadar receives all logs on a single port, parses only the syslog header to get the hostname, and then links this hostname to a specific parser. ArcSight behaves similarly: when a new log source comes in, it associates a specific parser with it and caches this association, so all following logs from this log source go directly to that parser.
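
The behaviour described above (read the hostname from the syslog header, pick a parser once, cache the association) can be sketched in Python as follows. This is only a model of the idea, not how QRadar or ArcSight are implemented. The parsers, the detection markers, and the assumption of an RFC 3164 style header are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Parser = Callable[[str], Dict[str, str]]


def linux_parser(line: str) -> Dict[str, str]:
    return {"source_type": "linux", "raw": line}


def firewall_parser(line: str) -> Dict[str, str]:
    return {"source_type": "firewall", "raw": line}


def generic_parser(line: str) -> Dict[str, str]:
    return {"source_type": "unknown", "raw": line}


# Hypothetical detection markers, tried in order only for hosts not seen before.
DETECTORS: List[Tuple[str, Parser]] = [
    ("sshd", linux_parser),
    ("TRAFFIC", firewall_parser),
]

parser_cache: Dict[str, Parser] = {}


def hostname_of(line: str) -> str:
    # Assumes an RFC 3164 style line: "<PRI>Mmm dd hh:mm:ss HOST rest..."
    return line.split()[3]


def dispatch(line: str) -> Dict[str, str]:
    host = hostname_of(line)
    parser = parser_cache.get(host)
    if parser is None:
        # First message from this host: detect a parser and remember the choice.
        parser = next((p for marker, p in DETECTORS if marker in line), generic_parser)
        parser_cache[host] = parser
    return parser(line)


first = dispatch("<34>Oct 11 22:14:15 fw01 TRAFFIC allow tcp 10.0.0.1")
second = dispatch("<34>Oct 11 22:14:16 fw01 some other record")
print(first["source_type"], second["source_type"])
```

The second line carries no marker, yet it still goes to the firewall parser because the hostname was already cached. Only the first message from a host pays the detection cost.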

@jan: Do you mean splitting logs into several streams and then applying pipeline rules to these streams in order to parse the logs?
I have read another post where @jochen explains that extractors are much faster than pipeline rules.

jan · November 15, 2017, 9:29am

You could build the behavior you already know with Graylog, @frantz.

Have one single input and a processing pipeline that routes the messages based on some criterion. That could be the application, the hostname, or whatever you like.

How that is organized depends on your needs. It could be one pipeline that is connected to “all messages” and only routes the messages into different streams. Those streams then have other pipelines connected that do the parsing.
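
A minimal Python model of that two-stage layout might look like the sketch below: a first stage that only picks a stream from a cheap criterion, and per-stream stages that do the parsing. It is not Graylog pipeline rule syntax. The stream names, hostname prefixes, and message formats are assumptions made up for the example.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def route(message: Message) -> str:
    """Stage 1: choose a stream name from a cheap criterion (here the source field)."""
    source = message.get("source", "")
    if source.startswith("fw-"):
        return "firewall"
    if source.startswith("ldap-"):
        return "ldap"
    return "default"


def parse_firewall(message: Message) -> Message:
    # Hypothetical format: "<action> <protocol> ..."
    action, protocol = message["message"].split()[:2]
    return {**message, "action": action, "protocol": protocol}


def parse_ldap(message: Message) -> Message:
    # Hypothetical format: "op=<operation> ..."
    operation = message["message"].split()[0].split("=", 1)[1]
    return {**message, "operation": operation}


# Stage 2: each stream has its own list of parsing steps.
STREAM_PIPELINES: Dict[str, List[Callable[[Message], Message]]] = {
    "firewall": [parse_firewall],
    "ldap": [parse_ldap],
    "default": [],
}


def process(message: Message) -> Message:
    stream = route(message)
    result = {**message, "stream": stream}
    for step in STREAM_PIPELINES[stream]:
        result = step(result)
    return result


fw = process({"source": "fw-01", "message": "deny tcp 10.0.0.1:443"})
other = process({"source": "web-01", "message": "GET /index.html"})
print(fw["stream"], fw["action"], other["stream"])
```

Each message runs only the parsing steps of its own stream, so a firewall rule is never evaluated against an LDAP message.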

Or you have one pipeline (or several) connected to “all messages”. Those pipeline rules are well written (the when condition is always clear and well defined), and the final message is then written to a stream (if needed) or just put back into “all messages”.

The processing pipelines are very flexible: you can make them very powerful, but also very complex.

system · November 29, 2017, 9:29am

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.
