Performance of 1 input with many extractors


At the moment I have only 1 syslog TCP input receiving all logs.
Many extractors are linked to this input.
I wonder if this architecture can handle a huge load of incoming logs.
My concern is that, for each log, the conditions of all extractors are checked.
Would it be better to have several inputs to spread the load?
For example, are 10 inputs with 10 extractors each better than 1 input with 100 extractors? Is the resource consumption really lower?
If I have only 1 input, is it multithreaded? I ask because I have seen the inputbuffer_processors setting in server.conf.
I would appreciate advice on optimizing Graylog.

(Jochen) #2

Yes, because then not every single message has to run through all extractors when only a fraction of them would match.
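A back-of-the-envelope sketch of why this helps, under a hypothetical cost model (not Graylog's actual implementation): each extractor attached to an input evaluates its condition once for every message arriving on that input, and the 10 inputs each receive an equal share of the traffic. It counts condition checks only; the real cost also depends on the extractor type and its regexes.

```python
# Hypothetical cost model: every extractor attached to an input checks its
# condition once per message received on that input.
def condition_checks(messages: dict[str, int], extractors: dict[str, int]) -> int:
    """Total condition checks, given message and extractor counts per input."""
    return sum(count * extractors[name] for name, count in messages.items())

# 1 input with 100 extractors receiving 10,000 messages
single = condition_checks({"syslog": 10_000}, {"syslog": 100})

# 10 inputs with 10 extractors each, the same 10,000 messages split evenly
inputs = [f"input{i}" for i in range(10)]
split = condition_checks({n: 1_000 for n in inputs}, {n: 10 for n in inputs})

print(single, split)  # 1000000 100000
```

With the same 10,000 messages, the split layout does a tenth of the condition checks. That gain only holds if each input really receives just the sources its extractors are written for.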


So theoretically I would need 1 input per device type:
1 input for Linux OS logs, 1 for LDAP logs, 1 for the Palo Alto firewall, 1 for the Forcepoint firewall…
This seems strange from a network point of view, because each input is bound to its own port.

(Jochen) #4

Look at it from the other way round: Wouldn’t it be strange to apply the same rules/filters/extractors to all messages, no matter whether they came from a Linux machine, from an LDAP directory, from a firewall appliance, or from a toaster?

There’s a reason why there are 65535 ports in the TCP and UDP protocols…

(Jan Doberstein) #5

Or you run just one input but do the processing in processing pipeline rules…


Thank you.
I will run some load tests in my lab to compare 1 input with several inputs.
It just seems strange to me because I often use QRadar or ArcSight, and they don’t work like that.
QRadar receives all logs on a single port; it parses the syslog header to get the hostname, and that hostname is linked to a specific parser. ArcSight behaves similarly: when a new log source comes in, it associates a specific parser with it and caches this association, so all following logs from that source go directly to that parser.
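The QRadar/ArcSight behaviour described above (read the hostname from the syslog header, pick a parser the first time a source is seen, then reuse the cached choice) can be sketched in a few lines of Python. Everything here is hypothetical: the RFC 3164-style header regex, the two toy parsers, and the "more than five commas means Palo Alto" detection rule only stand in for whatever the real products do.

```python
import re
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical RFC 3164-style header: "<PRI>Mmm dd hh:mm:ss HOSTNAME message"
HEADER = re.compile(
    r"^<\d{1,3}>\w{3} [ \d]\d \d{2}:\d{2}:\d{2} (?P<host>\S+) (?P<body>.*)$"
)

Parser = Callable[[str], dict]

def parse_paloalto(body: str) -> dict:
    # Toy parser: treat the body as CSV.
    return {"type": "paloalto", "fields": body.split(",")}

def parse_linux(body: str) -> dict:
    # Toy fallback parser: keep the body as-is.
    return {"type": "linux", "body": body}

# Hypothetical detection rules, tried in order only for hosts not seen before.
DETECTORS: List[Tuple[Callable[[str], bool], Parser]] = [
    (lambda body: body.count(",") > 5, parse_paloalto),
    (lambda body: True, parse_linux),  # catch-all
]

class SourceRouter:
    """Caches a host -> parser association after the first message from a host."""

    def __init__(self) -> None:
        self.cache: Dict[str, Parser] = {}
        self.detections = 0  # how many times the detection step ran

    def handle(self, line: str) -> Optional[dict]:
        m = HEADER.match(line)
        if m is None:
            return None  # not a recognizable syslog line
        host, body = m.group("host"), m.group("body")
        parser = self.cache.get(host)
        if parser is None:
            # First message from this host: detect once, then remember.
            self.detections += 1
            parser = next(p for matches, p in DETECTORS if matches(body))
            self.cache[host] = parser
        return parser(body)
```

For example, feeding two lines from the same firewall host runs the detection step once and parses both as Palo Alto. The point of the design is that detection costs are paid once per source, while every later message only costs a dictionary lookup.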

@jan: Do you mean splitting logs into several streams and then applying pipeline rules to these streams to parse the logs?
I have read another post where @jochen explains that extractors are much faster than pipeline rules.

(Jan Doberstein) #7

@frantz, you could build the behavior you already know in Graylog.

Have one single input and a processing pipeline that routes the messages based on some criteria. That could be the application, the hostname, or whatever you like.

How that is organized depends on your needs. It could be one pipeline that is connected to “all messages” and only routes the messages into different streams. Those streams then have other pipelines connected to them that do the parsing.
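A minimal Python model of this two-stage layout, for illustration only: a routing step picks a stream from a cheap criterion (here a hypothetical source-to-stream table), and only the parsers connected to that stream run on the message. In Graylog itself, the routing stage would be a pipeline rule on “all messages” using a function such as route_to_stream, and the parsing stages would be pipelines connected to the target streams; the stream names, parsers, and field names below are made up.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Message = Dict[str, str]

# Hypothetical routing table: "source" field -> stream name.
ROUTES: Dict[str, str] = {"fw01": "paloalto", "ldap01": "ldap"}

def route(msg: Message) -> str:
    """Stage 1: cheap routing decision, made once per message."""
    return ROUTES.get(msg["source"], "all_messages")

def parse_paloalto(msg: Message) -> Message:
    # Hypothetical: CSV body, keep only the fourth column as the log type.
    fields = msg["message"].split(",")
    return {**msg, "pan_type": fields[3] if len(fields) > 3 else ""}

def parse_ldap(msg: Message) -> Message:
    # Hypothetical: extract the "conn=NNN" token from an access-log line.
    for token in msg["message"].split():
        if token.startswith("conn="):
            return {**msg, "ldap_conn": token[len("conn="):]}
    return msg

# Stage 2: parsing steps connected to each stream; unrouted messages are untouched.
STREAM_PIPELINES: Dict[str, List[Callable[[Message], Message]]] = {
    "paloalto": [parse_paloalto],
    "ldap": [parse_ldap],
    "all_messages": [],
}

def process(batch: List[Message]) -> Dict[str, List[Message]]:
    streams: Dict[str, List[Message]] = defaultdict(list)
    for msg in batch:
        stream = route(msg)
        for step in STREAM_PIPELINES[stream]:  # only this stream's parsers run
            msg = step(msg)
        streams[stream].append(msg)
    return dict(streams)
```

Each message pays for one routing lookup plus the parsers of its own stream, instead of every parser in the system, which is the same saving the per-source inputs give, but on a single port.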

Or you have one (or multiple) pipelines connected to “all messages”. Their rules are well written (the when condition is always clear and well shaped), and the final message is then written to a stream (if needed) or just put back into “all messages”.

The processing pipelines are very flexible: you can make them very powerful, but also very complex.
