Pipelines vs extractor performance

I am currently using Graylog 2.5, which handles a high volume of incoming events (15-20K/sec, ~100 TB/month). It collects logs from Kubernetes clusters on AWS, from network devices, and from a variety of different application servers.
I use the following scheme in the Message Processors Configuration: "Message Filter Chain -> Pipeline Processor".
I have 7 different inputs (for different application server types, each listening on its own TCP port). I use extractors on each input, and I have one pipeline that uses a field created by one of the extractors (connected to one specific stream).
I configure a separate stream for almost every application server type, and I use a separate index set for each log type ingested into Graylog.

  1. Does dividing log traffic into multiple inputs and multiple streams carry a performance penalty?
  2. I have noticed that when I create a rule to block a certain source and assign it to a pipeline connected to a specific stream, then once that source starts producing logs at 30-40K/sec, Graylog can no longer keep up: messages start stacking up in the local journal on one of the cluster nodes, and the process buffer sits at 100% utilization. I use a rule like this one:
    rule "drop_appsrv1_2"
    when
      to_string($message.source) == "appsrv1" OR to_string($message.source) == "appsrv2"
    then
      drop_message();
    end
    So the second question: is there a more efficient way (in terms of performance) to block messages from specific sources?


  1. The only penalty is that Graylog needs to evaluate the stream rules to match each message to a stream.
  2. When you run pipelines you need CPU; not much, but some. In my case I have 4 Graylog servers with 8 cores each, at about 50% CPU utilization while processing 40-50K logs/sec, and when I enable a pipeline I don't see a big difference before and after.
    But if you want to drop all messages from a host, I don't suggest using pipelines. Use a firewall instead; it doesn't need as much CPU.
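For illustration, a minimal sketch of the firewall approach with iptables, assuming the noisy host sends logs over TCP (the source address 10.0.0.15 and port 5140 here are placeholder values, not from the original setup):

    # Drop traffic from the noisy source before it ever reaches Graylog.
    # 10.0.0.15 and 5140 are example values; substitute your host and input port.
    iptables -A INPUT -s 10.0.0.15 -p tcp --dport 5140 -j DROP

Dropped packets never enter the input, so Graylog spends no CPU parsing, routing, or journaling those messages.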

Hi, thank you for the reply.
In most cases I can't use firewall rules to drop messages,
as many of them come from the same host but from different services/pods (in the case of logs from Kubernetes clusters).

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.