Pipelines vs extractor performance

nix-power · December 7, 2019, 10:47pm

I am currently using Graylog 2.5, which handles high volumes of incoming events (15-20K/sec, ~100Tb per month). It collects logs from Kubernetes clusters on AWS, from network devices, and from a variety of application servers.
I use the following order in the Message Processors configuration: "Message Filter Chain -> Pipeline Processor".
I have 7 different inputs (for different application server types, using different TCP ports). I use extractors on each input type, and I use 1 pipeline that uses a field created by one of the extractors (the pipeline is connected to one specific stream).
I configure a separate stream for almost every application server type, and I use a separate index for each log type that is ingested into Graylog.

Does dividing log traffic into multiple inputs and multiple streams carry a performance penalty?
I have noticed that when I create a rule to block a certain source and assign it to a pipeline connected to a specific stream (when some source starts producing logs at over 30-40K/sec), Graylog is no longer able to process messages: they start stacking up in the local journal on one of the cluster nodes, and the process buffer is 100% utilized. I use a rule like this one:
=================
rule "drop_appsrv1_2"
when
$message.source == "appsrv1" OR $message.source == "appsrv2"
then
drop_message();
end
================
So the second question: is there a more efficient way (in terms of performance) to block messages from specific sources?

Thanks

macko003 · December 11, 2019, 8:20am

The only penalty is that GL needs to run the stream rules to work out which stream each message belongs to.
When you run pipelines you need CPU. Not much, but some. In my case I have 4 GL servers with 8 cores each, and about 50% CPU utilization when processing 40-50k logs/s. When I enable a pipeline, I don't see a big difference before and after.
But if you want to drop all messages from a host, I don't suggest using pipelines. Use a firewall; it doesn't need as much CPU.

nix-power · December 11, 2019, 8:40am

Hi, thank you for the reply.
In most cases I can't use firewall rules to drop messages, as many of them come from the same host but from different services/pods (in the case of logs from Kubernetes clusters).
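
For the per-pod case described above, a minimal sketch of a rule that drops on a field other than source might look like the following. The field name kubernetes_pod_name and the value noisy-pod are hypothetical, chosen only for illustration; the real field depends on how the Kubernetes logs are shipped and which extractors run on the input. Nothing here is claimed to be faster than the source-based rule; it only shows the same drop_message() approach keyed on a different field.

```
rule "drop_noisy_pod"
when
  // kubernetes_pod_name is an assumed field name, not one confirmed in this thread
  has_field("kubernetes_pod_name") &&
  to_string($message.kubernetes_pod_name) == "noisy-pod"
then
  // discard the message so it is not written to any index
  drop_message();
end
```

The has_field() check keeps the comparison from running against messages that do not carry the field at all, and to_string() makes the comparison an explicit string comparison.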

system · December 25, 2019, 8:40am

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.
