I am currently using Graylog 2.5, which handles a high volume of incoming events (15-20K/sec, ~100 TB/month). It collects logs from Kubernetes clusters on AWS, from network devices, and from a variety of application servers.
I use the following scheme in the Message Processors configuration: "Message Filter Chain -> Pipeline Processor".
I have 7 different inputs (one per application server type, each listening on a different TCP port). I use extractors on each input, and I have one pipeline that uses a field created by one of the extractors; that pipeline is connected to one specific stream.
I configure a separate stream for almost every application server type, and I use a separate index for each log type ingested into Graylog.
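To illustrate how the pipeline uses the extractor-created field, here is a minimal sketch of such a rule. The field name `app_type` and the `then` action are placeholders of my own, not the actual rule:

```
rule "match by extractor field"
when
  // app_type is a hypothetical field populated by an input extractor
  has_field("app_type") && to_string($message.app_type) == "appsrv"
then
  // placeholder action; the real rule does the actual processing
  set_field("processed_by", "appsrv-pipeline");
end
```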
- Does dividing log traffic into multiple inputs and multiple streams carry a performance penalty?
- I have noticed that when I create a rule to block a certain source and assign it to a pipeline connected to a specific stream, and that source starts producing logs at 30-40K/sec, Graylog is no longer able to process messages: they start stacking up in the local journal on one of the cluster nodes, and the process buffer is 100% utilized. I use a rule condition like this one:
```
$message.source == "appsrv1" OR $message.source == "appsrv2"
```
So my second question is: is there a more efficient way (in terms of performance) to block messages from specific sources?
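One alternative I have been considering is moving the list of blocked sources into a lookup table, so the rule does a single lookup instead of a growing chain of OR comparisons. A sketch of what I have in mind, assuming a lookup table named `blocked_sources` (the table name and rule name are hypothetical):

```
rule "drop blocked sources"
when
  // blocked_sources is a hypothetical lookup table keyed by source name
  is_not_null(lookup_value("blocked_sources", to_string($message.source)))
then
  drop_message();
end
```

Would that be meaningfully cheaper at 30-40K/sec, or does the bottleneck lie elsewhere (e.g. the message still being fully parsed before the pipeline stage runs)?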