Hi all, is there a way to configure a stream not to send data to an ES index, and instead use Graylog as an aggregator, similar to Fluentd?
We are doing a POC on sampling log data, and we want to send data to multiple destinations after processing, for example to Splunk, CrateDB, ES, and S3. However, because all streams depend on a single underlying ES cluster, we are seeing data drops and high latencies before logs appear in Splunk, as the journal fills up quickly. We tried increasing the journal buffers, but that adds even more latency before logs show up in Splunk or S3.
What are we trying to do?
In our setup, roughly 70TB of data is ingested directly into Graylog, but we only want to send around 4.5TB to Splunk (via pipelines, after processing), around 10TB to ES, and all 70TB to S3.
Graylog is able to buffer all the incoming logs, but most of the data is getting dropped because we have only dedicated 10TB of ES capacity per DC.
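For context, the "4.5TB to Splunk" slice is selected with pipeline rules along these lines (the stream ID and field name below are made up for illustration; `route_to_stream` is the built-in pipeline function):

```
rule "route payments logs to Splunk-bound stream"
when
  // hypothetical field/value; in practice this matches whatever subset we forward
  has_field("application") && to_string($message.application) == "payments"
then
  // hypothetical stream id for the stream whose output forwards to Splunk
  route_to_stream(id: "000000000000000000000001");
end
```

The problem is that even messages routed only to such a stream still pass through the single ES-backed output path, which is what fills the journal.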
Below is a detailed description of our POC setup:
We are running Graylog in three DCs. Our current setup per DC (with heavy TCP tuning):
Graylog (7 masters, 33 agents), default journal buffers, running as containers on k8s
MongoDB (HA, 4 nodes)
Elasticsearch (10 nodes: 1 master, 1 balancer, 8 data nodes)
Instance details: all nodes run on I-flavor instances (designed for data workloads) with 16 CPUs and 64GB RAM
Operating system: CentOS 7.2
We are willing to contribute this back if it turns out to be a viable solution:
We are in the process of implementing a custom stream plugin and need guidance or suggestions. Can Graylog be used as an aggregator and ETL tool without the single ES dependency? And is implementing a custom stream plugin the right approach, or are there other alternatives?
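To make the intent concrete, here is a minimal sketch (plain Python, not Graylog's plugin API) of the fan-out behaviour we are after: each destination has its own filter or sampling predicate, so the small ES budget does not gate delivery to Splunk or S3. All names, field values, and the 15% sample rate are assumptions for illustration only:

```python
import random

# Hypothetical per-destination predicates: a message is delivered to every
# destination whose predicate accepts it, independently of the others.
DESTINATIONS = {
    # S3 gets everything (the full ~70TB)
    "s3": lambda msg: True,
    # Splunk gets only a filtered subset (~4.5TB); field/value are made up
    "splunk": lambda msg: msg.get("application") == "payments",
    # ES keeps a random sample sized to fit the 10TB-per-DC budget
    "es": lambda msg: random.random() < 0.15,
}

def fan_out(message):
    """Return the list of destination names this message should go to."""
    return [name for name, accept in DESTINATIONS.items() if accept(message)]
```

For example, `fan_out({"application": "payments"})` always includes `"s3"` and `"splunk"`, while `"es"` is included only for the sampled fraction. The key design point is that the predicates are independent, so no single destination's capacity limits the others.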
Here is an open issue I raised that also relates to ES federation and using Graylog as an aggregator: https://github.com/Graylog2/graylog2-server/issues/4199