Hi all, is there a way to configure a stream not to send data to an ES index, and to use Graylog as an aggregator, like Fluentd?
We are doing a PoC on sampling log data, and we want to send data to multiple destinations after processing, for example to Splunk, CrateDB, ES, and S3. However, because all streams currently depend on a single underlying ES cluster, we are seeing data drops and huge latencies before logs appear in Splunk, as the journal fills up quickly. We tried increasing the journal buffers, but that adds even more latency before logs show up in Splunk or S3.
What are we trying to do?
In our setup, the data ingested directly into Graylog is roughly 70TB, but we only want to send around 4.5TB to Splunk via pipelines after processing, around 10TB to ES, and all 70TB to S3.
Graylog is able to buffer all the incoming logs, but most of the data is getting dropped because we have only dedicated 10TB of ES resources per DC.
Below is a detailed description of our PoC setup:
We are running Graylog in three DCs. Our current setup per DC, with heavy TCP tuning:

- Graylog: 7 masters, 33 agents, default journal buffers, running as containers on k8s
- Instance details: all nodes run on an I-flavor instance type (designed for data workloads) with 16 CPUs and 64GB RAM
- Operating system: CentOS 7.2
We are willing to contribute back if this turns out to be a viable solution:
We are in the process of implementing a custom stream plugin and need guidance or suggestions: can we use Graylog as an aggregator and ETL tool without the single-ES dependency? And is implementing a custom stream plugin the right approach, or are there other alternatives?
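As an illustration, the kind of routing we want to do in pipelines could be sketched with a rule like the one below, using Graylog's built-in `route_to_stream` function. The stream name and the `application` field are hypothetical examples, not our actual configuration:

```
rule "route payments logs to splunk-forward stream"
when
  // only route messages that carry the (example) application field
  has_field("application") && to_string($message.application) == "payments"
then
  // send the message to a dedicated stream whose output forwards to Splunk
  route_to_stream(name: "splunk-forward");
end
```

The open question for us is what the output side of such a stream should be when ES is not the destination.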
“Just forward” — no, that is not possible directly, but I think there is a workaround.
You create a new index set with small, size-based indices. If you use this index set, Graylog will still store the messages, but only for a few minutes/hours before they are rotated out. If you additionally configure ES not to index the messages, not to store replicas, etc., that could add some performance for this forward-only index set.
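A minimal sketch of what such a "don't index, no replicas" index template body might look like, assuming an ES 6.x-style template API; the pattern `graylog-forward_*` is a hypothetical name for the forward-only index set, and `"enabled": false` on the mapping keeps `_source` while skipping field indexing:

```json
{
  "index_patterns": ["graylog-forward_*"],
  "order": 10,
  "settings": {
    "number_of_replicas": 0,
    "refresh_interval": "30s"
  },
  "mappings": {
    "message": {
      "enabled": false
    }
  }
}
```

Note that messages in such indices would not be searchable in Graylog, which is acceptable only if the index set exists purely to buffer data for forwarding.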
You can increase the output_batch_size parameter and increase the ES HTTP max request size. Change them together; mismatched sizes can cause problems.
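For reference, these two knobs live in `graylog.conf` and `elasticsearch.yml` respectively; the values below are illustrative, not tuned recommendations:

```
# graylog.conf: number of messages sent to ES per bulk request
output_batch_size = 2000

# elasticsearch.yml: maximum HTTP request body size (default 100mb);
# must be large enough for the bigger bulk requests produced above
http.max_content_length: 200mb
```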
If you have time, we are collecting information about big clusters. As you handle a lot of data, you could probably tell us something new.
If you would like, we can ask an admin to reopen the topic.