Hi, I’ve spent the last few days reading about ELK and Graylog, and I have a few questions about the Filebeat+Graylog setup that I cannot answer from the documentation.
This part of the documentation explains how to manage Filebeat via Sidecar: Ingest from files
It specifically says
The output module in filebeat is called logstash. It is needed to send messages to a Graylog beats input.
If I understand this correctly, it implies Graylog is able to ingest data sent directly from Filebeat? And then Graylog is able to pass this data on to Elasticsearch without the need for any other third-party components?
But then I read the Filebeat documentation (Filebeat overview | Filebeat Reference [8.11] | Elastic) and other articles on the internet, and they pretty much claim that Filebeat can send data either to Elasticsearch directly (presumably a bad idea) or to Logstash. That would imply that Logstash needs to be installed and configured alongside the Graylog server, because Filebeat does not speak GELF (which is required for Graylog).
It does not seem that Logstash configuration is, or can be, managed via the Graylog UI. It also does not seem that Logstash is distributed with Graylog. So, basically, Logstash needs to be managed on its own in addition to Graylog.
To summarize, I have the following questions that I hope the community can help me with:
1. Is it possible to use Filebeat directly with Graylog, without Logstash?
2. If so, would that mean that Filebeat sends data directly to Elastic, or to Graylog, and Graylog sends it to Elastic?
3. Logstash is often mentioned in terms of:
3.1. Log aggregation and transformation. What can Logstash do that Graylog cannot?
3.2. Fault tolerance. Logstash can be used to buffer data on disk to absorb spikes and temporary unavailability of Elastic. Does the Graylog server provide any level of fault tolerance for data ingestion on its own?
4. If I can use Filebeat directly with Graylog, what are the reasons to use Logstash? What can Logstash do that Graylog cannot?
Also, I wonder if anything related to Filebeat/Logstash has changed in recent Graylog versions? For example, maybe Graylog was not able to receive input directly from Filebeat in v3, but this support was added in v4 and removed the need for Logstash?
For most beats, the logstash output is to send the messages to Graylog. For a beat, it makes no difference what receives the signals as long as it follows the protocol. The relatively new options to make use of a queue system are not (yet) implemented in Graylog so that the TCP input is the only option.
implying that beats can be directly processed by Graylog but, at the same time, pointing out that no queuing system was implemented in v3. I wonder if that has changed since then?
You have multiple choices for sending and ingesting logs/messages; which one fits depends on your environment.
With Graylog you need to use one of its many INPUTS (e.g., GELF, Syslog, etc.).
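For Filebeat specifically, the matching Graylog input is a Beats input, and Filebeat reaches it through its logstash output. As a rough sketch of what the Filebeat side could look like, the Python below just renders a minimal filebeat.yml. The host graylog.example.org, the port 5044, the input id example-logs and the log path are all placeholders I made up, so swap in whatever your Beats input actually listens on.

```python
import json


def render_filebeat_config(graylog_host, beats_port, log_paths):
    """Render a minimal filebeat.yml that ships files to a Graylog Beats input.

    All argument values are supplied by the caller; nothing here is a
    Graylog or Filebeat default.
    """
    lines = [
        "filebeat.inputs:",
        "  - type: filestream",
        "    id: example-logs",  # hypothetical input id
        "    paths:",
    ]
    lines += [f"      - {path}" for path in log_paths]

    # The output is named "logstash", but the receiver is the Graylog
    # Beats input; no Logstash process is involved.
    target = f"{graylog_host}:{beats_port}"
    lines += [
        "output.logstash:",
        f"  hosts: [{json.dumps(target)}]",
    ]
    return "\n".join(lines) + "\n"


# Placeholder host, port and path, for illustration only.
config_text = render_filebeat_config(
    "graylog.example.org", 5044, ["/var/log/*.log"]
)
print(config_text)
```

If you manage Filebeat through Sidecar, a configuration along these lines is what you would maintain in the Graylog UI instead of editing the file on each host.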
I have seen others use Logstash, but they point it at one of the Graylog INPUTS they created, so I don't see any use for Logstash with Graylog. With Elasticsearch you can send data directly to Elasticsearch; the Beats shipper configuration file has settings to auto-create dashboards and index sets. As you know, AWS forked Elasticsearch and created OpenSearch. But you cannot do this with Graylog. Well, you can hack it, but you end up having issues.
You can use Graylog Sidecar. It is a wrapper for Winlogbeat/Filebeat, and you can control the remote shippers from the dashboard. Kind of like Ansible.
Yes, by means of clustering, since Graylog uses OpenSearch (an Elasticsearch fork). You have settings for how many shards are created, retention, etc. If you are using Graylog, again, there really is no need to use Logstash.
If you want to use Logstash, you might want to look up the history of why it was made for Elasticsearch. Also, I would not look at Elastic’s documentation; I would reference the Graylog documentation here…
Hi Smith, thanks so much for your answers. I’m trying to make sense of this statement in your comment
What I mean to ask is: when we use Filebeat, it requires an address:port in the configuration to send logs to. What would be on the other side of this address:port? Would it be one of the Graylog components? Is it Elastic/OpenSearch? Does it have to be Logstash?
Yes, by means of clustering, since Graylog uses OpenSearch (an Elasticsearch fork). You have settings for how many shards are created, retention, etc. If you are using Graylog, again, there really is no need to use Logstash.
What I mean is the fault tolerance of the individual instance that receives logs. For example, if I have a spike in logs sent from Filebeat, can Graylog handle that in a way that it won’t run out of RAM trying to buffer/store everything in memory? Logstash, I believe, first offloads all received logs to disk and then processes them. So even if it crashes while processing a log entry, it won’t lose it until it has been sent down the pipeline.
If you want to use Logstash you might want to look up the history why it was made for Elasticsearch.
I don’t. But a lot of documentation and articles make it sound like I have to in order to be production ready.
Graylog is configured with a Journal, which is a directory on disk. This is where all incoming logs go before they get indexed, found here
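To make the idea concrete, here is a toy Python sketch of a disk journal: write every message to disk first, process later, and only advance a committed offset once a message has been handled. This is not how Graylog implements its journal; the class name DiskJournal and the file names are invented purely for illustration.

```python
import json
import os
import tempfile


class DiskJournal:
    """Toy append-only journal (illustrative only, not Graylog's format)."""

    def __init__(self, directory):
        os.makedirs(directory, exist_ok=True)
        self.log_path = os.path.join(directory, "journal.log")
        self.offset_path = os.path.join(directory, "committed.offset")
        open(self.log_path, "ab").close()  # make sure the file exists

    def append(self, message):
        # Persist the message before any processing happens, so a crash
        # later in the pipeline cannot lose it.
        with open(self.log_path, "ab") as f:
            f.write(json.dumps(message).encode("utf-8") + b"\n")
            f.flush()
            os.fsync(f.fileno())

    def _committed(self):
        try:
            with open(self.offset_path) as f:
                return int(f.read().strip() or 0)
        except FileNotFoundError:
            return 0

    def read_uncommitted(self):
        """Return (next_offset, message) pairs written after the last commit."""
        entries = []
        with open(self.log_path, "rb") as f:
            f.seek(self._committed())
            while True:
                line = f.readline()
                if not line.endswith(b"\n"):
                    break  # end of file, or a partially written entry
                entries.append((f.tell(), json.loads(line)))
        return entries

    def commit(self, offset):
        # Write the new offset to a temp file and rename it, so the
        # committed position is replaced in one step.
        tmp_path = self.offset_path + ".tmp"
        with open(tmp_path, "w") as f:
            f.write(str(offset))
        os.replace(tmp_path, self.offset_path)


# Simulate a restart: only the first message was committed before the "crash".
with tempfile.TemporaryDirectory() as tmp:
    journal = DiskJournal(tmp)
    journal.append({"message": "first"})
    journal.append({"message": "second"})
    pending = journal.read_uncommitted()
    journal.commit(pending[0][0])
    recovered = DiskJournal(tmp).read_uncommitted()
    print([msg["message"] for _, msg in recovered])
```

The point is only that a spike ends up on disk instead of in RAM, and whatever was not yet processed is still there after a restart.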
You don't have to, but if you're reading documentation from Elastic and/or Elasticsearch/OpenSearch, they use it because it makes it easier to ingest/modify logs, whereas Graylog does this through INPUTS in the Web UI.