I’m trying to conceptually understand Graylog a little more deeply. I know that it uses Elasticsearch to allow users to run queries and store its logs, but not much more than that.
Could someone explain this in simple terms?
- If Graylog writes logs to Elasticsearch, does that mean these logs are put in a database? If so, where is this database stored? Or I suppose what I’m asking is, what does Elasticsearch do with these logs, and where can I see the logs that have been written to Elasticsearch?
- Maybe an explanation of the process from when Graylog receives logs to the end might help me understand better.
- What role does MongoDB play in Graylog? I know nearly nothing about MongoDB; how does it help Graylog?
- Why might the total disk space used and the amount (in GB) Graylog says it ingests each day be different? Could it have something to do with the way Elasticsearch stores these logs?
Overall, I would just like to understand how Graylog, ES, and MongoDB all work together and how this correlates to the way logs are ultimately stored.
Here is the short version: when log messages come in via a Graylog input, Graylog uses the settings it has stored in MongoDB to figure out how to handle each message. Settings such as inputs, streams, extractors, pipelines, rules, and alerts are all stored in MongoDB. Once Graylog has finished processing the message (breaking out fields and other calculations), it sends the message to the Elasticsearch database for storage. Dashboards and queries on the data are sent from Graylog to the Elasticsearch/OpenSearch database, and the results are displayed in the Graylog GUI.
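To make that flow easier to picture, here is a toy Python sketch. It is not Graylog's code: the `CONFIG` dictionary only stands in for the settings kept in MongoDB, the `INDEX` list stands in for Elasticsearch, and the function names, extractor pattern, and sample log lines are all made up for illustration.

```python
import re

# Stand-in for the configuration Graylog keeps in MongoDB (hypothetical shape).
CONFIG = {
    "extractors": [
        {"field": "src_ip", "pattern": r"src=(\d+\.\d+\.\d+\.\d+)"},
    ],
    "streams": [
        {"title": "Firewall", "must_contain": "firewall"},
    ],
}

# Stand-in for the Elasticsearch index that holds processed messages.
INDEX = []


def process(raw, config):
    """Apply extractors and stream rules to one raw log line."""
    doc = {"message": raw, "streams": []}
    for extractor in config["extractors"]:
        match = re.search(extractor["pattern"], raw)
        if match:
            doc[extractor["field"]] = match.group(1)
    for stream in config["streams"]:
        if stream["must_contain"] in raw:
            doc["streams"].append(stream["title"])
    return doc


def ingest(raw):
    """Receive a message, process it using the config, then store it."""
    INDEX.append(process(raw, CONFIG))


def search(field, value):
    """Stand-in for a query Graylog would send to Elasticsearch."""
    return [doc for doc in INDEX if doc.get(field) == value]


ingest("firewall: DROP src=10.0.0.5 dst=8.8.8.8")
ingest("sshd: accepted password for alice")
hits = search("src_ip", "10.0.0.5")
```

The point of the split is that the rules (config) and the data (index) live in different places: changing an extractor only touches the config, while searches only touch the stored documents.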
The default install has all these products installed on one machine, and all messages are stored on the Graylog server in its Elasticsearch database instance. @gsmith answered a question earlier about where the Elasticsearch database is stored. How fast Elasticsearch grows each day depends on how many messages get sent and how large those messages are. For instance, a firewall is going to send a lot more messages if internet usage is up because people are in the office (or whatever), and the size of a message may differ depending on the event and how much data you are sending in. There are a few Windows logs that ship extra paragraphs that simply explain the same event the same way in every message, and that part could be dropped to save a small amount of space, depending on the volume coming in of course.
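As a rough illustration of the space saving from dropping a repeated explanation, here is a small Python sketch. The marker text, the sample event, and the figure of 1,000,000 events per day are assumptions made up for the example; in Graylog itself this kind of trimming would be done with its own processing rules, not with this code.

```python
# Hypothetical marker: the text from here on is assumed identical in every event.
BOILERPLATE_MARKER = "This event is generated when"


def trim_boilerplate(message, marker=BOILERPLATE_MARKER):
    """Drop everything from the marker onward, keeping the unique part."""
    head, _, _ = message.partition(marker)
    return head.rstrip()


def size_bytes(text):
    """Size of the text when encoded as UTF-8."""
    return len(text.encode("utf-8"))


# Made-up sample event with a repeated explanatory sentence at the end.
event = (
    "An account was successfully logged on. Account Name: alice. "
    "This event is generated when a logon session is created."
)

trimmed = trim_boilerplate(event)
saved_per_message = size_bytes(event) - size_bytes(trimmed)

# The daily saving scales with volume (1,000,000 events per day is invented).
saved_per_day = saved_per_message * 1_000_000
```

The saving per message is small, so it only adds up to something noticeable when the volume of that event type is high.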
There is a good thread (Discussion and charts) here that explains how a message flows through Graylog that will help to visualize it.
Ah, okay. So the actual disk space used might differ from the amount Graylog says it’s ingesting each day because Elasticsearch might filter/drop some of them?
Any message modification (dropping the message, breaking out fields, which would increase its size, or removing data/fields, which would reduce its size, etc.) happens in Graylog before the message is stored in the Elasticsearch database. I believe the Graylog ingestion size depends on the message coming in from the input, not on the message going out to Elasticsearch for storage.
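A small Python sketch of why the two numbers can drift apart: it compares the size of a message as it arrives with a rough size of what would be stored after processing. The sample message and field names are invented, and measuring the stored document as compact JSON is only an approximation; the real on-disk size also depends on how Elasticsearch indexes and compresses the data, which this does not model.

```python
import json


def incoming_size(raw):
    """Bytes of the message as it arrives at the input."""
    return len(raw.encode("utf-8"))


def stored_size(doc):
    """Rough size of the processed document, measured as compact JSON."""
    return len(json.dumps(doc, separators=(",", ":")).encode("utf-8"))


raw = "firewall: DROP src=10.0.0.5 dst=8.8.8.8"

# Extracting fields adds data, so the stored document is larger than the raw line.
enriched = {"message": raw, "src_ip": "10.0.0.5", "dst_ip": "8.8.8.8", "action": "DROP"}

# Removing part of the message before storage makes the document smaller.
reduced = {"message": "DROP src=10.0.0.5"}

# A dropped message still arrived at the input but stores nothing.
dropped = None

sizes = {
    "incoming": incoming_size(raw),
    "enriched": stored_size(enriched),
    "reduced": stored_size(reduced),
    "dropped": 0 if dropped is None else stored_size(dropped),
}
```

In all three cases the incoming size is the same, while the stored size goes up, down, or to zero depending on what the processing did.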