Hello everyone,
I'm currently responsible for a Graylog deployment that handles a large volume of log data, and I've been running into performance problems. Any tips or insights from the community on how to improve my setup would be greatly appreciated.
Context:
Environment: Graylog 4.2 running on a 3-node cluster.
Elasticsearch: version 7.10.2, 3-node cluster.
MongoDB: version 4.4, single node.
Log Volume: roughly 500 GB of logs ingested per day.
Hardware: each Graylog node has 8 vCPUs, 32 GB RAM, and SSD storage; the Elasticsearch nodes have comparable specs.
Problems:
Search Performance: searches return slowly, sometimes taking minutes to complete.
Indexing Delays: logs take a noticeable amount of time to appear in the Graylog interface after ingestion (see the sketch below for one way to measure this).
High CPU Utilisation: all nodes, especially the Graylog nodes, show consistently high CPU utilisation.
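To put a number on the indexing delay, polling the journal status from the Graylog REST API seems like the most direct check. Here is a minimal sketch; the host, credentials, and exact response field names are assumptions based on Graylog 4.x's /api/system/journal endpoint:

```python
import requests

GRAYLOG = "http://graylog-node1:9000"  # placeholder host
AUTH = ("admin", "password")           # placeholder credentials

# Fetch the disk journal status from the Graylog REST API.
resp = requests.get(
    f"{GRAYLOG}/api/system/journal",
    auth=AUTH,
    headers={"Accept": "application/json"},
)
resp.raise_for_status()
journal = resp.json()

# A steadily growing uncommitted count means messages arrive faster
# than they are being flushed to Elasticsearch, i.e. indexing lag.
# Field names are assumptions based on the 4.x API response.
print("uncommitted entries:", journal.get("uncommitted_journal_entries"))
print("append events/sec:  ", journal.get("append_events_per_second"))
print("read events/sec:    ", journal.get("read_events_per_second"))
```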
What I've Attempted:
- Increased JVM heap sizes for both Graylog and Elasticsearch.
- Adjusted shard/replica counts and refresh intervals for the Elasticsearch indices (see the sketch after this list).
- Put index rotation and retention strategies in place to control disk utilisation.
- Enabled and configured input/output filters in Graylog to handle spikes in log data.
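For reference, the refresh-interval and replica changes were along these lines. This is only a sketch using the 7.x elasticsearch Python client; the index pattern and values are examples, and since Graylog manages its own index sets, a persistent change belongs in an index template or the index-set configuration rather than a one-off call like this:

```python
from elasticsearch import Elasticsearch

# Hostnames are placeholders for the 3-node ES 7.10 cluster.
es = Elasticsearch([
    "http://es-node1:9200",
    "http://es-node2:9200",
    "http://es-node3:9200",
])

# Refresh less often: trades search freshness for indexing throughput.
# 30s is an example value, not a recommendation.
es.indices.put_settings(
    index="graylog_*",
    body={"index": {"refresh_interval": "30s"}},
)

# Optionally drop replicas during heavy ingest (re-add them later);
# only safe if temporarily losing redundancy is acceptable.
es.indices.put_settings(
    index="graylog_*",
    body={"index": {"number_of_replicas": 0}},
)
```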
Questions:
Are there specific configurations or best practices for managing large log volumes in Graylog that I may be overlooking?
Would scaling out the Graylog or Elasticsearch clusters help, and if so, what is the recommended way to go about it?
Which metrics or monitoring tools should I focus on to better understand where the bottlenecks in my setup are?
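On the metrics question: beyond OS-level CPU numbers, the Elasticsearch-side signals I know to watch are cluster health and write thread-pool rejections, since sustained rejections usually mean the cluster cannot keep up with indexing. A sketch of how that could be scripted (hostname is a placeholder):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch(["http://es-node1:9200"])  # placeholder host

# Overall cluster state: status colour, unassigned shards, pending tasks.
health = es.cluster.health()
print("status:", health["status"],
      "| unassigned shards:", health["unassigned_shards"],
      "| pending tasks:", health["number_of_pending_tasks"])

# Per-node write thread-pool stats; a growing 'rejected' counter is a
# classic sign of indexing back-pressure on ES 7.x.
stats = es.nodes.stats(metric="thread_pool")
for node in stats["nodes"].values():
    tp = node["thread_pool"].get("write", {})
    print(node["name"], "write queue:", tp.get("queue"),
          "rejected:", tp.get("rejected"))
```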
I also checked this thread: https://community.graylog.org/t/graylog-journal-getting-full/tableau
Thank you in advance for your help and suggestions.