Help with under-performing setup

I’m running Graylog 5.2 with Elasticsearch 7.10 and MongoDB 5.0.13 in Docker Compose, on a single node.

I’m receiving data from Filebeat that’s running in three separate Kubernetes environments.

With two of the three running, all is fine. When the 3rd one is added, the platform cannot keep up.

In the input setup, I’m using between 8 and 32 threads to handle the traffic. Each input is handled by a dedicated stream, so I have three inputs/streams. And each stream is using a dedicated index set.

My Elasticsearch is just one node, and the 3 indices have 16 shards each. Storage is a GP3 volume (3,000 IOPS).

The node where Graylog, Elasticsearch & MongoDB are running is a 4-core, 16 GB EC2 instance on AWS (m5.xlarge type). Graylog takes about 8 GB of memory and ES takes about 5 GB. The 4 cores are almost constantly 100% busy.

What can I change to help it cope? When the 3rd cluster joins, the journal exceeds 100,000 messages in about 3-4 minutes. The rate of incoming messages with all three clusters active can top 3,000/sec at times, and averages about 1,000/sec. With just two clusters running, the average is about 500/sec.
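A quick back-of-the-envelope check lines up with that 3-4 minutes. The sustained indexing rate here is my guess, not a measured number; only the 3,000/sec peak input is from my stats:

```python
# Rough journal-growth estimate. The 3,000 msg/s peak input is measured;
# the ~2,500 msg/s sustained indexing rate is an assumption for illustration.
peak_in = 3000           # msg/s with all three clusters sending
assumed_out = 2500       # msg/s the node can index (hypothetical)
backlog_rate = peak_in - assumed_out      # msg/s piling up in the journal

minutes_to_100k = 100_000 / backlog_rate / 60
print(f"journal reaches 100k in ~{minutes_to_100k:.1f} min")
```

So even a modest shortfall in indexing throughput fills the journal in a few minutes at peak.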

I would very much like to squeeze every ounce of performance I can from this instance before I upgrade to one with double the cost.

Thanks for any clues, I’m new to this.

  • George

What does your total “output” per day look like on the System > Overview page?

We set it up yesterday around 1pm, with the two clusters only, and it finished the day with 84GB output.

Today with two clusters plus about one hour with three clusters, it’s 96GB in 14 hours.

I also see alerts about the journal going over the limit.

Okay… yeah, I see why you are having issues; that infra is vastly undersized for that volume of traffic.

Take a look at this video on our reference architectures. These are “safe” architectures, so you can get away with less, but within reason. You will also see that Graylog is built to scale out, not up.

Your Elasticsearch is probably suffering the most, but all the pieces except maybe MongoDB need more nodes.


Thanks for the video, I watched it. It recommends very large systems, too large I'd dare say, not just in my opinion but from my experience. This small single-node setup of mine is handling 127 GB of volume today (two out of three clusters), and the video recommends a larger setup for 10-20 GB of volume.

Are there any rules of thumb for the number of worker threads, or the number of index shards, or the input buffers, or the stream/input setup, for optimizing performance?

I’m already working on a switch to OpenSearch, since Elasticsearch is on its way out of Graylog. I’ll use one node for OpenSearch and one for Graylog. We are only now starting with Graylog, so we might as well drop ES right from the start.

Also, I’m getting parsing errors for timestamp fields; can this be handled when the input is from Filebeat?
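Something like this pipeline rule is what I had in mind; the field name and the date pattern are guesses at what Filebeat actually sends, so treat it as a sketch:

```
rule "parse filebeat timestamp"
when
  has_field("timestamp")
then
  // The pattern below is an assumption; match it to the actual field format.
  let ts = parse_date(to_string($message.timestamp), "yyyy-MM-dd'T'HH:mm:ss.SSSZ");
  set_field("timestamp", ts);
end
```

Would a rule like this be the recommended way, or is there something on the Beats input itself?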

How long are you planning on retaining the data for?

Depending on the cluster, between one and six months. The largest volume (production cluster) will stay the longest.

Oh and you said you have 3 index sets with 16 shards in each index, correct? What is the reason for the 16 shards, and what is your rotation strategy on the index sets?

16 shards to have some parallelism in input processing. Do you think it’s too many?

As for rotation, I want data older than X months to be deleted. Not sure yet how that can be achieved.

Yeah, that many shards will kill you. Shards are the largest user of RAM, and on a single node there are very few cases where having more than one makes sense; they are built to spread load across nodes, but you have nowhere to spread to. That number of shards is using almost 3 GB of RAM for each index rotation. See: How many shards should I have in my Elasticsearch cluster? | Elastic Blog
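To put numbers on it, using the roughly 20-shards-per-GB-of-heap rule of thumb from that blog post:

```python
# Shards created per rotation with the current settings, and the heap
# the Elastic rule of thumb (<= ~20 shards per GB of heap) implies.
# Assumes no replicas, which is the sensible setting on a single node.
index_sets = 3
shards_per_index = 16
shards_per_rotation = index_sets * shards_per_index   # 48 shards per rotation

heap_gb_implied = shards_per_rotation / 20
print(shards_per_rotation, "shards ->", heap_gb_implied, "GB of heap minimum")
```

Dropping each index set to 1 shard would cut that to 3 shards per rotation, and the heap pressure drops accordingly.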


This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.