Help with under-performing setup

I’m running Graylog 5.2 with Elasticsearch 7.10 and MongoDB 5.0.13 in Docker Compose, on a single node.

I’m receiving data from Filebeat that’s running in three separate Kubernetes environments.

With two of the three running, all is fine. When the 3rd one is added, the platform cannot keep up.

In the input setup, I’m using between 8 and 32 threads to handle the traffic. Each input is handled by a dedicated stream, so I have three inputs/streams. And each stream is using a dedicated index set.

My Elasticsearch is just one node, and the 3 indices have 16 shards each. Storage is a GP3 volume (3,000 IOPS).

The node where Graylog, Elasticsearch & MongoDB are running is a 4-core, 16GB EC2 instance on AWS (m5.xlarge type). Graylog takes about 8GB of memory and ES takes about 5GB. The 4 cores are 100% busy almost constantly.

What can I change to help it cope? When the 3rd cluster is added, the journal exceeds 100,000 messages in about 3-4 minutes. The rate of incoming messages with all three clusters active can top 3,000/sec at times, and averages about 1,000/sec. With just two clusters running, the average is about 500/sec.

I would very much like to squeeze every ounce of performance I can from this instance before I upgrade to one with double the cost.

Thanks for any clues, I’m new to this.

  • George

What does your total “output” per day look like on the System > Overview page?

We set it up yesterday around 1pm, with the two clusters only, and it finished the day with 84GB output.

Today with two clusters plus about one hour with three clusters, it’s 96GB in 14 hours.

I also see alerts about the journal going over the limit.
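As a rough cross-check of the figures above, the two partial-day outputs can be scaled to a full 24 hours. This is only a back-of-the-envelope sketch: the 11-hour window for the first day is an assumption (1pm to midnight), the function name is made up for illustration, and traffic is unlikely to be evenly spread over the day.

```python
def daily_rate_gb(output_gb: float, hours: float) -> float:
    """Scale an observed output volume linearly to a 24-hour day."""
    return output_gb / hours * 24

# Day 1: 84 GB from about 1pm to midnight (assumed to be 11 hours).
day1 = daily_rate_gb(84, 11)
# Day 2: 96 GB in 14 hours, including about one hour with three clusters.
day2 = daily_rate_gb(96, 14)

print(f"day 1: ~{day1:.0f} GB/day, day 2: ~{day2:.0f} GB/day")
```

Either way, the linear estimate lands well above 100GB per day for this single node.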

Okay… yeah, I see why you are having issues: that infrastructure is vastly undersized for that volume of traffic.

Take a look at this video on our reference architectures. These are “safe” architectures, so you can get away with less, but within reason. You will also see that Graylog is built to scale out, not up. https://youtu.be/agdLrDw9JaE?si=zrCCdqoTKRDZG9_G

Your Elasticsearch is probably suffering the most, but all the pieces except maybe MongoDB need more nodes.


Thanks for the video, I watched it. It recommends very large systems; too large, I dare say, and that is based on my experience rather than opinion. This small single-node setup I have is handling 127GB of volume today (two out of three clusters), and the video recommends a larger setup for 10-20GB of volume.

Are there any rules of thumb for the number of worker threads, or the number of index shards, or the input buffers, or the stream/input setup, for optimizing performance?

I’m already working on a switch to OpenSearch since Elasticsearch is on its way out of Graylog. I’ll use a node for OS and a node for Graylog. We are only now starting with Graylog so we might as well forget about ES right from the start.

Also, I’m getting parsing errors for timestamp fields. Can this be handled when the input is from Filebeat?

How long are you planning on retaining the data for?

Depending on the cluster, between one and six months. The largest volume (production cluster) will stay the longest.

Oh and you said you have 3 index sets with 16 shards in each index, correct? What is the reason for the 16 shards, and what is your rotation strategy on the index sets?

16 shards to have some parallelism in input processing. Do you think it’s too many?

As for rotation, I want data older than X months to be deleted. I’m not sure yet how this can be achieved.
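One common way to get "delete after X months" in Graylog is to combine a time-based rotation strategy with the delete retention strategy on each index set, and to set the maximum number of indices so that it covers the retention window. The sketch below only does that arithmetic. It assumes daily rotation by default and a 30-day month, and the function name is hypothetical, not a Graylog API.

```python
import math

DAYS_PER_MONTH = 30  # assumption: a month is approximated as 30 days

def indices_to_keep(retention_months: int, rotation_days: int = 1) -> int:
    """Number of rotated indices needed to cover the retention window."""
    return math.ceil(retention_months * DAYS_PER_MONTH / rotation_days)

# The thread mentions retention between one and six months.
for months in (1, 6):
    print(months, "month(s):", indices_to_keep(months), "daily indices")
```

With weekly rotation instead, `indices_to_keep(6, rotation_days=7)` gives the number of weekly indices to keep for six months.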

Yeah, that many shards will kill you. Shards are the largest user of RAM, and on a single node there are very few cases where having more than one makes sense. They are built to spread the load, but you have nowhere to spread it to. That number of shards is using almost 3GB of RAM for each index rotation. See “How many shards should I have in my Elasticsearch cluster?” on the Elastic Blog.
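To put numbers on the shard count discussed here, the sketch below counts the shards created per rotation across the three index sets and compares them with the rule of thumb from the Elastic blog post of staying below roughly 20 shards per GB of heap. The 5GB heap is an assumption taken from the "about 5" memory figure earlier in the thread (the real heap is probably smaller), replicas are assumed to be zero, and all names are illustrative.

```python
INDEX_SETS = 3           # one index set per stream, as described in the thread
HEAP_GB = 5              # assumption: treats the ~5GB ES memory figure as heap
SHARDS_PER_GB_HEAP = 20  # rule of thumb, not a hard limit

def shards_per_rotation(shards_per_index: int) -> int:
    """Shards created each time every index set rotates once (no replicas)."""
    return INDEX_SETS * shards_per_index

def rotations_within_budget(shards_per_index: int) -> int:
    """How many rotations fit before the rule-of-thumb budget is exceeded."""
    budget = HEAP_GB * SHARDS_PER_GB_HEAP
    return budget // shards_per_rotation(shards_per_index)

for shards in (16, 1):
    print(f"{shards} shard(s)/index: {shards_per_rotation(shards)} per rotation, "
          f"{rotations_within_budget(shards)} rotations within budget")
```

Under these assumptions, 16 shards per index uses up the budget after two rotations, while one shard per index leaves room for far more retained indices.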

