Self-Host On-Prem: If You Could Spec Out the Perfect Graylog Rig

What specs would you target? This isn’t an idle question; I will be presenting the request to the higher-ups, as my current setup is reaching its limits.

We ingest about 20 GB of logs a day. I would likely use a Proxmox host so that I could potentially split OpenSearch and Graylog into two different containers.

So, any suggestions? My preference is to use an AMD CPU.

Hi @accidentaladmin,

You definitely want to separate Graylog and OpenSearch onto their own hosts. Don’t make them share resources.

As for the perfect rig, I wouldn’t overthink it. Pretty much any modern system with appropriate resources assigned will handle the load you’re describing. You can run it on physical servers, in virtual machines, or in containers.

The key resources to consider are CPU, RAM, Java heap, storage, and I/O. The diagram below gives suggested CPU and RAM recommendations. Storage is determined by how much log data you wish to retain. I/O is not as important at this low ingestion level, but stay away from spinning disks if at all possible. The rule of thumb for Java heap is that it should be half of system RAM, up to a maximum of 31 GB.
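To make that heap rule concrete, here is a minimal sketch (just the guideline above expressed as arithmetic, not a hard requirement):

```python
def recommended_heap_gb(system_ram_gb: int) -> int:
    """Rule of thumb: Java heap is roughly half of system RAM, capped at 31 GB."""
    return min(system_ram_gb // 2, 31)

# A 16 GB node gets an 8 GB heap, a 32 GB node gets 16 GB,
# and a 96 GB node is still capped at 31 GB.
for ram_gb in (16, 32, 96):
    print(f"{ram_gb} GB RAM -> {recommended_heap_gb(ram_gb)} GB heap")
```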

Hope this helps. Feel free to ask any follow up questions you may have.

Thank you so much! This helps a great deal.

I am curious, though. Is there a reason for two OpenSearch nodes? Also, am I doing my math correctly?

OpenSearch:
Node 1: 8 cores / 32 GB RAM
Node 2: 8 cores / 32 GB RAM

Graylog:
Server: 8 cores / 16 GB RAM

So if I wanted one rig (running Proxmox), I’d probably want something with at minimum 24 cores? (Or am I conflating cores with threads?)
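Spelling out the arithmetic behind that 24-core figure (the per-node numbers are just my reading of your recommendations above):

```python
# (cores, GB RAM) per guest, taken from the figures above
opensearch_nodes = [(8, 32), (8, 32)]
graylog_nodes = [(8, 16)]

total_cores = sum(cores for cores, _ in opensearch_nodes + graylog_nodes)
total_ram_gb = sum(ram for _, ram in opensearch_nodes + graylog_nodes)
print(total_cores, "cores,", total_ram_gb, "GB RAM")  # 24 cores, 80 GB RAM
```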

Thank you!

You may not need two OS nodes. It depends on your retention requirements.

That’s the TL;DR. There is a much longer answer relating to index shards and the heap space assigned to each of them.

If you are interested in the topic of sharding, this Elasticsearch blog post is a great reference. It applies equally to OpenSearch.

Again, thank you!

We retain logs for 90 days (if that helps).
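Back-of-the-envelope, that retention works out to roughly the following raw volume (before index overhead and any replicas, which would come on top):

```python
daily_ingest_gb = 20
retention_days = 90

raw_retained_gb = daily_ingest_gb * retention_days
print(raw_retained_gb, "GB")  # 1800 GB, i.e. roughly 1.8 TB of raw logs
```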

EDIT: The Java heap is handled by the Graylog server and not specifically by the OpenSearch servers, correct?

No. Each has its own JVM settings. The exact location depends on the operating system and the application. Default file locations are listed here: Default file locations

If your individual indices are small, you should probably reduce the number of shards. As you saw in the link I posted earlier, the size of each shard should be between 20 and 40 GB. So, for a small index, one or two shards should be plenty and will reduce the total number of shards you have to deal with.

From the post:

"TIP: The number of shards you can hold on a node will be proportional to the amount of heap you have available, but there is no fixed limit enforced by Elasticsearch. A good rule-of-thumb is to ensure you keep the number of shards per node below 20 per GB heap it has configured. A node with a 30GB heap should therefore have a maximum of 600 shards, but the further below this limit you can keep it the better. This will generally help the cluster stay in good health."

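If it helps, here is a rough sketch of how those two guidelines interact; the index sizes and heap values are only illustrative:

```python
import math

def shards_for_index(index_size_gb: float, target_shard_gb: float = 30) -> int:
    """Primary shards needed to keep each shard in the 20-40 GB sweet spot."""
    return max(1, math.ceil(index_size_gb / target_shard_gb))

def max_shards_per_node(heap_gb: float, shards_per_gb_heap: int = 20) -> int:
    """Upper bound from the rule of thumb quoted above; stay well below it."""
    return int(heap_gb * shards_per_gb_heap)

print(shards_for_index(20))      # a ~20 GB index -> 1 shard
print(shards_for_index(120))     # a ~120 GB index -> 4 shards of ~30 GB each
print(max_shards_per_node(16))   # 16 GB heap -> keep well under ~320 shards
print(max_shards_per_node(30))   # 30 GB heap -> the 600-shard ceiling from the quote
```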
Thank you. Yes, I read through your link prior to asking and reduced the number of shards per index from the default using this formula:

Approximate Number of Primary Shards =
(Source Data + Room to Grow) * (1 + Indexing Overhead) / Desired Shard Size

I was shooting for 20 GB shard sizes and basically came to one shard per index for all but one or two indices. I am already seeing a performance boost.
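For reference, this is roughly how the numbers worked out; the per-index sizes below are hypothetical stand-ins, and I used the commonly cited ~10% indexing overhead:

```python
import math

def approx_primary_shards(source_gb, growth_gb, indexing_overhead, desired_shard_gb):
    """The formula above, rounded up to at least one primary shard."""
    return max(1, math.ceil((source_gb + growth_gb) * (1 + indexing_overhead) / desired_shard_gb))

print(approx_primary_shards(12, 2, 0.10, 20))  # 1 shard for a typical smaller index
print(approx_primary_shards(35, 5, 0.10, 20))  # 3 shards for one of the larger ones
```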

Good work. That’s exactly what I would recommend.

Good luck!
