What is a baseline for system requirements? When searching forums or googling this particular subject, I only find generic answers along the lines of "Oh, it doesn't require much, just 4 GB of RAM and a multicore CPU", which isn't what I am looking for. I'm starting to deploy Graylog. It will be receiving logs from our firewall, 2 other network devices and a couple of servers to start with. I have Graylog installed on an Ubuntu server VM with 10 cores and 40 GB of RAM assigned to it (I already know how to calculate the storage for it). It currently hovers around 35% to 60% CPU usage and 82% RAM usage. This is in an enterprise environment, so I'm starting to wonder if this is a good baseline or if I should increase it.
Hello @InfoSecUniversity,
As a starting point, understanding daily ingest in GB and data retention requirements is important for planning appropriate architecture. Do you have that info to hand?
Daily ingest is about 66 GB and retention is 180 days.
If you are looking to maintain 180 days of live data, then it would work out to roughly 15 TB: 66 GB (daily ingest) x 180 days (retention) x 1.3 (overhead) ≈ 15.4 TB. In an ideal scenario the cluster would comprise 3 OpenSearch nodes, each with 5 TB of storage, 16 cores and 32 GB RAM, and 2 Graylog nodes, each with 16 cores and 16 GB RAM. This setup would also offer redundancy should you wish to add replicas.
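As a rough sketch of the arithmetic above, the estimate can be reproduced in a few lines of Python. The function name and the even split across three nodes are illustrative assumptions, and the 1.3 overhead factor is simply the one used in the estimate, not a fixed rule.

```python
def hot_storage_gb(daily_ingest_gb: float, retention_days: int, overhead: float = 1.3) -> float:
    """Estimate total hot storage in GB: daily ingest x retention x overhead."""
    return daily_ingest_gb * retention_days * overhead


# Figures from this thread: 66 GB/day, 180 days of retention.
total_gb = hot_storage_gb(66, 180)

# Assumption: data is spread evenly over 3 OpenSearch nodes, no replicas.
per_node_gb = total_gb / 3

print(f"total: {total_gb / 1000:.1f} TB, per node: {per_node_gb / 1000:.1f} TB")
```

Adding replicas would multiply the total accordingly, so it is worth rerunning the numbers before enabling them.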
You may already be doing this, but the cluster could be made to fit your available resources better if you snapshot indices with OpenSearch to a remote repository and then remove them from the OpenSearch nodes' hot storage as a kind of archiving. It might take some scripting. This way the OpenSearch nodes wouldn't need to store and manage so much data.
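A minimal sketch of what that scripting might look like, assuming a snapshot repository is already registered with OpenSearch. The repository name `cold_repo`, the index names, and the 30-day hot window are all hypothetical. The function only plans the REST calls (create one snapshot, then delete the archived indices) and does not send anything, so the plan can be reviewed first.

```python
import json
from datetime import date, timedelta


def plan_archive(index_dates: dict[str, date], today: date,
                 keep_days: int, repo: str) -> list[tuple[str, str, str]]:
    """Return (method, path, body) steps for archiving indices older than keep_days.

    index_dates maps an index name to the date of its newest message
    (how you obtain that is up to you; it is an input here).
    """
    cutoff = today - timedelta(days=keep_days)
    old = sorted(name for name, newest in index_dates.items() if newest < cutoff)
    if not old:
        return []

    # One snapshot containing every index that has aged out of hot storage.
    snapshot = f"archive-{today.isoformat()}"
    body = json.dumps({"indices": ",".join(old), "include_global_state": False})
    steps = [("PUT", f"/_snapshot/{repo}/{snapshot}?wait_for_completion=true", body)]

    # Only run the deletes after confirming the snapshot succeeded.
    steps += [("DELETE", f"/{name}", "") for name in old]
    return steps


# Hypothetical example: keep 30 days hot, archive everything older.
for step in plan_archive(
    {"graylog_0": date(2024, 1, 1), "graylog_1": date(2024, 5, 30)},
    today=date(2024, 6, 1), keep_days=30, repo="cold_repo",
):
    print(step)
```

Keeping the planning separate from the HTTP calls makes it easy to dry-run the script against a list of indices before letting it delete anything.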
Thank you for the response. At this time, we don't have the extra hardware to do what you mentioned, at least not the clustering portion of it (to do it correctly), though I would love to do it that way. As for the cold archive you mentioned, is it possible to put the archives into a Backblaze B2 bucket?
Your current setup will be fine too; just keep an eye on the utilisation percentage of your process and output buffers.
Unsure about Backblaze as I’ve not tried it myself but a quick search turns up some hopeful results.