What architecture should we follow to store 2 TB of logs per day in Graylog for high scalability and cost optimisation?

My concern is storing 2 TB of logs per day, and I want a highly scalable, cost-optimized solution.

So what architecture should we follow to store this much data, and what are the system requirements to meet the solution?

Thanks in advance.
All suggestions are welcome.

At that scale you will likely want to use features from Graylog Operations or Graylog Cloud.
Talk to Sales and they can advise based on experience with similarly sized installations.


It is possible to build that with Open Source. Graylog scales out, as do MongoDB (used for Graylog settings) and OpenSearch (data storage). You can load-balance the whole thing behind nginx or HAProxy (for example) and run all the systems on virtual machines and/or containers (Docker) to optimize your cost, assuming that 2 TB/day and its retention fit in your virtual/container environment.

The drawback with something the size you are talking about is all the standard Opensource stuff:

  • No access to licensed features - Archiving may be of particular interest to you
  • You and your team are 100% support, along with this community
  • Open Source builds don’t adhere to build standards and conventions, though you can get close

2 TB/day comes at a price. You will not be able to manage that amount with a Pi, and two of them will not be enough either.
The Open Source version is perfectly capable of managing that amount of logs, but you will lack the support and professional services from Graylog. The community will be there and try to help.

How long do you plan to keep your data? Optimizing the storage will be a high priority in your case.


@ihe For 15-20 days.

I recommend a hot, warm, and cold architecture. Hot can be SSDs, warm spinning disks, and cold tape. Tape has a good price and can hold a few terabytes, but it is slow.
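
To get a feel for the tiering, here is a minimal sketch of how the 2 TB/day might be split between hot and warm storage over the 15-20 day retention mentioned earlier. The tier boundaries (2 days hot, 18 days warm) are my own assumptions for illustration, not a recommendation from the thread; cold/tape would hold anything archived beyond live retention.

```python
# Sketch of splitting 2 TB/day across hot/warm tiers for ~20 days of
# retention. Hot/warm day counts are assumed values for illustration.

TB = 10**12  # terabyte in bytes (decimal)

ingest_per_day = 2 * TB
hot_days = 2       # recent indices on SSD for fast search (assumption)
warm_days = 18     # older indices on spinning disk (assumption)

hot_bytes = ingest_per_day * hot_days
warm_bytes = ingest_per_day * warm_days

print(f"hot (SSD):   {hot_bytes / TB:.0f} TB")
print(f"warm (disk): {warm_bytes / TB:.0f} TB")
```

Shifting the hot/warm boundary is the main cost lever here: every day moved from SSD to spinning disk saves 2 TB of the expensive tier.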

The whole Graylog and OpenSearch stack can scale horizontally pretty well. You’ll need to get familiar with a spreadsheet and possibly R or similar.

You have 2 TB/day nominal data inbound. Is it bursty or steady? Across a day that is a steady stream of about 24 MB/s, which is roughly 200 Mb/s, so switched gigabit Ethernet is implied for the inputs. That gives you around five times headroom per Graylog node. So that is the first bit evaluated.
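
The arithmetic behind those figures can be checked with a few lines; this is just the back-of-envelope calculation from the paragraph above, assuming perfectly steady inbound traffic (bursts need extra headroom):

```python
# Back-of-the-envelope throughput check for 2 TB/day of steady inbound logs.

TB = 10**12  # terabyte in bytes (decimal, as vendors quote volume sizes)

daily_bytes = 2 * TB
seconds_per_day = 24 * 60 * 60

mb_per_s = daily_bytes / seconds_per_day / 10**6   # megabytes per second
mbit_per_s = mb_per_s * 8                          # megabits per second

print(f"steady ingest: {mb_per_s:.1f} MB/s ~= {mbit_per_s:.0f} Mb/s")

# A switched gigabit link (1000 Mb/s) therefore carries roughly five times
# the nominal load per node, matching the headroom estimate above.
headroom = 1000 / mbit_per_s
print(f"gigabit headroom: {headroom:.1f}x")
```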

Next you need a Graylog node or cluster. This is where it gets complicated. What are you doing with the logs? Are you simply storing them, or doing lots of processing? The Graylog nodes don’t need much fast storage, but bear in mind that they will buffer when Elasticsearch or OpenSearch is down. That’s a great feature, so make sure you don’t starve them of disk. Let’s say you need six hours for a major upgrade of your ES/OS store. Provided you use Sidecars, they will buffer locally, and then your Graylog cluster will start buffering too. Let’s start off with two nodes with 200 GB of SAS-grade storage each. They will need something like 4 GB RAM and four vCPUs/threads each. Make sure you set the JVM settings effectively. I’m sticking a finger up in the wind here!
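
As a sanity check on that journal disk sizing, here is a rough estimate of how much each node has to buffer during the outage window. The six-hour window and the two-node split come from the paragraph above; even load distribution is an assumption:

```python
# Rough sizing of Graylog journal disk for riding out an OpenSearch outage.
# Assumptions: 2 TB/day ingest, a 6-hour maintenance window, and 2 Graylog
# nodes sharing the load evenly.

TB = 10**12
GB = 10**9

ingest_per_day = 2 * TB
outage_hours = 6
nodes = 2

buffered_total = ingest_per_day * outage_hours / 24  # bytes queued in 6 hours
per_node = buffered_total / nodes

print(f"total buffered: {buffered_total / GB:.0f} GB")
print(f"per node:       {per_node / GB:.0f} GB")
```

This lands around 250 GB per node for the raw estimate, so 200 GB per node is in the right ballpark but leaves little slack; size the journal disk above the raw number.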

After your Graylog ingesters and processors you need your Elasticsearch or OpenSearch cluster. This is also a bit of an art form. Decide how you want to actually shard the thing first and configure that in. For your data, I suggest (with minimal working) a five-node cluster. The data will need some reasonably fast storage: a flash-fronted SAN with multiple 10 Gb interfaces for iSCSI, or decent FC, or similar is indicated. Each node might start off with, say, 6 GB RAM and four vCPUs/threads. mlock the RAM: set the JVM to use 4 GB and leave the rest for cache. There are loads of docs on both ES and OS about tuning nodes - do pay attention 8)
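
Before settling on node counts, it is worth estimating the total cluster storage. A minimal sketch, combining the 2 TB/day figure with the 15-20 day retention stated earlier in the thread; the replica count and overhead factor are my assumptions, not defaults:

```python
# Rough OpenSearch/Elasticsearch storage estimate for a 5-node cluster.
# Assumptions: 2 TB/day, 20-day retention (upper end of 15-20 days),
# one replica per primary shard, ~10% index overhead.

TB = 10**12

ingest_per_day = 2 * TB
retention_days = 20
replicas = 1      # one extra copy of each primary shard (assumption)
overhead = 1.1    # index structures; varies with mappings (assumption)
nodes = 5

raw = ingest_per_day * retention_days
total = raw * (1 + replicas) * overhead
per_node = total / nodes

print(f"total cluster storage: {total / TB:.0f} TB")
print(f"per node:              {per_node / TB:.1f} TB")
```

Numbers like these are why the hot/warm split and the sharding decision matter: they determine how much of that total has to sit on fast storage.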

In the end you need to work it out for yourself. You do have to science the heck out of it for your own use case. I’ve given some very rough indicators based on what I do with it and my requirements, but yours will be different.

Good luck.


Thanks @gerdesj For the reply.

I’d like to add a few things to the words of @gerdesj:

  • There is a bit of documentation on how to build a cluster: Architectural Considerations. I have clusters like this under my responsibility. Stick to that model! Don’t mix OpenSearch/Elasticsearch nodes with Graylog processing nodes! Graylog scales horizontally very well, and how much CPU you burn depends a lot on your inputs. The less processing you want to do, the better. If you have stuff that needs a lot of parsing, you will need much more CPU, as regex and Grok patterns are expensive. If you have “only” NetFlow, you will have a lot of data but much less work for your CPU.
  • You might think about hot/warm storage on the OpenSearch/Elasticsearch side. This is not something that can be configured in Graylog; you need to work with that database on its own.
  • Better to run multiple nodes with fewer resources than only a few very big ones. Java does some strange things when you assign too much heap. A node with 256 GB RAM will waste most of it; better to go for nodes with 32 GB RAM each and assign 16 GB to OpenSearch, or 20-25 GB to Graylog.
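
The heap advice in the last point can be sketched as a small rule of thumb: give the JVM roughly half of a node's RAM, and keep heaps well under ~31 GB so the JVM keeps compressed ordinary object pointers (oops) enabled. The `heap_for_node` helper is hypothetical, just a way to express the rule:

```python
# Rule-of-thumb JVM heap sizing for data nodes: half the RAM, capped below
# ~31 GB so compressed oops stay enabled. Helper name is illustrative.

def heap_for_node(ram_gb: int) -> int:
    """Suggested JVM heap for an OpenSearch data node, in GB."""
    half = ram_gb // 2
    return min(half, 31)  # beyond ~31 GB the JVM loses compressed oops

for ram in (32, 64, 256):
    print(f"{ram} GB RAM -> {heap_for_node(ram)} GB heap")
```

For a 32 GB node this yields the 16 GB heap suggested above; for a 256 GB node the heap is capped around 31 GB, so most of the extra RAM only serves as page cache, which is why many smaller nodes beat a few huge ones.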

Thanks @ihe for the reply.
I have designed the same architecture as you mention.

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.