I want to start using Graylog in production in a small environment with about 5 GB/day of log volume.
I am inclined to start with a single-server installation, but what is the maximum log volume you would handle without moving to a cluster installation?
For me, it would be easier to install a single server with fast NVMe storage, a high-performance CPU, and plenty of RAM than to take on the complexity of a cluster.
I strongly suggest building fault redundancy into a production environment. If you depend on one logging server and it goes down or runs into an issue, you lose logs. How much that loss matters depends on how heavily you rely on that one logging server. It could have a critical impact not only on you but possibly on a customer.
With that being said, the specification for my dev/ops Graylog server is:
Operating system: CentOS 7
CPU: 14 cores
RAM: 14 GB
Storage: 1 TB
Logs per day: 30 GB+
Inputs: GELF TCP/TLS, Raw Plain/Text, Syslog UDP, NetFlow
Log retention: 30 Days / delete
Log rotation: 1 Day
Source Count: 50 nodes
Devices: Linux, Windows Server 2019, Cisco routers/Switches, Fortinet Firewall, Dell Force 10 Switches, etc…
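As a rough sanity check on the storage figure, here is a minimal sketch of the retention arithmetic. It only multiplies daily volume by retention days. The `overhead` factor is a hypothetical knob for index overhead or replicas, not a Graylog setting, and the 30-day retention for the 5 GB/day case is an assumption, since the question does not state one.

```python
def retained_volume_gb(daily_gb: float, retention_days: int, overhead: float = 1.0) -> float:
    """Estimate how much log data is on disk once retention is full.

    overhead is a hypothetical multiplier (index overhead, replicas);
    1.0 means raw ingested volume only.
    """
    return daily_gb * retention_days * overhead


# Setup described above: 30 GB/day, 30 days retention, 1 TB of storage.
print(retained_volume_gb(30, 30))  # 900.0 GB of raw volume

# The 5 GB/day environment from the question, assuming the same 30-day retention.
print(retained_volume_gb(5, 30))   # 150.0 GB of raw volume
```

At 30 GB/day with 30 days of retention, the raw volume alone comes close to the 1 TB disk, so any overhead above 1.0 leaves very little headroom. The 5 GB/day case is far smaller under the same assumption.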
This documentation may also help if you later need to expand a single Graylog server into a cluster.