The sizing guide doesn’t reflect some optimizations that have been made in both technologies in the last couple of years, but it remains fairly close to what you could expect in a heterogeneous logging environment with parsing and enrichment taking place for each log pipeline.
Unfortunately, logs are complex, and therefore tight estimates are difficult. If you have a lot of similar logs and parsing them is fairly simple (JSON, KV pairs, or something covered by an input), then you could get double the traffic, or more, through the same environment. However, the inverse is also true: if your logs are complex, or you implement a poorly written parser, then your throughput will suffer greatly.
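To illustrate the parsing-cost point: a structured format like JSON needs only a single library call, while a free-text format needs a pattern you write and maintain yourself. This is just a plain-Python sketch, not Graylog's actual extractor or pipeline code, and the field names and log lines are made up:

```python
import json
import re

# Two renderings of the same hypothetical event: structured JSON vs. key=value text.
line_json = '{"ts": "2024-01-01T00:00:00Z", "level": "INFO", "msg": "user login", "user": "alice"}'
line_kv = 'ts=2024-01-01T00:00:00Z level=INFO msg="user login" user=alice'

def parse_json(line):
    # Structured input: one library call, nothing to hand-tune.
    return json.loads(line)

# Simple key=value grammar: values may be quoted to allow spaces.
KV_RE = re.compile(r'(\w+)=(?:"([^"]*)"|(\S+))')

def parse_kv(line):
    # Each match yields the key and either the quoted or the bare value.
    return {m.group(1): m.group(2) if m.group(2) is not None else m.group(3)
            for m in KV_RE.finditer(line)}

print(parse_json(line_json)["user"])   # alice
print(parse_kv(line_kv)["msg"])        # user login
```

In Graylog terms, the JSON case maps roughly to the built-in JSON extractor and the KV case to a key/value or regex extractor; the more complex (or backtracking-prone) your pattern, the more CPU each message costs, which is exactly where throughput estimates fall apart.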
Any idea what sort of logs you’re planning to put into Graylog?
Thanks a lot.
Well, our plans are quite on the safe side. I will start with a small footprint (probably 3x GL and 3x ES, but with lower resources, since they will run as VMs and adding more resources over time won’t be that difficult), and once the proof of concept (covering different log types) is finalized, we will move to a phased approach.
The initial phase should be pretty easy (hopefully), as it will be mainly Syslog messages, but in the next phase we might start with Windows Events and other system-specific logs.
So I was more wondering whether the given guide is a recommendation, or whether there is some kind of “small step” sizing guide, since you can eventually start with just 1x GL and 1x ES and then add more nodes to the cluster, behind a load balancer of course.
Nevertheless, the very first input is going to be Syslog, but the volume of messages might be high. Hopefully Graylog will show the utilization/health of its infrastructure so sizing can be adjusted over time… :-/