Based on the documentation in the links below, I'm seriously torn between deploying an Elasticsearch cluster with 3 nodes (each node acting as master/data/ingest) or deploying 6 nodes, where 3 would be master/ingest and 3 would be data only.
Considering also that ES best practices suggest splitting nodes by role, and that those practices are not covered in the Graylog documentation (since it is focused on the Graylog product itself), I would really appreciate any insight or thoughts on which approach would be more appropriate for a daily ingestion of 25 to 50 GB of data.
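Whichever topology we end up with, I figure the first sanity check is confirming which roles each node actually holds once the cluster is up. A minimal sketch with the Python client (the endpoint is just a placeholder, not something from our setup):

```python
# Minimal sketch (elasticsearch-py, ES 7.x): list each node's roles.
# "node.role" is a string of flags, e.g. "dim" = data + ingest + master-eligible.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://es-node-1:9200")  # hypothetical endpoint

for node in es.cat.nodes(format="json", h="name,node.role,master,heap.max"):
    # "master" shows "*" on the currently elected master node
    print(node["name"], node["node.role"], node["master"], node["heap.max"])
```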
How much memory and how many cores does each node have, and how much data do you want to keep in your cluster? As for memory, the practical maximum JVM heap is around 32 GB (to keep compressed object pointers), so a node would max out at about 64 GB of RAM, leaving the other half to the filesystem cache.
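If it helps, a quick way to check the configured heap on every node and spot any that cross the ~32 GB threshold; just a sketch assuming the Python client and a placeholder endpoint:

```python
# Minimal sketch: report each node's configured JVM heap and flag anything
# above the ~32 GB compressed-oops limit. Endpoint is a placeholder.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://es-node-1:9200")  # hypothetical endpoint

stats = es.nodes.stats(metric="jvm")
for node_id, node in stats["nodes"].items():
    heap_max_gb = node["jvm"]["mem"]["heap_max_in_bytes"] / 1024**3
    note = "over ~32 GB" if heap_max_gb > 32 else "ok"
    print(f"{node['name']}: heap_max = {heap_max_gb:.1f} GB ({note})")
```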
Are replicas being used as a failsafe for your data?
You could start with three master/data nodes and work up from there if that is not enough for your requirements. From what I have read, a rule of thumb is that once you get above five or six nodes, a dedicated-master setup can improve things, and for data safety at least two master-only nodes would be the minimum. As I understand it, only one master is actually active at a time; I don't know how Graylog handles two ES masters, which sounds like a split-brain risk to me.
If you have six servers to spend, two master servers and four data servers would be more applicable.
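For what it's worth, ES 7 handles master quorum itself (the old minimum_master_nodes setting is gone), so checking which node got elected and how many nodes the cluster sees is mostly a sanity check. A minimal sketch with the Python client, endpoint again a placeholder:

```python
# Minimal sketch: cluster overview plus the currently elected master.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://es-node-1:9200")  # hypothetical endpoint

health = es.cluster.health()
print("status:", health["status"],
      "| nodes:", health["number_of_nodes"],
      "| data nodes:", health["number_of_data_nodes"])

# Only one master is elected at a time; the rest are merely eligible.
print("elected master:", es.cat.master(format="json"))
```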
I ended up going with 3 master nodes and 3 data nodes, considering that heap size would not surpass 20 GB. We have been running Graylog on Kubernetes for quite a while, and these new considerations came up with the Graylog 4 upgrade along with the move from ES 6 to ES 7.