Looking for a new syslog solution and I’m liking Graylog so far. Needs to collect around 30GB of logs a day so thinking of a 3-node setup.
There’ll be 3 Elasticsearch instances (all master-eligible), with two holding data and a single one acting as a voting_only tiebreaker. The two data nodes will be in separate datacenters.
I had planned to have all ES installs on their own dedicated servers, so 6 servers in all but now thinking of installing ES on the same servers as Graylog/MongoDB because “why not”? I’d prefer to keep server count down if I can.
Can anyone offer any pros and cons for this approach that I could be missing? If we need to scale up I’d just add another node with the full stack.
For 30GB a day with only 30 days of log retention, our lab Graylog server has 12 CPUs, 10 GB of RAM, and a TB-sized drive. In our other environment we have ES separated from Graylog/MongoDB.
Having a three-node setup for redundancy would be the way to go, but I would highly suggest separating your Elasticsearch since it uses a lot of resources when searching or indexing.
You might want to be careful of a “split-brain” situation. This is when communication between nodes in the cluster fails due to either a network failure or an internal failure with one of the nodes. In this kind of scenario, more than one node might believe it is the master node, leading to data inconsistency.
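To make the split-brain risk a bit more concrete, here is a rough Python sketch of a plain majority-vote model. It is only an illustration: the function names are made up for this post, and real Elasticsearch (7.x and later manages its own voting configuration) and MongoDB elections have extra rules that this does not capture.

```python
from typing import List


def majority(voters: int) -> int:
    """Smallest number of votes that is more than half of `voters`."""
    return voters // 2 + 1


def sides_with_quorum(partition: List[int]) -> List[bool]:
    """For each side of a network partition, report whether it still holds a majority.

    `partition` lists how many voting nodes ended up on each side of the break.
    """
    needed = majority(sum(partition))
    return [side >= needed for side in partition]


# Three voters (two data nodes plus a tiebreaker): the side that keeps
# the tiebreaker holds 2 of 3 votes and can still elect a master.
print(sides_with_quorum([2, 1]))  # [True, False]

# Four voters split evenly across two sites: neither side reaches 3 of 4.
print(sides_with_quorum([2, 2]))  # [False, False]
```

Under this simplified model, an odd number of voters with the tiebreaker placed where it can still reach one site is what lets one side keep a master during a link failure, while an even split leaves no side able to elect.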
This depends. Are you always going to have 30GB a day, or will logs increase over time? Do you think you might want to expand your cluster in the near future? How long will you want to keep logs (e.g. 30, 60, or 90 days)? The more indices and shards you have, the more resources Elasticsearch wants when executing deep searches.
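As a back-of-the-envelope way to think about those questions, here is a small Python sketch. The assumptions are mine and should be swapped for your own numbers: on-disk index size is treated as equal to raw ingest (in practice it can be larger or smaller), there is one replica of every shard, and indices rotate daily with 4 shards each. The function name and parameters are hypothetical.

```python
def retention_estimate(daily_gb, retention_days, replicas=1,
                       shards_per_index=4, indices_per_day=1):
    """Rough storage and shard count for a retention window.

    Assumes index size on disk equals raw ingest size, which is only
    an approximation.
    """
    primary_gb = daily_gb * retention_days
    copies = 1 + replicas  # primaries plus replica copies
    return {
        "primary_gb": primary_gb,
        "total_gb": primary_gb * copies,
        "total_shards": retention_days * indices_per_day * shards_per_index * copies,
    }


# Compare the retention windows mentioned above at 30 GB/day.
for days in (30, 60, 90):
    est = retention_estimate(daily_gb=30, retention_days=days)
    print(f"{days} days: {est['primary_gb']} GB primary, "
          f"{est['total_gb']} GB with replicas, {est['total_shards']} shards")
```

With these assumptions, 30 days at 30 GB a day works out to 900 GB of primary data, 1800 GB including the replica, and 240 shards. Doubling retention doubles all three, which is why the retention period matters as much as the daily volume.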
To be honest, I would start out with an ES node and a Graylog/MongoDB node, then add the resources needed for the amount of logs that have to be ingested.
Then you have the redundancy scenario: having three ES and three Graylog/MongoDB nodes would also help with incidents in the near future. We build our bigger environments to expand if need be, therefore I don’t have to worry about separating Elasticsearch onto different nodes.
Redundancy is key here so I’m happy putting multiple nodes in place from the beginning. I’m now actually thinking of increasing to 4/8 nodes so each site has dedicated arbitration for ES and MongoDB to avoid election issues.
Luckily I have monster Hyper-V hosts available at both sites so can throw as much resource at the VMs as I want so I think I’m happy combining all roles across 4 boxes.
Expecting 30GB a day for now, but that will increase, although there’s no clear growth plan at the moment.
Plan is to keep 30 days’ worth of logs and archive up to a year to secondary NAS storage so ES shouldn’t be taxed too much month-by-month - another reason to combine it with the other roles I feel.
BTW - I’m also planning on balancing ingress logs and HTTP to Graylog using NGINX across all nodes too.
Yeah, saw that documentation and that’s where I’ve started from.