I’ve been running Graylog2 in production for 1-2 years now, on a simple setup with 2x VMs. One runs GL + Mongo, the other is the singular ES node.
Another department is moving to ‘the cloud’ and I managed to convince them to give my department 3x almost-brand-new servers; the specs are:
2xE5-2643 v4 (6-core @ 3.4Ghz)
512GB DDR4
24x 400GB SSD + 2x 800GB HDD.
I’m aware that it’s preferable to have a three-node setup for most of the components. Usually we do everything with VMware ESXi, but I’m keen to go bare metal to get the most out of the hardware.
How would you set this up? I was thinking of a simple bare-metal OS per server, then each server runs: ES (3-node cluster), MongoDB (3-node replica set), and graylog2.
I would likely be using keepalived between the three servers to hold the logging VIP address, and then whichever server receives the logs would also act as the load balancer to the others. Redundancy is important, so I don’t want to do the load balancing upstream as it means more hardware.
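For the VIP piece, something like this minimal keepalived sketch is what I have in mind (interface name, addresses and password are just placeholders):

```
# /etc/keepalived/keepalived.conf on the first server
vrrp_instance LOGGING_VIP {
    state MASTER              # BACKUP on the other two servers
    interface eth0
    virtual_router_id 51
    priority 150              # lower priorities (e.g. 100, 50) on the backups
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass changeme
    }
    virtual_ipaddress {
        192.0.2.10/24         # the logging VIP that all syslog senders point at
    }
}
```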
You could use an external load balancer such as nginx or haproxy to do the load balancing.
But it all depends on your needs, your options, and your environment.
I’m not sure I would use a load balancer at first; it can add a fair bit of network overhead:
LB -> GL -> ES (each step could land on a different host, so a single log message might be handled by three different hosts).
If one Graylog node can handle the full traffic, I suggest forgetting the load balancer.
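If you do put one in front, a minimal haproxy sketch could look like this (hostnames and ports are only examples, assuming a syslog TCP input on 1514 on every Graylog node):

```
# haproxy.cfg fragment -- TCP load balancing to three Graylog syslog inputs
frontend syslog_in
    bind *:1514
    mode tcp
    option tcplog
    default_backend graylog_nodes

backend graylog_nodes
    mode tcp
    balance roundrobin
    server gl1 graylog1.example.com:1514 check
    server gl2 graylog2.example.com:1514 check
    server gl3 graylog3.example.com:1514 check
```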
Another problem:
Elasticsearch can only make effective use of roughly 64GB of RAM per node (the JVM heap has to stay below ~32GB, and about the same again is wanted for the filesystem cache), and unfortunately ES is the component that needs the most resources, so with a basic one-node-per-host setup you would only be using a small fraction of your 512GB.
So it may be better to run several ES nodes on the same host in your favorite virtualization solution.
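As an illustration only (ES 5.x-style settings, names and paths made up), if you ran two ES instances directly on one OS instead of separate VMs, the second node mostly just needs its own name, ports and data path, plus a setting so a replica never lands next to its primary on the same physical box:

```
# elasticsearch.yml for one of two data nodes sharing a physical host
cluster.name: graylog
node.name: es-host1-a                 # es-host1-b on the second instance
path.data: /var/lib/elasticsearch/a   # .../b on the second instance
http.port: 9200                       # 9201 on the second instance
transport.tcp.port: 9300              # 9301 on the second instance
# keep a primary shard and its replica off the same physical machine
cluster.routing.allocation.same_shard.host: true
```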
The maximum memory setting of ES is something I definitely wasn’t aware of, so thanks for that - that will definitely help shape how I run this, likely doing what you’ve suggested and running multiple ES nodes per host, or just on one/two hosts. It’s a shame these servers are CPU constrained…
I need to do some looking into how CPU-hungry ES is versus GL, so I can further figure out how to divide resources.
Networking overhead isn’t a problem, we’re an ISP, so our internal core network(s) are far, far bigger than required, load balancing and shifting data around isn’t a problem at all.
Also keep in mind for ES that if you have 64GB of RAM, you don’t want to allocate it all to ES. You want to keep some memory free for the OS disk cache. We generally run ES on 64GB machines with 32GB allocated to ES, and the rest of the memory is free for the OS to use as it sees fit.
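For reference, with the ES 5.x packaging that allocation is just the two heap lines in jvm.options, kept equal and below the ~32GB compressed-oops threshold:

```
# /etc/elasticsearch/jvm.options (ES 5.x layout assumed)
-Xms31g
-Xmx31g
```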
RAM is abundant on these servers and we have little other use for it, so what will likely happen is each box will get a single VM (likely go with ESXi now instead of bare metal) for ES with something like 100GB, 64 for ES and the rest for activities.
It’s been a long time since I built anything GL/logging-related, but I’m guessing the best approach is design + build the ES cluster, then get MongoDB running, then finally install GL?
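For my own notes, something like this is the kind of minimal elasticsearch.yml I expect to start the ES cluster with (ES 5.x-style zen discovery; hostnames are placeholders):

```
# elasticsearch.yml -- identical on all three nodes apart from node.name
cluster.name: graylog                   # Graylog's docs use 'graylog' as the cluster name
node.name: es1                          # es2 / es3 on the other nodes
network.host: 0.0.0.0
discovery.zen.ping.unicast.hosts: ["es1.example.com", "es2.example.com", "es3.example.com"]
discovery.zen.minimum_master_nodes: 2   # quorum for 3 master-eligible nodes
```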
If you run any virtualization on the machines, I would suggest using a separate LB VM on each host.
It doesn’t need much, and you can decouple this function from the OS of your other services.
And if you need HA, you need two LBs with the same config…
A 10GB OS disk, 2 vCPU and 2-4GB of RAM is enough for a small one.
I was only considering something like keepalived to keep the configuration/software minimal, but I’ll likely use a set of redundant relay servers - this is actually what I run in production now, using rsyslog to accept, filter, and then forward logs (or spool them to disk if the downstream server isn’t ready).
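Roughly what those relays do today, as a sketch (hostname, port and queue sizes are placeholders):

```
# /etc/rsyslog.d/relay.conf -- accept syslog, forward to the logging VIP,
# and spool to disk if the downstream Graylog/LB is unreachable
module(load="imudp")
input(type="imudp" port="514")
module(load="imtcp")
input(type="imtcp" port="514")

action(type="omfwd" target="logging-vip.example.com" port="1514" protocol="tcp"
       queue.type="LinkedList"
       queue.filename="graylog_fwd"       # a filename here enables on-disk spooling
       queue.maxdiskspace="1g"
       queue.saveonshutdown="on"
       action.resumeretrycount="-1")      # keep retrying until the target comes back
```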
I’m assuming that if I ran 3 Graylog servers in a cluster, I would need to send all of my logs to the master, and the master would then distribute them? Or is it the case that any node in the Graylog cluster can receive? If the latter, I would assume/hope that the config/plugins such as Streams and Extractors are still global to the cluster?
I thought only the master could receive; I think I was confusing it with a different stack we’re using for something else. If each node can receive data and process it as part of the cluster, that should make load balancing a lot easier, and I can simply round-robin between my 3 potential GL VMs.
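In that case the per-node server.conf presumably only differs in the master flag, something like (Graylog 2.x settings; hostnames are examples):

```
# /etc/graylog/server/server.conf -- cluster-relevant bits
is_master = true          # true on exactly one node, false on the other two
password_secret = <identical on every node>
mongodb_uri = mongodb://mongo1.example.com:27017,mongo2.example.com:27017,mongo3.example.com:27017/graylog?replicaSet=rs0
# inputs created as "global" are started on every node, so any node can receive
```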
Yes, having read up on it, 31GB seems to be the best number to allocate, to ensure you don’t hit the issues listed in that article.
My current design involves 2x load balancers (ipvs/keepalived), 3x graylog, and 3x ES. I’ll use ESXi instead of 3 hosts on bare metal, to make things easier.
I’ve set aside:
Load balancers - 2 vCPU + 16GB
Graylog - 6 vCPU + 64GB (Half for JVM heap)
ElasticSearch - 4 vCPU + 64GB (Half for JVM heap)
I think that will be more than enough for what we are doing. All of our logs obviously go to flat-file, as is standard practice, and we use GL for some nice analysis and alerting. We’re barely using GL at the moment, so the above cluster should be enough to move to, and then start throwing logs we don’t currently analyse at.
I’ll likely get the above built sometime soon and then build a couple of load testing VMs to throw junk data at it for a bit, to see how it runs.
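Probably nothing fancier than a loop hammering a GELF HTTP input, something like (VIP name and port are placeholders):

```
#!/bin/bash
# throw junk messages at a GELF HTTP input listening on 12201
while true; do
  curl -s -o /dev/null -X POST "http://logging-vip.example.com:12201/gelf" \
       -H 'Content-Type: application/json' \
       -d '{"version":"1.1","host":"loadtest01","short_message":"junk message '"$RANDOM"'","level":6}'
done
```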
Thanks for all the help, I’d have definitely made some mistakes without asking.
You should not assign that much JVM heap to Graylog without a reason. The GC in Graylog will make you cry and you’ll run into issues. Assign ~12GB to Graylog’s JVM and you are safe for a looong time.
We have some older threads discussing recommended resources and how others have built their setups. Just search this community.
Thanks @jan, I do tend to throw resources at things a bit, as this hardware is overkill.
I’ll take your advice and use 32GB machines for GL, 12GB for GL JVM and the rest for OS + Mongo.
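If I’ve understood the packaging right (Debian/Ubuntu path assumed), that’s just the heap flags in the service defaults:

```
# /etc/default/graylog-server   (RPM-based installs use /etc/sysconfig/graylog-server)
# raise only -Xms/-Xmx; append the GC options from the packaged default after the heap flags
GRAYLOG_SERVER_JAVA_OPTS="-Xms12g -Xmx12g"
```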
I had a search through the forums, but that was mostly about clustering. Now that I have a rough idea of a simple setup that would work for me, I’ll start looking more into best practices and configuration tweaks for performance.