Clustered environment (resources) for Graylog 3.x

Hi,

I’m currently planning to build a clustered environment for a new Graylog 3.x setup. The focus here is availability and performance, which is why multiple Elasticsearch data nodes would come in handy.

With regard to Graylog 3.x, have there been any architectural changes? Is there anything you wouldn’t recommend? Is it sufficient to have the Elasticsearch data nodes clustered and the Graylog server as a single-node instance?
Ideally I’d like to have 1 Graylog server and 3 Elasticsearch data nodes, one of which is also an Elasticsearch master node.

Your thoughts please :slight_smile:
Thanks!!

You can plan a 3.0 setup the same way you would in 2.x - no major changes there.

1 Like

Thank you Jan for the prompt reply :slight_smile:

1 Graylog node… availability???
Use two GL nodes in a cluster, and an HA IP with your favorite cluster software. That might be a better way if you want to focus on availability. Or you can use an HA load balancer as a frontend.

True, but I meant it in terms of availability of the log events (data); I want to avoid losing any events.

True, but if you lose the GL node, you will lose the incoming messages…

I’m installing a Graylog 2.5 cluster with 3 backend servers.

This is what I’ve done, if it helps:

3 servers, each with: GL, MongoDB and ES

1 HAProxy load-balancing the GL web interface, syslog TCP and Filebeat TCP.
If a node is dead, HAProxy switches over to another one.

I had to duplicate the HAProxy server and add keepalived.

That’s good, but if you only have a few messages, you can skip the HAProxy/LB and use keepalived/your favorite cluster software only. In this case you don’t need two more servers. (I use both variants, depending on the message volume.)

2 Likes

Depending on your ingest volume and how complex you want to make it afterwards (pipelines, etc.), you basically want to start off with 2 Graylog instances, with the UI enabled on both and the UI load-balanced. You want to start all your inputs as global, and make sure the sending app knows to try both hosts for sending.
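If the sending app is Filebeat, for example, it can spread the load over both Graylog nodes by itself. A minimal filebeat.yml output sketch, assuming both Graylog nodes run a global Beats input on the default port 5044 (hostnames are placeholders):

output.logstash:
  # Graylog's Beats input speaks the Beats/Lumberjack protocol,
  # so Filebeat ships to it via the "logstash" output.
  hosts: ["graylog1.example.org:5044", "graylog2.example.org:5044"]
  # Spread events across both hosts instead of sticking to the first one.
  loadbalance: true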

Then you want - at least - 3 Elasticsearch master nodes, and 3 Elasticsearch data nodes. In my opinion, anyway. The master nodes should be configured to only do their master node thing (so node.data: false in the config) and thus can be “smaller” servers. The data nodes obviously need as much storage as you can get, preferably on SSD or NVMe.

Then set your Graylog index set configuration to use 1 replica. This in effect means there are 2 copies of the data floating around, so if you lose a data node, it can still be served from another. If you’re paranoid, set it to 2 replicas so each data node has a copy.

My 2 cents :slight_smile:

1 Like

#TIL

Didn’t know about Elastic master vs data nodes until just now. Thanks for that! I have more learning to do!!

The “generic” setup for ES is that data nodes are master-eligible, but once you get to a certain level of operations or volume, you want data nodes to just worry about indexing, and master nodes to just worry about keeping track of cluster state.

My masters all run with:

node.data: false
node.master: true

And data nodes:

node.data: true
node.master: false

And router nodes:

node.data: false
node.master: false

This basically also lets you swap out masters without having to deal with the fact it may be acting as a data node - and vice versa.

Separation of responsibilities is a great thing :smiley:
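If it helps, here is a rough elasticsearch.yml sketch for one of the dedicated masters in that 3-master/3-data layout - the cluster name, node names and hostnames are placeholders, and the discovery.zen settings apply to the Elasticsearch 5.x/6.x series that Graylog 2.x/3.x supports:

cluster.name: graylog                  # must be identical on every node
node.name: es-master-1                 # placeholder
node.master: true                      # eligible to be elected master
node.data: false                       # holds no index data
discovery.zen.ping.unicast.hosts: ["es-master-1", "es-master-2", "es-master-3"]
discovery.zen.minimum_master_nodes: 2  # quorum of the 3 master-eligible nodes

The data nodes (and any router/coordinating-only nodes) would carry the same cluster and discovery settings, just with node.master/node.data flipped as shown above.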

2 Likes

Or use a load balancer…

1 Like

Or that, although if you’re using Beats it’s often easier to have the load balancing done by Filebeat itself. But that’s neither here nor there :smiley:

1 Like

Could you do that on a server iLO or a Cisco switch?

Hi all,

Thank you for your comments.
My current (single-node) setup has to deal with about 30 GB of logs per day, but the new setup should handle at least twice as much, making it around 60-70 GB of logs per day.
I have to keep the logs for about 90 days (retention time), which makes it around 2.5 TB.
In the new setup I would calculate around 5-5.5 TB for the 90 days.

How many Elasticsearch data nodes and master nodes would I need? Does this change anything, or would you still recommend 3 master nodes and 3 data nodes?
What is the current requirement for the master nodes in terms of vCores and memory? They require very little, from what I remember, right?

What would you recommend in terms of resources for the Graylog servers (2 Graylog servers, load-balanced) with regard to vCores, memory and storage?
Ultimately the most storage would be required by the Elasticsearch data nodes, obviously :smiley:

Many thanks in advance :slight_smile:

[quote=“micsnare, post:15, topic:8722”]
I have to keep the logs for about 90 days (retention time), which makes it around 2.5 TB.
In the new setup I would calculate around 5-5.5 TB for the 90 days.
[/quote]

Sounds about right :slight_smile:

I’d still go for the 3/3 setup. 3 masters because you always want an odd number, and for high availability you need more than 1, so 3 :slight_smile: For data nodes, it depends on what the setup is. We actually use servers with 2x3 TB drives (spinning) in RAID 0, so we have an effective capacity per data node of about 5 TB. I’d still go for 3 data nodes, if each data node has at least 3 TB available for storage. If you want your data on NVMe drives and you end up with servers that only store 1 TB each, you’ll need 5 (or 6) data nodes.

I will say that smaller servers (anything quad-core/octa-core with 64 GB RAM and ~1-2 TB of storage) in larger numbers work out very well. Sometimes it’s just easier to have many small things :slight_smile:

For master nodes you’re looking at anything with 4 cores and a minimum of 24 GB RAM, with 16 GB of that allocated to the Elasticsearch process.

That depends. Storage-wise you don’t need much, since Graylog doesn’t store much on disk. CPU is something left to the end user, because it depends greatly on your ingest volume and on how heavy your processing pipelines are.

As an example, we run 3 Graylog nodes, 24-core AMD beasts (Linux effectively sees them as 48 cores), with 128 GB of RAM - we’re only using 16 GB of that for Graylog, but it is using each and every core in those machines because we have serious pipelines happening. If all you do is ingest and a little mild processing, you can get by with a decent hexa-core/octa-core.
