Cluster Infrastructure

Hello

For 4 weeks now I have been using Graylog on a single virtual machine, so MongoDB and Elasticsearch run on it too. I am on a work placement and one of my projects is to build an infrastructure with Graylog. I was thinking of starting with 4 virtual machines: 2 Graylog nodes and a cluster of 2 Elasticsearch servers. But to be honest I am a little lost. Is it possible to do what I described above? Is it a good idea? Do I need to install MongoDB on both Graylog servers? Do I need to deploy a replica set for MongoDB?

I don't really know how to start. Later I will surely add a load balancer, but for now I haven't even started on the cluster. Nothing is urgent, but after some searching on the web I feel lost and confused about all of this.

It would be nice if someone could enlighten me.

Thanks for your help

First, calculate how many servers you need…
https://groups.google.com/forum/#!msg/graylog2/lSTKgFvEAyQ/lzUdCyH3AQAJ
After that…
http://docs.graylog.org/en/3.0/pages/configuration/multinode_setup.html
It contains a lot of useful information, and as far as I can see you have missed some points.

This link helped me create a MongoDB replica set. (Don’t do it out of order like I did :crazy_face:)

MongoDB replica instructions

Thanks for this help. I think my company will generate 10 - 50 GB of logs per day. So according to your documentation I need 2 Graylog servers and 2 ES servers too. I install MongoDB on the Graylog servers, right? Then I just need to follow the instructions in order? (First MongoDB, then the ES cluster, then Graylog?)

Sorry, I'm not a native English speaker, so there are a lot of things I don't understand well :disappointed_relieved:

Thanks tmacgbay, I think it will be useful :slight_smile:

I would order it with ES first, then MongoDB, then Graylog. Graylog keeps its settings in MongoDB. You want ES and MongoDB set up and their replication working first. If you want to keep your current data, it is possible to start from your current machine, set up replication for ES and MongoDB, and eventually drop your original machine after making sure you have moved all master functionality to your new machines. It will take a little bit of research into MongoDB and ES for the details if that is the way you go.

@tmacgbay is right, but I suggest you RTFM first.

E.g.

Most important is that you have an odd number of MongoDB servers in the replica set.

So what do you think?

I think if you spend a day collecting information you will have fewer “OHHH…” moments in the future. E.g. when your Mongo cluster with two members stops working when you restart one of your servers.

@tmacgbay Thanks for your help. It is starting to become clearer in my idiot head :smile:. Just one thing: I will restart from the beginning and will not use my old machine. It was just to learn a little about how Graylog works.

@macko003 I don't really see what you mean. Actually the part about MongoDB, especially the part about an odd number of servers, is really fuzzy to me.

I have read the manual but this part gives me trouble.

If you install two mongo servers, you will su…s.

But I will use 4 different servers and there is no plan to add one more, so what do you advise me to do?

Always RTFM - particularly with live data!

Two MongoDB (or ES) servers will work … it will improve consumption without improving resilience much. For better redundancy they both prefer a minimum of three. An odd number lets you carefully shut one down for maintenance or a problem without stopping the DB, because the remaining members still form a majority.

I don’t know what su…s is but I hope it never happens to me! :wink:
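To make the odd-number point concrete, here is a minimal sketch of the majority arithmetic behind a replica set election. The function names are my own for illustration and are not part of MongoDB or Graylog; the only rule assumed is that a primary needs the votes of more than half of the voting members.

```python
def majority(members: int) -> int:
    """Votes needed to elect a primary: more than half of the voting members."""
    return members // 2 + 1


def tolerated_failures(members: int) -> int:
    """How many members can be down while a majority is still reachable."""
    return members - majority(members)


if __name__ == "__main__":
    for n in range(1, 6):
        print(f"{n} members: majority={majority(n)}, can lose {tolerated_failures(n)}")
```

With two members the majority is two, so losing either one stops the set from having a primary. Going from three members to four does not buy any extra tolerance, which is why odd sizes are usually recommended.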

Oh OK, I see. Thank you very much, I understand better now. How long do you think a beginner needs to set up this kind of architecture?

How long do you think a beginner needs to set up this kind of architecture?

Too many unknown factors. https://xkcd.com/349/

“It will be done when it’s done” I’d say :slight_smile:

Personally (I’m a little biased), I prefer a setup with 3 Graylog servers, each machine running Graylog and a MongoDB instance (because 3 is nice and odd numbered), with inputs load-balanced across the Graylog nodes (with Filebeat you can do this in the Filebeat config; with other inputs you need a TCP load balancer).

Then as many Elasticsearch nodes as you need to satisfy the storage requirements. If you ingest 50 GB of logs per day, you need 100 GB of storage capacity per day if you use 1 replica (advised), so now you have to consider how long you want to keep logs searchable. If you say 90 days, then you need 90 * 100 GB = +/- 9 TB worth of storage space. Ideally you then spread that across at least 3 data nodes so you can lose a node and still have the data.
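The same arithmetic as a small sketch, in case you want to try other retention periods or ingest rates. The function name and parameters are hypothetical, and it assumes, like the estimate above, that the stored index size is roughly equal to the raw ingest size; real usage depends on mappings and compression.

```python
def required_storage_gb(daily_ingest_gb: float, replicas: int, retention_days: int) -> float:
    """Rough index storage: each day's logs are stored once per copy (primary + replicas)."""
    copies = 1 + replicas
    return daily_ingest_gb * copies * retention_days


# The example from this post: 50 GB/day, 1 replica, 90 days of retention.
total_gb = required_storage_gb(daily_ingest_gb=50, replicas=1, retention_days=90)
print(f"{total_gb:.0f} GB (~{total_gb / 1000:.0f} TB)")
```

At the lower end of the 10 - 50 GB per day estimate, the same call with `daily_ingest_gb=10` gives 1800 GB, so it is worth measuring the real ingest rate before sizing disks.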

Your Elasticsearch setup would then be: 3 master servers, 3 data nodes for a total of 6 servers. Ideally on bare metal, with the masters needing 32 GB of memory and not that much disk space, and data nodes needing about 64 GB of memory and either large SSD drives (the 1 TB type in RAID 0) or SATA drives (the SSDs obviously being much faster with their I/O).

But that’s just my own personal preference for “how would I…” :slight_smile:

Everything is based on needs…
I usually have to install geo-redundant clusters, so 3 Graylog nodes are not applicable in my case.

Installing a working cluster does not take that long; maybe a week is enough to play with everything. The harder part is setting up what you need in Graylog, e.g. pipelines, extractors, alerts, streams, rights, etc… That’s another 1-2 decades :slight_smile:

Setting up pipelines, alerts, streams, user rights etc. actually just never stops, I found :smiley: There’s always something to change >.>

@benvanstaveren @macko003 I haven't even seen all of the features; I mean I don't know how to use them all well. I think I have some work to do now.

The only bit of advice I can give you is to just set it up in a test setup (OVA image comes to mind as an easy way to get started, just don’t use it for production), throw some logs at it, and play with it until you’re comfortable (and have potentially discovered all features relevant for your use case).

There are also a lot of things on the forums here (search is your friend) where people have explained how they’ve done things, or asked questions about how to accomplish a certain thing, so… yeah. Play, read, fiddle, and then deploy in production at some point :slight_smile:

Hi again!

There is something I don't really understand about the Graylog nodes' configuration file. In the “multinode setup” documentation it is written:

“After the installation of Graylog, you should take care that only one Graylog node is configured to be master with the configuration setting is_master = true .”

It seems like very little configuration. Is there something else to do, or just this? I don't see how the nodes connect to each other and transfer data. I have looked for an answer on the web and in this forum but I haven't really found what I want.

MongoDB replication handles keeping the Graylog servers in sync; Elasticsearch replication keeps your log data in sync.

is_master = true is a configuration variable that marks the node responsible for “periodical and maintenance actions”, which are not handled on slave GL servers (per the documentation here: http://docs.graylog.org/en/3.0/pages/installation/manual_setup.html).
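If you want a quick sanity check that only one node is configured as master, something like the sketch below could be run against copies of each node's configuration file. This is my own illustration and not a Graylog tool: the node names and config snippets are made up, the parser only handles simple `key = value` lines, and it deliberately does not assume any default when `is_master` is missing.

```python
from typing import Dict, List, Optional


def read_is_master(conf_text: str) -> Optional[bool]:
    """Return the is_master value from config text, or None if it is not set."""
    value = None
    for raw in conf_text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not key = value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, val = line.partition("=")
        if key.strip() == "is_master":
            value = val.strip().lower() == "true"
    return value


def master_nodes(configs: Dict[str, str]) -> List[str]:
    """Names of the nodes whose config explicitly sets is_master = true."""
    return [name for name, text in configs.items() if read_is_master(text) is True]


# Hypothetical node names and config contents, for illustration only.
configs = {
    "graylog-1": "is_master = true\n",
    "graylog-2": "# second node\nis_master = false\n",
}
masters = master_nodes(configs)
print(masters)
if len(masters) != 1:
    print("Expected exactly one master node")
```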

@tmacgbay Thank you. So everything in the infrastructure is connected, and my Graylog nodes just depend on how my servers are set up. I mean there are no nodes without the ES cluster and the MongoDB replica set.