For 4 weeks now I have been using Graylog on a single virtual machine, so MongoDB and Elasticsearch are on it too. I am on a work placement and one of my projects is to build an infrastructure with Graylog. I was thinking of starting with 4 virtual machines: 2 Graylog nodes and a cluster of 2 Elasticsearch servers. But to be honest I am a little lost. Is it possible to do what I described above? Is it a good idea? Do I need to install MongoDB on both Graylog servers? Do I need to deploy a replica set for MongoDB?
I don't really know how to start. Later I will certainly add a load balancer, but for now I haven't even started the cluster. Nothing is urgent right now, but after some searching on the web I feel lost and confused about all of this.
Thanks for the help. I think my company's needs will generate 10 - 50 GB of logs per day. So I need 2 Graylog servers and 2 ES servers too, according to your documentation. I install MongoDB on the Graylog servers, right? Then I just need to follow the instructions in order? (First MongoDB, then the ES cluster, then Graylog?)
Sorry, I'm not a native English speaker, so there are a lot of things I don't really understand well.
I would order it with ES first, then MongoDB, then Graylog. Graylog keeps its settings in MongoDB. You want ES and Mongo set up and their replication working first. If you want to keep your current data, it is possible to start from your current machine, set up replication for ES and MongoDB, and eventually drop your original machine after making sure you have moved all master functionality to your new machines. It will take a little bit of research with MongoDB and ES for the details if that is the way you go.
@tmacgbay wrote true things, but I suggest you RTFM first.
eg.
The most important thing is that you have an odd number of MongoDB servers in the replica set.
So what do you think?
I think if you spend a day collecting information you will have fewer "OHHH…" moments in the future. E.g. when your Mongo cluster with two members stops working when you restart one of your servers.
@tmacgbay Thanks for your help. It is starting to become clearer in my idiot head. Just one thing: I will restart from the beginning and I will not use my old machine. It was just to learn a little about how Graylog works.
@macko003 I don't really see what you mean. Actually the part about MongoDB, especially the part that talks about an odd number of servers, is really fuzzy for me.
I have read the manual but this part gives me trouble.
Two MongoDB (or ES) will work … it will improve consumption without improving resilience much. For better redundancy they both prefer to have a minimum of three. The odd number(s) would allow you to carefully shut one down for maintenance or a problem without stopping the DB.
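To make the odd-number point concrete, here is a minimal sketch of the majority arithmetic, assuming the usual rule that a replica set needs more than half of its voting members reachable to keep (or elect) a primary. The function names are made up for illustration and are not part of any MongoDB tooling.

```python
def majority(members: int) -> int:
    """Votes needed for a majority in a replica set of this size."""
    return members // 2 + 1


def tolerated_failures(members: int) -> int:
    """Members that can be down while a majority is still reachable."""
    return members - majority(members)


for n in (1, 2, 3, 4, 5):
    print(f"{n} members: majority={majority(n)}, can lose {tolerated_failures(n)}")
```

With 2 members the majority is 2, so losing either one stops the set from having a primary. With 3 you can lose one, and a 4th member does not let you lose any more than 3 does, which is why odd sizes are recommended.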
I don't know what su…s is but I hope it never happens to me!
Personally (I'm a little biased), I prefer a setup with 3 Graylog servers, each machine running Graylog and a MongoDB instance (because 3 is nice and odd numbered), with inputs load-balanced across the Graylogs (with Filebeat you can do this in the Filebeat config, with other inputs you need a TCP load balancer).
Then as many Elasticsearch nodes as you need to satisfy the storage requirements. If you ingest 50 GB of logs per day, you need 100 GB of storage capacity per day if you use 1 replica (advised), so now you have to consider how long you want to keep logs searchable. If you say 90 days, then you need 90 * 100 GB = +/- 9 TB worth of storage space. Ideally you then spread that across at least 3 data nodes so you can lose a node and still have the data.
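The same sizing arithmetic as a small sketch, in case you want to plug in your own numbers. It assumes stored size equals ingested size and ignores index overhead and free-space headroom, so treat the result as a rough lower bound. The function name is hypothetical.

```python
def storage_needed_gb(daily_ingest_gb: float, replicas: int, retention_days: int) -> float:
    """Rough storage estimate: every replica is one more full copy of the data."""
    copies = 1 + replicas  # the primary copy plus each replica
    return daily_ingest_gb * copies * retention_days


total_gb = storage_needed_gb(daily_ingest_gb=50, replicas=1, retention_days=90)
print(f"{total_gb:.0f} GB, about {total_gb / 1000:.0f} TB")
```

For the lower end of the range mentioned earlier (10 GB per day), the same call gives 1800 GB for 90 days.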
Your Elasticsearch setup would then be: 3 master servers, 3 data nodes for a total of 6 servers. Ideally on bare metal, with the masters needing 32 GB of memory and not that much disk space, and the data nodes needing about 64 GB of memory and either large SSD drives (the 1 TB type in RAID 0) or SATA drives (the SSDs obviously being much faster with their I/O).
But that's just my own personal preference for "how would I…"
Everything is based on needs…
Usually I have to install geo-redundant clusters, so 3 Graylog nodes is not applicable in my case.
Installing a working cluster is not that much work; maybe a week is enough to play with everything. The harder part is what you need to set up in Graylog, e.g. pipelines, extractors, alerts, streams, rights, etc… That's another 1-2 decades
The only bit of advice I can give you is to just set it up in a test setup (the OVA image comes to mind as an easy way to get started, just don't use it for production), throw some logs at it, and play with it until you're comfortable (and have potentially discovered all features relevant for your use case).
There are also a lot of things on the forums here (search is your friend) where people have explained how they've done things, or asked questions about how to accomplish a certain thing, so… yeah. Play, read, fiddle, and then deploy in production at some point
There is something I don't really understand about the configuration file of Graylog nodes. In the documentation, under "multi-node setup", it is written:
"After the installation of Graylog, you should take care that only one Graylog node is configured to be master with the configuration setting is_master = true."
It seems to be a simple configuration. Is there something else to do or just this? I don't see how the nodes connect to each other and transfer data. I have looked for an answer on the web and in this forum but I haven't really found what I want.
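As far as I understand it, the Graylog nodes do not replicate data between themselves: they find each other through the shared MongoDB and all write to the same Elasticsearch cluster, so each node's `server.conf` points at the same backends and only `is_master` differs. A minimal sketch that prints those lines per node is below. The host names, the replica set name `rs01` and the helper function are assumptions for illustration; `is_master`, `mongodb_uri` and `elasticsearch_hosts` are the setting names from the Graylog configuration file.

```python
def render_node_settings(node_index, mongo_hosts, es_hosts, replica_set="rs01"):
    """Return the cluster-related server.conf lines for one Graylog node.

    Only node 0 is marked as master; every node gets the same backends.
    """
    is_master = "true" if node_index == 0 else "false"
    mongo = ",".join(f"{host}:27017" for host in mongo_hosts)
    es = ",".join(f"http://{host}:9200" for host in es_hosts)
    return "\n".join([
        f"is_master = {is_master}",
        f"mongodb_uri = mongodb://{mongo}/graylog?replicaSet={replica_set}",
        f"elasticsearch_hosts = {es}",
    ])


# Hypothetical host names, replace with your own servers.
mongo_hosts = ["graylog1", "graylog2", "graylog3"]
es_hosts = ["es1", "es2", "es3"]

for i in range(3):
    print(f"# node {i}")
    print(render_node_settings(i, mongo_hosts, es_hosts))
    print()
```

The other settings in the file still have to be filled in as the documentation describes; this only covers the part that ties the nodes together.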
@tmacgbay Thank you. So everything in the infrastructure is connected, and my Graylog nodes just depend on how my servers are set up. I mean there are no nodes without an ES cluster and a MongoDB replica set.