Help with clusters: multi-Elasticsearch (OpenSearch) vs. multi-Graylog clusters

1. Describe your incident:
I am looking for general guidance on clusters and am confused.
I’m referencing this doc: https://graylog.org/post/back-to-basics-from-single-server-to-graylog-cluster/
But almost every link within it 404s.
My primary question: what is the difference between a multi-node Elasticsearch/OpenSearch cluster and a multi-node Graylog cluster?

Secondly, which setup better fits my infrastructure? I've got multiple data centers that should ideally each have their own syslog collector (Graylog) to gather logs locally and send them all to one central, searchable web interface (again Graylog). So should I have multiple Graylog instances or multiple OpenSearch instances here? I am entirely new to Graylog and centralized logging services altogether.

2. Describe your environment:

  • OS Information: Linux, Ubuntu Server 22.04 LTS, latest OS and package updates/upgrades applied.

  • Package Version:
    Graylog Open
    graylog-5.2-repository/stable,now 1-2 all [installed]
    graylog-server/stable,now 5.2.5-1 amd64 [installed]
    graylog-sidecar-repository/now 1-5 all [installed,local]
    graylog-sidecar/sidecar-stable,now 1.5.0-2 amd64 [installed]

opensearch/stable,now 2.12.0 amd64 [installed]

mongodb-database-tools/jammy,now 100.9.4 amd64 [installed,automatic]
mongodb-mongosh/jammy,now 2.2.2 amd64 [installed]
mongodb-org-database-tools-extra/jammy,now 6.0.14 amd64 [installed,automatic]
mongodb-org-database/jammy,now 6.0.14 amd64 [installed,automatic]
mongodb-org-mongos/jammy,now 6.0.14 amd64 [installed,automatic]
mongodb-org-server/jammy,now 6.0.14 amd64 [installed,automatic]
mongodb-org-shell/jammy,now 6.0.14 amd64 [installed,automatic]
mongodb-org-tools/jammy,now 6.0.14 amd64 [installed,automatic]
mongodb-org/jammy,now 6.0.14 amd64 [installed]

  • Service logs, configurations, and environment variables:
    The whole stack (Graylog, OpenSearch, MongoDB) is running directly on this server; no Docker involved.
    I can provide logs on request, but I'm not having technical issues per se; it's working. I just want to expand the setup to my other data centers while still operating as "one" collective unit. I want to be able to log into one central UI and review logging data for what will ultimately be three separate data centers, spread geographically around North America and connected over VPNs.

Looking to ingest ~100 GB of data or more across the 3 locations.

3. What steps have you already taken to try and solve the problem?

I tried some internet searches, as well as the Graylog documentation, but unfortunately a lot of the links in those docs 404:

Another helpful doc that 404s:
http://docs.graylog.org/en/3.0/pages/configuration/multinode_setup.html

More documentation I’ve referenced:

https://www.linode.com/docs/guides/create-a-mongodb-replica-set/

4. How can the community help?

I would like general info/best practices. Maybe there are other docs or old topics that can help.

Also, is Docker just better overall? What would everyone recommend? I'm not against redoing it all with Docker.

On this topic: Multi-site considerations - #2 by gsmith

These comments recommend the approach below; maybe it would be a good fit for my situation?

From user gsmith

To be honest, if I had two clusters in two different data centers, they would each consist of ES/OS, Graylog, and MongoDB. Going off your statement, I would forward my logs from DC-2 to DC-1. MongoDB just holds metadata; my main worry would be ES/OS, which holds all the data. Just an idea.

For example:
I have 6 nodes in Germany, 6 nodes in the UK, and 6 nodes here in Iowa. Each site's 6 nodes consist of 3 OpenSearch nodes and 3 Graylog/MongoDB nodes. The UK nodes forward logs to Iowa; the Germany cluster forwards logs to Iowa. In case of internet interruptions or issues, each DMZ is contained until the connection comes back. These are all run through a VLAN and some other security measures.
On the cluster in Iowa, I have an index set called UK and another called Germany; from there I create alerts and widgets for those geographical locations. It all comes down to your environment and the ability to expand if need be.
Hope that helps.
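For concreteness, gsmith's per-site layout (3 Graylog/MongoDB nodes in front of 3 OpenSearch nodes) maps to a handful of settings in each Graylog node's `server.conf`. A minimal sketch, where the hostnames `gl1`–`gl3` and `os1`–`os3` are placeholders for your own nodes:

```ini
# /etc/graylog/server/server.conf — per-site cluster sketch (hostnames are placeholders)

# Exactly one node per cluster runs as leader; set false on the others.
is_leader = true

# Must be identical on every Graylog node in the cluster.
password_secret = <shared-secret>
root_password_sha2 = <sha256-of-admin-password>

http_bind_address = 0.0.0.0:9000

# List all OpenSearch nodes so any single one can be down.
elasticsearch_hosts = http://os1:9200,http://os2:9200,http://os3:9200

# MongoDB replica set spanning the three Graylog/MongoDB nodes.
mongodb_uri = mongodb://gl1:27017,gl2:27017,gl3:27017/graylog?replicaSet=rs01
```

This is a configuration fragment, not a complete file; check the commented defaults shipped with your Graylog version for the rest.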

Hey @Errand0596

wow you dug up some old post :laughing:

We used that a while back, based on this documentation:

https://go2docs.graylog.org/5-0/getting_in_log_data/cluster-to-cluster_forwarder.html

We do have some Graylog clusters floating around, but we went the OpenSearch/OpenSearch Dashboards route for collecting logs/messages from the remote clusters.

Using something called Data Sources found here

To sum it up, we still have Graylog clusters in different areas, but we made an OS/OSD cluster to gather all the information/logs/messages from the Graylog clusters.
From there we made the dashboards/visualizations (i.e., widgets) and alerts we needed.


Thank you for your reply @gsmith

Thanks also for that article; it's helpful.

Let me ask a couple questions now that I’m understanding this better.

  1. It sounds like I should create a full Graylog/OS/MongoDB cluster at each of my DCs (data centers) and forward the two non-leader clusters to a central "leader" cluster… and I will need to decide how many nodes go in each cluster based on various factors…
  2. What should I focus on when deciding node quantity in a cluster? Redundancy, efficiency, throughput, etc.?
  3. A tougher one, but would you recommend a straight-up Docker setup over a more traditional one? I'm on the fence here.

Thank you

This video may help you understand the architecture of a cluster: https://youtu.be/agdLrDw9JaE?si=gUnPXNkO--gdK2fp

Ya, I would make full clusters (3 of them), create an output attached to a stream that sends the data across, and then choose what data you route to that stream to control what gets sent.
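The output-attached-to-a-stream approach above is set up in the UI (System → Outputs, then attach the output under the stream's Outputs tab) or via the REST API. As a hedged sketch, a GELF TCP output pointing at the central cluster might be created with a payload along these lines (the hostname and port are placeholders; verify the exact endpoint and fields in your version's API browser):

```json
{
  "title": "Forward to central DC",
  "type": "org.graylog2.outputs.GelfOutput",
  "configuration": {
    "hostname": "central-graylog.example.com",
    "port": 12201,
    "protocol": "TCP"
  }
}
```

The central cluster then needs a matching GELF TCP input listening on that port to receive the forwarded messages.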

Unless you know Docker really well I wouldn't use it; you just have to "translate" everything into Docker, because most docs are written with a standard OS install in mind.


Hey @Errand0596

This would depend on what kind of resources you have; if you're limited, perhaps Docker is the way to go. I found some of the networking a pain in my @$$.
I have unlimited virtual machines for my needs, plus that is my preference to use.

Our redundancy is a little different working with VMs… We run backups using Veeam software. I can pretty much destroy all my VMs and bring the cluster back to working order in 5 minutes.


@Joel_Duffield Thank you for your response. Yes I will watch more vids on clusters.

I’m decently familiar with docker, but there are certainly pros and cons to both. Just wanted to get a feel from the community on preference. Good to know about documentation.

This would depend on what kind of resources you have; if you're limited, perhaps Docker is the way to go. I found some of the networking a pain in my @$$.
I have unlimited virtual machines for my needs, plus that is my preference to use.

I also have unlimited resources and VMs. Networking is a pain for sure, and more complexity in general. However, the appeal is easier expansion: once you've got your docker-compose.yml exactly how you want it, it's all too easy to spin up another instance elsewhere.
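For anyone weighing the docker-compose.yml route mentioned above, here is a minimal single-node sketch along the lines of Graylog's official Docker examples. Image tags, passwords, and secrets are placeholders to adapt, not production values:

```yaml
# docker-compose.yml — minimal single-node sketch; all secrets are placeholders
services:
  mongodb:
    image: mongo:6.0

  opensearch:
    image: opensearchproject/opensearch:2.12.0
    environment:
      - discovery.type=single-node
      - plugins.security.disabled=true
      - OPENSEARCH_INITIAL_ADMIN_PASSWORD=ChangeMe_123!
      - OPENSEARCH_JAVA_OPTS=-Xms1g -Xmx1g

  graylog:
    image: graylog/graylog:5.2
    environment:
      # GRAYLOG_* variables map to the same-named server.conf settings.
      - GRAYLOG_PASSWORD_SECRET=<at-least-16-chars>
      - GRAYLOG_ROOT_PASSWORD_SHA2=<sha256-of-admin-password>
      - GRAYLOG_HTTP_EXTERNAL_URI=http://127.0.0.1:9000/
      - GRAYLOG_ELASTICSEARCH_HOSTS=http://opensearch:9200
      - GRAYLOG_MONGODB_URI=mongodb://mongodb:27017/graylog
    ports:
      - "9000:9000"        # web UI / API
      - "12201:12201/tcp"  # GELF input
    depends_on:
      - mongodb
      - opensearch
```

Scaling this to a real cluster means one service (or VM) per node plus a MongoDB replica set, which is exactly the "translate everything into Docker" overhead mentioned earlier in the thread.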

Thanks again!
