About Graylog Open System Architecture

“Hi team, I’m currently running a single-node Graylog Open (all-in-one) setup. As the log volume grows, I plan to migrate to a conventional multi-node deployment (Graylog cluster + external OpenSearch cluster). Could you guide me to the official step-by-step migration documentation? I found ‘Growing From Single Server to Graylog Cluster’ but need a full checklist including MongoDB replica set setup and data migration.”

After looking at the doc you referenced, it looks like a blog post. I think I’d recommend starting with the actual docs on:

  • installing: Direct Deployment
  • configuring: Configuration Settings


I’ll add a few comments based on my experience.

If you are considering this approach, you probably are looking for a high-availability configuration. For that at minimum you need:

  • 2 graylog-server nodes
  • 3 mongodb nodes
  • 3 opensearch nodes

The 3-node mongodb cluster and 3-node opensearch cluster can be deployed as you would for any other purpose. In other words, look at the mongo and opensearch docs.
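Why 3 and not 2 for mongodb and opensearch? Both elect a primary / cluster manager by majority vote, so the node count decides how many failures the cluster can survive. A minimal sketch of that arithmetic (the function names are mine, purely for illustration):

```python
def majority(nodes: int) -> int:
    """Smallest number of voting nodes that forms a majority."""
    return nodes // 2 + 1


def tolerated_failures(nodes: int) -> int:
    """How many voting nodes can be lost while a majority still remains."""
    return nodes - majority(nodes)


if __name__ == "__main__":
    for n in (1, 2, 3, 5):
        print(f"{n} nodes: majority={majority(n)}, "
              f"can lose {tolerated_failures(n)}")
```

Note that 2 nodes tolerate no more failures than 1 node does, which is why 3 is the practical minimum for the two data stores.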

The graylog nodes need to (1) work together as a cluster and (2) connect to the external mongodb and opensearch clusters.

Note: current graylog versions talk to opensearch over HTTP, so the graylog nodes do not need to run opensearch themselves.

For graylog config look at the main configuration file itself – server.conf – usually at /etc/graylog/server/server.conf – the comments describe the mongodb and opensearch configurations for connection and security.
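To give a rough idea of what ends up in server.conf, here is a small sketch that builds the connection values from host lists. The setting names mongodb_uri, elasticsearch_hosts and is_leader are the ones I know from server.conf (older releases call the last one is_master, so check your version). The host names, the replica set name "rs0" and the helper functions are made up for the example, and security options (credentials, TLS) are left out.

```python
def mongodb_uri(hosts, database="graylog", replica_set="rs0", port=27017):
    """Build a MongoDB connection string listing every replica set member."""
    members = ",".join(f"{host}:{port}" for host in hosts)
    return f"mongodb://{members}/{database}?replicaSet={replica_set}"


def opensearch_hosts(hosts, scheme="http", port=9200):
    """Build the comma-separated URL list for elasticsearch_hosts."""
    return ",".join(f"{scheme}://{host}:{port}" for host in hosts)


def render_server_conf(mongo_hosts, os_hosts, is_leader):
    """Render only the cluster-related lines; the real file has many more."""
    return "\n".join([
        f"is_leader = {'true' if is_leader else 'false'}",
        f"mongodb_uri = {mongodb_uri(mongo_hosts)}",
        f"elasticsearch_hosts = {opensearch_hosts(os_hosts)}",
    ])


if __name__ == "__main__":
    # Hypothetical host names; exactly one graylog node should be the leader.
    mongo = ["mongo1", "mongo2", "mongo3"]
    search = ["os1", "os2", "os3"]
    print(render_server_conf(mongo, search, is_leader=True))
```

Every graylog node gets the same mongodb and opensearch values; only the leader flag differs between them.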


Deploying this way entails various decisions, more than just “how big a server do I need”. Here are some basic factors:

The mongodb hosts can be pretty small resource-wise. This data store only holds configuration and application state; it gets updated constantly but is not a heavy workload by any means. In fact it’s a little annoying that you need to allocate 3 hosts for the job! :)

The opensearch hosts may have a heavy load, or not, depending on your traffic. Tuning opensearch is a job in itself. Normally the opensearch hosts have more RAM and CPU than the mongodb hosts, and each would have a big partition for the opensearch data. In the future you can add more nodes to grow capacity.

The graylog hosts, resource-wise: it depends on various factors, such as your traffic (ingest), how much parsing you do, how many users you have, and the query load. Hard to generalize.

How you configure indices (in graylog) will affect how data is stored across the opensearch cluster. To do that right you need to understand basics of how a sharded database replicates data for HA and resilience, and how it recovers from failures or configuration changes.
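As a back-of-the-envelope illustration: an index set in graylog has a shard count and a replica count, and those two numbers decide how many shard copies opensearch has to place per index. A rough sketch, assuming shards spread evenly over the data nodes (function names are hypothetical, and real allocation also depends on disk watermarks and awareness settings):

```python
import math


def shard_copies(primaries: int, replicas: int) -> int:
    """Total shard copies per index: each primary plus its replicas."""
    return primaries * (1 + replicas)


def copies_per_node(primaries: int, replicas: int, data_nodes: int) -> int:
    """Upper bound on shard copies per node with an even spread."""
    return math.ceil(shard_copies(primaries, replicas) / data_nodes)


def survives_one_node_loss(replicas: int, data_nodes: int) -> bool:
    """A primary and its replica never share a node, so one replica on
    at least two data nodes keeps a copy of every shard after one loss."""
    return replicas >= 1 and data_nodes >= 2


if __name__ == "__main__":
    # Example: 3 shards and 1 replica on a 3-node opensearch cluster.
    print(shard_copies(3, 1))            # total copies per index
    print(copies_per_node(3, 1, 3))      # copies each node holds
    print(survives_one_node_loss(1, 3))  # data safe if one node dies?
```

With 0 replicas every node failure loses part of each index, no matter how many nodes you have, so the replica setting matters as much as the node count.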

For resilience you should also consider how components are deployed across separate physical hardware, or availability zones in a cloud environment. How to do this can get complicated. Likely you need more than 3 opensearch nodes for true data resilience.

If you get very far into tuning, or deployment topologies, you’ll have questions outside the usual scope of this forum. Best to look at opensearch/elasticsearch resources. Even with a smaller straightforward setup you should learn basics of configuring opensearch and managing an opensearch cluster.

HTH!

Thank you for sharing your experience and suggestions!