Graylog distributed architecture question - load balancing or redundancy?

Hi everyone,

I have a question about Graylog’s distributed architecture, specifically in the context of a multi-node Elasticsearch cluster. The documentation gives a pretty good idea of how Elasticsearch clusters work. From what I gather, the design is all about speed and concurrent searches performed by multiple Elasticsearch nodes. In this context, where is the redundancy piece, or does one not exist? By redundancy I mean the ability to lose a node and not lose ANY data which may have been stored on that specific node.

I totally get that losing a node isn’t catastrophic in an Elasticsearch cluster, but what Elastic rarely tells you is that the data on that node goes “dark” as well. That means your searches will NOT return hits for data stored on that node, hits you would have gotten had the node been online.

How does Graylog address redundancy if zero loss is required? If it isn’t via the native Elasticsearch architecture, do you have any third-party design practices for achieving a zero-loss use case?

Thank you
~B

Nothing in this world is 100% sure.

For redundancy, use an Elasticsearch cluster with several nodes, and set the number of replicas in Graylog’s server.conf file to more than 0. (For example, number of replicas = 1 means that every message is stored on two different Elasticsearch nodes.)
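To make the effect of the replica count concrete, here is a minimal Python sketch of a three-node cluster losing one node. It is a toy model, not Elasticsearch's real allocator: the round-robin placement, the node names and the helper functions `allocate` and `dark_shards` are all assumptions made for illustration. The one rule it borrows from Elasticsearch is that a replica is never placed on the same node as another copy of the same shard.

```python
def allocate(num_shards, num_replicas, nodes):
    """Place every copy of each shard on a distinct node (simplified round-robin)."""
    copies = num_replicas + 1  # one primary plus its replicas
    if copies > len(nodes):
        raise ValueError("need at least num_replicas + 1 nodes")
    placement = {}
    for shard in range(num_shards):
        # Consecutive nodes, so copies of the same shard never share a node.
        placement[shard] = [nodes[(shard + c) % len(nodes)] for c in range(copies)]
    return placement


def dark_shards(placement, failed_node):
    """Return the shards that have no surviving copy once failed_node is offline."""
    return [
        shard
        for shard, holders in placement.items()
        if all(node == failed_node for node in holders)
    ]


# Hypothetical node names, for illustration only.
nodes = ["es-node-1", "es-node-2", "es-node-3"]

no_replicas = allocate(num_shards=4, num_replicas=0, nodes=nodes)
one_replica = allocate(num_shards=4, num_replicas=1, nodes=nodes)

print("replicas=0, dark shards:", dark_shards(no_replicas, "es-node-1"))
print("replicas=1, dark shards:", dark_shards(one_replica, "es-node-1"))
```

In this model, with zero replicas the shards that lived on the failed node are unsearchable until it comes back, while with one replica every shard still has a live copy after any single node failure. Losing two nodes at once can still take shards offline, so the replica count should match how many simultaneous failures you need to survive.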

Thanks @jtkarvo – number of replicas = 1 will do it for me.

Your reply actually brings up another question that I’ve been curious about. In the default scenario, with number of replicas = 0 and a multi-node Elasticsearch cluster, what logic does Graylog use to decide which Elasticsearch node to write messages to? Or is this internal to Elasticsearch, so that Graylog simply writes to the cluster’s master node and the master makes the load-balancing decision internally?

Thanks
~B

Elasticsearch decides by itself on which node to store a document.
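As a rough illustration of what "decides by itself" means: Elasticsearch hashes a routing value (by default the document ID) and takes it modulo the number of primary shards, and the cluster then maps that shard to whichever node currently holds it. The Python sketch below mimics only the first step. The function name `pick_primary_shard` and the message IDs are made up for this example, and CRC32 is a stand-in for the Murmur3 hash Elasticsearch uses internally, so the shard numbers will not match a real cluster.

```python
import zlib


def pick_primary_shard(routing_value: str, num_primary_shards: int) -> int:
    """Map a routing value to a primary shard number.

    Mirrors the shape of Elasticsearch's rule, hash(routing) % number of
    primary shards, with CRC32 standing in for the real hash function.
    """
    return zlib.crc32(routing_value.encode("utf-8")) % num_primary_shards


# Hypothetical message IDs; a real ID would be generated at indexing time.
doc_ids = [f"message-{i}" for i in range(8)]

for doc_id in doc_ids:
    print(doc_id, "-> shard", pick_primary_shard(doc_id, num_primary_shards=4))
```

The client (Graylog here) never picks a node. It hands the document to the cluster, and the same ID always lands on the same shard as long as the number of primary shards does not change.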

See the following references for details:

Thanks @jochen - so to summarize: in the scenario of “number of replicas ≥ 1”, Graylog does NOT write to the nodes directly; rather, it still uses Elasticsearch’s API to direct the master node to produce one or more replicas.

From there, which specific node a replica ends up on is completely outside of Graylog’s control, yes?

Thanks
~B

Yes, that’s correct.