Cluster POC Background
We set up an ES cluster with a MongoDB replica set but only a single GL server. At the outset, we assumed we could easily grow this into a larger cluster as we moved into production. The process went very well: we learned some of the inner workings and evolved our plan accordingly.
On to Production
Taking the lessons learned from the POC, we decided to build a new environment: a six-node ES cluster, a three-node MongoDB RS, three Graylog nodes, and Fluentd on the front end for message filtering and routing. The setup went well, and for the most part we followed the Graylog docs (http://docs.graylog.org/en/2.4/). Only the master GL server has the web interface enabled.
Questions
We have tried to read through the forums and documents before asking questions.
- Multiple web frontend use case?
Outside of the case of thousands of users making queries, or the need for high availability and fault tolerance, is this needed? In our case, only a handful of people will ever have access, and HA is not a pressing need. Do multiple web front ends improve search performance?
- It is unclear how work is distributed among the GL input processing servers. Can you clarify the design?
http://docs.graylog.org/en/2.4/pages/architecture.html#big-production-setup
a. What is the best design for inputs: node-specific or Global?
b. Do we send all of the log data to the master server, which then distributes the work to the other nodes? (We don’t think this is the case, but we want to verify.)
c. Should we manually spread the load across the GL input servers? For example, send router syslog to one node and firewall logs to another, then monitor system load and adjust the log traffic accordingly? (https://marketplace.graylog.org/addons/6fef88c7-94f7-488e-a6c5-bd6b71d8343e)
d. Should we use a reverse proxy load balancer like Nginx and feed into Global inputs?
https://www.nginx.com/resources/admin-guide/tcp-load-balancing/
e. If we load balance GELF UDP across the GL cluster using Nginx, does this create consistency problems or degrade search performance? Our concern is that the data would no longer be processed as a single sequential stream.
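To make questions d and e concrete, here is a rough sketch of the kind of Nginx configuration we have in mind, based on the TCP/UDP load balancing guide linked above. The upstream name and the three IP addresses are placeholders, not our real hosts, and we are assuming a Global GELF UDP input listening on port 12201 on every GL node. We have not tested this. The `hash` line is our guess at how to keep each sender pinned to one node, on the assumption that chunked GELF messages need all of their chunks to arrive at the same node.

```nginx
# Hypothetical sketch only: addresses and names are placeholders.
stream {
    upstream graylog_gelf_udp {
        # Pin each sending host to one backend (assumption: needed so that
        # chunked GELF datagrams from one sender land on the same GL node).
        hash $remote_addr consistent;

        server 10.0.0.11:12201;   # GL node 1 (placeholder)
        server 10.0.0.12:12201;   # GL node 2 (placeholder)
        server 10.0.0.13:12201;   # GL node 3 (placeholder)
    }

    server {
        listen 12201 udp;              # accept GELF UDP from the log sources
        proxy_pass graylog_gelf_udp;   # forward to the GL nodes
        proxy_responses 0;             # GELF UDP senders expect no reply
    }
}
```

If this general shape is reasonable, is hashing by source address the right approach, or would plain round-robin be fine for GELF UDP?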