Graylog Docker cluster behind a front-end nginx: understanding the concepts

Hi everyone,

I’m trying to set up a Graylog cluster in Docker containers running on two different VMs/physical hosts. Things get a little unclear when I think about the load balancing…

For now, I’m just trying to set up two Graylog nodes; Elasticsearch and MongoDB run as single instances.
I’ve set up nginx as a reverse proxy to balance the log traffic between the two nodes; this nginx runs in a container as well.

Here is a preview of the target architecture:

I’ve set up one nginx configuration for the log traffic only, based on a sample I found here on the Graylog community, and a separate nginx configuration for GUI/API access that routes requests to each Graylog node depending on the SNI of the incoming request (since the Graylog master node needs to contact each node separately).

The problem is that nginx fails to forward log messages to the Graylog nodes, and I can’t understand why. From inside the nginx container, I can telnet to port 5514 using the container name, and that works… So why does server: 0.0.0.0:5514 appear in the log?

Here is the nginx configuration for this specific input traffic:

...
  upstream graylog_input_syslog_pan {
    # below are local docker containers running inside docker network "graylogNet"
    server graylog-node2.graylogNet:5514 max_fails=3 fail_timeout=30s;
    server graylog-node3.graylogNet:5514 max_fails=3 fail_timeout=30s;
  }
...
  server {
    listen 5514;                           # TCP listener on the nginx side
    proxy_pass graylog_input_syslog_pan;   # balance connections across the upstream above
    proxy_timeout 1s;                      # connection is closed if no data is transmitted for 1s
    error_log /var/log/nginx/graylog_input_syslog_pan.log;
  }
...

And here is the relevant part of graylog.conf:

...
http_bind_address = 0.0.0.0:9000
http_publish_uri = https://graylog-node2.domain.net:9000
#http_external_uri =
...

Does this architecture look correct to you?
What could cause this problem?

Thanks in advance,
Arnaud

Hello @arnaudluti

Have you seen this post?

Hello @gsmith

Sorry for the late response.
Yes, I have seen it; my nginx configuration is based on that suggestion.

I still have some issues:

  • I have a syslog input on TCP port 5514; its traffic goes through the nginx load balancer but is forwarded to only one Graylog node…
    → Is there some kind of “cache” in nginx for traffic coming from one log sender (say, a firewall) that makes nginx send that traffic to this particular Graylog node?

  • I have another input on TCP port 514 that doesn’t work at all with the same configuration. The nginx logs show “timeouts” while connecting to the upstreams (gl3 & gl2)…

    If I test the port manually from inside the nginx container, it seems to be OK…
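To go one step further than telnet, here is a minimal Python sketch (assuming Python 3.8+ is available on a host that can reach the port; the helper names and the “test-sender”/“probe” values are hypothetical) that opens the TCP connection and pushes one newline-terminated, RFC 3164-style syslog line, so you can check whether the message actually shows up in Graylog:

```python
import socket
from datetime import datetime


def build_syslog_line(hostname: str, app: str, text: str, when: datetime, pri: int = 134) -> bytes:
    """Build an RFC 3164-style syslog line, newline-terminated for TCP framing.

    pri=134 means facility local0 (16 * 8) + severity informational (6).
    """
    # RFC 3164 pads single-digit days with a space: "Jan  5", not "Jan 05".
    timestamp = f"{when:%b} {when.day:>2} {when:%H:%M:%S}"
    return f"<{pri}>{timestamp} {hostname} {app}: {text}\n".encode("utf-8")


def send_test_message(host: str, port: int, payload: bytes, timeout: float = 3.0) -> None:
    """Open a TCP connection (like telnet) and send one message.

    Raises OSError (e.g. ConnectionRefusedError or a timeout) if the port
    cannot be reached.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(payload)
```

For example, send_test_message("graylog-node2.graylogNet", 5514, build_syslog_line("test-sender", "probe", "hello", datetime.now())) targets one node directly; pointing it at the nginx listener instead exercises the whole path through the proxy.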

→ Has anyone already built this kind of architecture, with multiple Graylog nodes on different physical/VM servers behind an nginx proxy?

Hello,

If I understand correctly, the question is about what nginx does here? If nginx is used as a load balancer, I would say it directs traffic to an available Graylog node, which provides redundancy. Otherwise, nginx is used as a reverse proxy for TCP/TLS in front of the Graylog web interface.

514 is a privileged port: only processes running as root (or with the CAP_NET_BIND_SERVICE capability) can bind to ports below 1024. So on the Graylog node’s firewall (e.g., iptables), you can redirect it to an unprivileged port with something like this:

iptables -t nat -A PREROUTING -p tcp -m tcp --dport 514 -j REDIRECT --to-ports 1514

Which nat chain to use depends on the direction of the traffic:

- PREROUTING is for incoming traffic
- POSTROUTING / OUTPUT are for outgoing traffic (OUTPUT for locally generated packets)

But you’re running Graylog in a container? What does your Docker config for the container look like? Running Graylog in a container means the input ports have to be exposed, which doesn’t seem to be the case in your setup.


Hello @aaronsachs @gsmith
Again, sorry for my late response…


@gsmith yes, by “cache” I meant a session established between the reverse proxy and one Graylog node for TCP log traffic. And that is indeed what happens: I was testing my cluster with only one log source… so only one Graylog node showed activity. I’ve added other log sources going through the reverse proxy (UDP and TCP ones), and I can now see activity on every node!!! :man_dancing:
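The behaviour described here can be reproduced with a toy model. The Python sketch below is a simplification (plain round-robin, ignoring weights, max_fails and concurrency; the connection ids and message counts are made up): the upstream is chosen once, when a TCP connection is opened, and every message sent over that connection follows it.

```python
from collections import Counter
from itertools import cycle


def balance_per_connection(connections: dict[str, int], upstreams: list[str]) -> Counter:
    """Toy round-robin balancer that picks an upstream per TCP connection.

    `connections` maps a hypothetical connection id (one per log sender)
    to the number of messages sent over that single long-lived connection.
    """
    next_upstream = cycle(upstreams)
    load = Counter({name: 0 for name in upstreams})
    for _conn_id, message_count in connections.items():
        chosen = next(next_upstream)   # decided once, when the connection opens
        load[chosen] += message_count  # all messages on it go to the same node
    return load


# One sender: a single connection, so only one node ever sees traffic.
print(balance_per_connection({"firewall": 1000}, ["gl2", "gl3"]))
# Two senders: each connection lands on a different node.
print(balance_per_connection({"firewall": 1000, "server": 400}, ["gl2", "gl3"]))
```

With a single long-lived sender, one node receives everything; the spread only appears once several connections exist.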


@aaronsachs yes, I had clearly gone astray… So I’ve modified the docker compose file of my two Graylog nodes to publish different host ports bound to the input ports (514/tcp, 514/udp, etc.). These published ports are used by my two nginx instances.

Since I have an nginx instance (active/passive) on each Docker server (each server also hosts one Graylog node, one MongoDB and one Elasticsearch), the two nginx nodes have different configurations. For example, nginx1 is in the same Docker network as graylog1, so it sends log traffic directly to the ports opened on the graylog1 container, whereas it reaches graylog2 through the ports bound on the other host (516 → 514, for example).
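To make the asymmetry explicit, here is a small Python sketch that derives the upstream targets each nginx instance would use. The node names, Docker server names and published ports are hypothetical (515 for graylog1 is made up; 516 → 514 for graylog2 follows the example in the previous paragraph), and the rendered block only reuses the server options from the configuration posted earlier in the thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GraylogNode:
    name: str            # hypothetical container name, e.g. "graylog1"
    docker_host: str     # hypothetical Docker server hostname
    input_port: int      # input port inside the container
    published_port: int  # host port bound to input_port (e.g. 516 -> 514)


def upstream_servers(local_host: str, nodes: list[GraylogNode]) -> list[str]:
    """Return the 'server' targets for the nginx running on local_host."""
    targets = []
    for node in nodes:
        if node.docker_host == local_host:
            # Same Docker network: reach the container directly on its input port.
            targets.append(f"{node.name}:{node.input_port}")
        else:
            # Other Docker server: go through the port published on that host.
            targets.append(f"{node.docker_host}:{node.published_port}")
    return targets


def render_upstream(block_name: str, targets: list[str]) -> str:
    """Render an nginx upstream block for the given targets."""
    lines = [f"upstream {block_name} {{"]
    lines += [f"    server {t} max_fails=3 fail_timeout=30s;" for t in targets]
    lines.append("}")
    return "\n".join(lines)


nodes = [
    GraylogNode("graylog1", "docker1", 514, 515),
    GraylogNode("graylog2", "docker2", 514, 516),
]
print(render_upstream("graylog_syslog", upstream_servers("docker1", nodes)))
```

Running it for "docker2" instead yields the mirror-image list, which is why the two nginx configurations cannot be identical.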

Thank you guys for enlightening me.

One last question…

For logs that come through the reverse proxy, the reserved field “gl2_source_ip” contains the nginx address. I read that I need to add the proxy_bind $remote_addr transparent setting in the server {} block of nginx, but when I do that, the traffic is no longer forwarded and I receive no logs. Still searching. This field is useful for devices that don’t send their own hostname/IP in the log.


Hey,

Glad things worked out for you. As for this question, I’m not sure I understand it correctly.

The gl2_* fields are internal.

Hello @gsmith

Sorry, I misled you: the field is “gl2_remote_ip”, not “gl2_source_ip”. The “gl2_remote_ip” field contains the source IP of the log sender (a server, a firewall…).

But now that the traffic goes through nginx, “gl2_remote_ip” is always the nginx container IP, because from the point of view of each Graylog node, nginx is the sender.

I read that the setting proxy_bind $remote_addr transparent can be added to the server {...} block of the nginx configuration, like this:
[screenshot: nginx server {...} block containing proxy_bind $remote_addr transparent]

But it doesn’t work in my case: when I add this setting, the logs are no longer forwarded to the Graylog nodes, and there is nothing in the nginx error logs…

Hello,

If the hostname/IP address is included in the syslog message itself, you can extract it and overwrite the source field with it using a pipeline rule. If there is no hostname in the syslog message, Graylog uses the address of the host sending the logs.
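The logic of such a rule can be sketched in Python (this is not Graylog’s pipeline rule language; the regex assumes an RFC 3164-style header, and the sample messages and IP are made up): take the hostname from the message header when it is present, otherwise fall back to the sender address, which behind nginx is the proxy’s IP found in gl2_remote_ip.

```python
import re

# RFC 3164-style header: "<PRI>Mmm dd hh:mm:ss HOSTNAME rest-of-message"
SYSLOG_3164 = re.compile(
    r"^<\d{1,3}>[A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2} (?P<host>\S+) "
)


def resolve_source(raw_message: str, remote_ip: str) -> str:
    """Prefer the hostname embedded in the message; fall back to the sender IP.

    Behind nginx, remote_ip is the proxy's address, so the fallback is only
    as good as the proxy setup allows.
    """
    match = SYSLOG_3164.match(raw_message)
    return match.group("host") if match else remote_ip


print(resolve_source("<134>Jan  5 09:03:07 fw01 kernel: drop", "172.18.0.5"))
print(resolve_source("no syslog header here", "172.18.0.5"))
```

Devices that do put their hostname in the header are unaffected by the proxy; only those that omit it end up attributed to the nginx address.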