How to balance udp inputs (GELF)

davidoff · February 17, 2017, 12:53pm

Hi all. We are using GELF (UDP) to collect messages. For HA we have created global inputs. How can we balance GELF UDP traffic between two nodes with a health check? Is it possible? Thanks.

jan · February 17, 2017, 1:51pm

hej @davidoff you would need to use a load balancer for your UDP input. This can be done, for example, using nginx.

For reference: https://www.nginx.com/resources/admin-guide/tcp-load-balancing/

jochen · February 17, 2017, 5:55pm

Load-balancing GELF UDP is not exactly trivial because of its chunking characteristics. You would need to use client-based balancing, i.e. send all UDP packets from one client to the same Graylog input. Otherwise you’ll end up with corrupted messages.
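
One way to get the client-based balancing described above with open-source nginx is the stream module's `hash` directive keyed on the client address. The following is only a sketch: the backend addresses and port are borrowed from the log lines later in this thread, while the upstream name `gelf_backend`, the `GELF_CONF` variable, and the output file name are assumptions made for illustration. The generated `stream` block has to be included from nginx's main context, and `nginx -t` should be run before reloading.

```shell
#!/bin/bash
# Sketch: write an nginx stream config that pins each client IP to one Graylog node.
# GELF_CONF and the default file name are hypothetical; adjust to your layout.
conf_file="${GELF_CONF:-gelf_stream.conf}"

# The quoted 'EOF' stops the shell from expanding nginx's $remote_addr variable.
cat > "$conf_file" <<'EOF'
stream {
    upstream gelf_backend {
        # The same client IP always hashes to the same server,
        # so all chunks of one GELF message reach one input.
        hash $remote_addr consistent;
        server 172.16.20.58:12206;
        server 172.16.20.59:12206;
    }

    server {
        listen 12206 udp;
        proxy_pass gelf_backend;
    }
}
EOF

echo "wrote $conf_file"
```

Hashing on the client address keeps chunks together but balances per client, not per message, so a single very chatty sender still lands on one node.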

hezor · February 18, 2017, 1:33pm

We are using the FOSS version of NGINX as a UDP loadbalancer for GELF messages. The one major downside is that in order to get active health checks (to poll the graylog-servers’ lbstatus page), we’d need NGINX Plus which I consider a bit too expensive. So this basically means that whenever we perform some maintenance on the graylog-server nodes (like yum updates with reboots) some of the log messages are lost during that time period.

Luckily, we only need to use GELF UDP for a handful of services due to technical restrictions. The rest of the log messages are sent via Filebeat which has proven a great way to handle log collecting (and loadbalancing) on the client-side.

karj · February 18, 2017, 7:02pm

Hi all,
We had many problems with GELF UDP because of its chunking characteristics.
We’re using Keepalived and have configured some virtual IPs for HA purposes on every Graylog node.
We have also configured a “dummy” balancing method by using DNS round robin across the virtual IPs.
What we’ve done is not “scientific”, but it still works for us.

Regards

davidoff · February 21, 2017, 11:53am

Thanks for the answers. We decided to use nginx with UDP balancing on one node, but if it goes down a script will switch to the other by copying in a new config with new UDP backends. Because NGINX Plus is not free, we wrote a simple health-check script (see below).

#!/bin/bash
# Setup: create a file "master" containing: node1
# Setup: create a file "health_check.log" for the log output
# Run from the directory that holds both files (e.g. from cron).

# -s hides curl's progress meter; --max-time keeps a hung node from blocking the check.
status1=$(curl -s --max-time 5 'http://172.16.20.58:9000/api/system/lbstatus')
status2=$(curl -s --max-time 5 'http://172.16.20.59:9000/api/system/lbstatus')
master=$(cat master)

# An unreachable node yields an empty string rather than "DEAD",
# so anything other than "ALIVE" is treated as down.
if [[ "$status1" == "ALIVE" && "$status2" == "ALIVE" ]]; then
        exit 0

elif [[ "$status1" == "ALIVE" ]]; then
        echo "[$(date)] Node2 - ${status2:-unreachable}" >> health_check.log

elif [[ "$status2" == "ALIVE" ]]; then
        echo "[$(date)] Node1 - ${status1:-unreachable}" >> health_check.log
        # Fail over only if node1 is still the active backend
        if [[ "$master" == "node1" ]]; then
                cp /path1/gelf_backend /etc/nginx/gelf/gelf_backend
                /etc/init.d/nginx reload
                echo "node2" > master
        fi

else
        echo "[$(date)] Node1 - ${status1:-unreachable}" >> health_check.log
        echo "[$(date)] Node2 - ${status2:-unreachable}" >> health_check.log
fi
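
The script above only ever switches from node1 to node2; nothing moves traffic back when node2 later fails. A possible way to make that decision explicit and testable is to pull it into a small pure function. This is a sketch, not part of the original script: the helper name `choose_master` is hypothetical, and it assumes any status other than `ALIVE` means the node is down.

```shell
#!/bin/bash
# choose_master STATUS1 STATUS2 CURRENT
# Prints the node that should receive traffic (hypothetical helper).
choose_master() {
    local status1=$1 status2=$2 current=$3
    if [[ "$current" == "node1" && "$status1" != "ALIVE" && "$status2" == "ALIVE" ]]; then
        echo "node2"      # node1 is down and node2 can take over
    elif [[ "$current" == "node2" && "$status2" != "ALIVE" && "$status1" == "ALIVE" ]]; then
        echo "node1"      # node2 is down, switch back to node1
    else
        echo "$current"   # active node is fine, or both are down: change nothing
    fi
}

# Example calls with literal statuses
choose_master DEAD ALIVE node1    # prints node2
choose_master ALIVE DEAD node2    # prints node1
choose_master ALIVE ALIVE node2   # prints node2
```

The caller would then compare the result with the contents of the master file and copy the matching backend config and reload nginx only when the two differ.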

davidoff · March 2, 2017, 8:47am

Hi. Does anyone have production configs for nginx UDP balancing? We get a lot of errors under load, such as:

2017/03/02 06:33:29 [alert] 2435#2435: 16000 worker_connections are not enough

When we increase workers and connections, we get these errors:

2017/03/02 11:24:28 [error] 2416#2416: *702506 connect() to 172.16.20.58:12206 failed (11: Resource temporarily unavailable) while connecting to upstream, udp client: 172.16.20.47, server: 0.0.0.0:12206, upstream: "172.16.20.58:12206", bytes from/to client:0/0, bytes from/to upstream:0/0

davidoff · March 13, 2017, 7:43am

We solved this problem by adding the following to the nginx config:

proxy_timeout 10s;

The default value is 10m, so nginx keeps a lot of connections open and there are not enough local ports left for new ones. Alternatively, you can widen the sysctl setting net.ipv4.ip_local_port_range.
So the backend config looks like:

server {
    listen 12204 udp;
    proxy_timeout 10s;
    proxy_pass example.com;
}
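
To see why shortening the timeout helps, a rough back-of-the-envelope estimate can be sketched in shell. The helper names `local_port_count` and `sessions_held` are hypothetical, the rate of 100 new client sessions per second is purely illustrative, and 32768 to 60999 is assumed as the port range only when the Linux `/proc` entry cannot be read. It treats every idle client session as holding one local port until `proxy_timeout` expires, which is a simplification.

```shell
#!/bin/bash
# local_port_count LOW HIGH: number of ports in an inclusive ephemeral port range.
local_port_count() {
    echo $(( $2 - $1 + 1 ))
}

# sessions_held RATE TIMEOUT: upstream sockets held open if RATE new client
# sessions arrive per second and each one idles for TIMEOUT seconds.
sessions_held() {
    echo $(( $1 * $2 ))
}

# Read the live range on Linux when available; otherwise use assumed values.
if [[ -r /proc/sys/net/ipv4/ip_local_port_range ]]; then
    read -r low high < /proc/sys/net/ipv4/ip_local_port_range
else
    low=32768
    high=60999   # assumed range, a common Linux default
fi

rate=100   # assumed new client sessions per second, illustrative only
echo "local ports available:       $(local_port_count "$low" "$high")"
echo "held with proxy_timeout 10m: $(sessions_held "$rate" 600)"
echo "held with proxy_timeout 10s: $(sessions_held "$rate" 10)"
```

Under these assumptions a 10m timeout would hold far more sockets than there are local ports, while a 10s timeout stays well below the limit, which matches the behaviour reported above.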
