Multi-Node Graylog Setup fronted by F5 Loadbalancer

1. Describe your incident:
We have a multi-node Graylog setup fronted by an F5 load balancer, which balances both user sessions and incoming logs.

After upgrading from Graylog 4 to Graylog 6, we noticed that the logs are no longer being balanced as expected. The issue was discovered a couple of weeks after the upgrade, so we cannot be 100% sure that it is related to the upgrade.

Previously Balanced Logs:

| Node | Messages | Percentage |
|---|---|---|
| graylogn2 | 1,202,158 | 20.02% |
| graylogn4 | 1,201,813 | 20.01% |
| graylogn5 | 1,201,724 | 20.01% |
| graylogn3 | 1,201,330 | 20.00% |
| graylogn1 | 1,199,158 | 19.97% |

Current Logs Distribution:

| Node | Messages | Percentage |
|---|---|---|
| graylogn4 | 2,029,273 | 34.54% |
| graylogn3 | 1,464,229 | 24.92% |
| graylogn2 | 1,237,422 | 21.06% |
| graylogn5 | 665,532 | 11.33% |
| graylogn1 | 479,403 | 8.16% |
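
To put a number on the imbalance, here is a minimal Python sketch that recomputes each node's share from the message counts above and reports the ratio between the busiest and the least busy node. The helper names `shares` and `spread` are made up for this illustration; only the counts come from the tables.

```python
# Message counts per node, copied from the two tables above.
previous = {
    "graylogn2": 1_202_158,
    "graylogn4": 1_201_813,
    "graylogn5": 1_201_724,
    "graylogn3": 1_201_330,
    "graylogn1": 1_199_158,
}
current = {
    "graylogn4": 2_029_273,
    "graylogn3": 1_464_229,
    "graylogn2": 1_237_422,
    "graylogn5": 665_532,
    "graylogn1": 479_403,
}


def shares(counts):
    """Return each node's share of the total message count, in percent."""
    total = sum(counts.values())
    return {node: round(100 * n / total, 2) for node, n in counts.items()}


def spread(counts):
    """Return the ratio between the busiest and the least busy node."""
    return max(counts.values()) / min(counts.values())


if __name__ == "__main__":
    for label, counts in (("previous", previous), ("current", current)):
        print(label, shares(counts), f"busiest/least busy = {spread(counts):.2f}")
```

Before the change the busiest node handled almost exactly as many messages as the least busy one; now graylogn4 receives more than four times as many messages as graylogn1.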

2. Describe your environment:

5x Graylog nodes + MongoDB (v6.0.24)

  • OS Information: Rocky Linux release 8.10

  • Package Version: Graylog 6.0.14

3x external Elasticsearch nodes

  • OS Information: Rocky Linux release 8.10

  • Package Version: Elasticsearch 7.10.2

**BIG-IP F5 Load Balancer:**
A virtual IP to which the sources send their logs.
A server pool containing all Graylog nodes.

Load Balancing Method: Round Robin
Priority Group Activation: Disabled

Default Persistence Profile: Custom (Based on universal profile but with timeout set at 5 seconds).

F5 iRule:
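when CLIENT_ACCEPTED {
    # A chunked GELF datagram starts with a 12-byte header: 2 magic bytes (0x1e 0x0f),
    # an 8-byte message ID, a 1-byte sequence number and a 1-byte sequence count.
    if { [UDP::payload length] >= 12 } {
        #binary scan [UDP::payload 12] H* chunkedheader
        binary scan [UDP::payload 12] H4H16c1c1 magicbytes messageid seqno seqcount
        #incr seqno
        if { $magicbytes equals "1e0f" } {
            #log local0. "GrayLog chunked message received. Header: $chunkedheader; ID: $messageid (msg #$seqno of $seqcount)"
            # Persist on the message ID so that all chunks of one message reach the same node.
            persist uie $messageid
        }
    }
}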
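
For anyone who wants to check offline what the iRule extracts from a datagram, here is a minimal Python sketch of the same header parsing. It assumes the chunked GELF layout the iRule relies on (2 magic bytes `0x1e 0x0f`, 8-byte message ID, 1-byte sequence number, 1-byte sequence count). The function name `parse_gelf_chunk_header` is hypothetical and not part of Graylog or F5.

```python
import struct

GELF_CHUNK_MAGIC = b"\x1e\x0f"  # magic bytes that mark a chunked GELF datagram
HEADER_LEN = 12                 # 2 magic + 8 message ID + 1 seq number + 1 seq count


def parse_gelf_chunk_header(payload: bytes):
    """Return (message_id_hex, seq_no, seq_count), or None if not a GELF chunk.

    Hypothetical helper that mirrors the `binary scan ... H4H16c1c1` call in the
    iRule, except that the two counters are read as unsigned bytes here.
    """
    if len(payload) < HEADER_LEN or payload[:2] != GELF_CHUNK_MAGIC:
        return None
    message_id, seq_no, seq_count = struct.unpack("!8sBB", payload[2:HEADER_LEN])
    return message_id.hex(), seq_no, seq_count


if __name__ == "__main__":
    # Hand-built example datagram: first chunk (sequence number 0) of a 3-chunk message.
    chunk = GELF_CHUNK_MAGIC + bytes(range(8)) + bytes([0, 3]) + b"...chunk data..."
    print(parse_gelf_chunk_header(chunk))
    # A plain syslog line is not a GELF chunk, so no persistence key is derived.
    print(parse_gelf_chunk_header(b"<14>plain syslog line"))
```

The hex message ID is the value the iRule passes to `persist uie`. Syslog datagrams and unchunked GELF messages do not start with the magic bytes, so the iRule never sets a message-ID persistence key for them.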

3. What steps have you already taken to try and solve the problem?

Restarted Graylog Nodes

Removed F5 iRule

Increased the persistence profile timeout to 30 seconds.

According to F5 Support, each source establishes a connection with a single node, and only after that session is terminated does it establish a new connection with a different node.
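
If the behaviour F5 Support describes is what is happening, round robin is applied per source connection rather than per datagram, so the split depends on how much each source sends. The following Python sketch illustrates that effect under stated assumptions: the source volumes are invented for the example (they are not measurements from this cluster), and both function names are hypothetical.

```python
from itertools import cycle


def per_flow_round_robin(source_volumes, nodes):
    """Pin each source to one node; nodes are handed out round robin per source."""
    totals = {node: 0 for node in nodes}
    picker = cycle(nodes)
    for volume in source_volumes:
        totals[next(picker)] += volume  # every message of this source hits one node
    return totals


def per_datagram_round_robin(source_volumes, nodes):
    """Hand out every single message round robin, regardless of its source."""
    totals = {node: 0 for node in nodes}
    picker = cycle(nodes)
    for volume in source_volumes:
        for _ in range(volume):
            totals[next(picker)] += 1
    return totals


if __name__ == "__main__":
    nodes = ["graylogn1", "graylogn2", "graylogn3", "graylogn4", "graylogn5"]
    # Hypothetical volumes: two chatty sources and eight quiet ones.
    volumes = [5000, 3000, 400, 300, 200, 100, 50, 50, 50, 50]
    print("per flow:    ", per_flow_round_robin(volumes, nodes))
    print("per datagram:", per_datagram_round_robin(volumes, nodes))
```

With volumes this uneven, per-flow assignment leaves one node with more than half of all messages, while per-datagram assignment splits them evenly. This is only a model of the mechanism; whether it explains the numbers above depends on how the real sources and the F5 connection table behave.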

4. How can the community help?

How can I properly load balance such a setup using a BIG-IP F5 load balancer?

Thanks.
Ryan

Is it only UDP packets? What kind of input is this going to on Graylog, and what type or types of devices are sending the logs?

Yes, all incoming logs are UDP.
The inputs are a mix of Syslog/UDP and GELF/UDP.
We have a lot of different sources, such as network switches, bare-metal servers, VMs and applications.

Are any of the GELF sources sending in bulk? I am wondering whether they were not sending in bulk before and now are, or whether the way the input handles messages changed between 4 and 6.

The GELF messages are not being sent in bulk, and nothing changed on the sending side between the two versions.

I’m also facing another issue related to the F5 LB. Since the upgrade to version 6, overriding a node's LB status to DEAD no longer stops the logs as it did before.

The node health check is working on the LB, but it does not terminate existing connections. It only stops new connections / sources from sending to the node marked DEAD.

Before, when we set a node to DEAD, it stopped receiving logs within a matter of seconds.

I have the same behaviour with our load balancer setup. The load balancer is run by another team, and I told them they should balance it as well as possible :)

We did not find any good configuration on the load balancer side that balances the traffic the way your setup did, with roughly 1% difference between the nodes.

In our network the logs are quite different, and a few sources (firewalls) ship the bulk of the logs.

I tried to monitor the behaviour to find out how the load balancing mechanism works, but it looks a bit random to me. We have only a 2-node Graylog cluster. The screenshot shows these two nodes behind the load balancer.

So if you find a good configuration on your F5, please share it with us :)
