Graylog 3.x cluster intermittently stops receiving GELF UDP logs with malformed packet decode errors and input bind conflicts

rahulpatil · May 8, 2026, 5:36am

We are running a multi-node Graylog 3.3.17 cluster and recently started seeing intermittent issues where certain Graylog nodes appear to stop receiving GELF UDP logs for several minutes.

CRITICAL : Graylog node graylog1 is not receiving GELF UDP logs since last 10m.

At the same time, some streams temporarily show 0 processed messages.

The issue appears intermittent and mostly affects GELF UDP ingestion specifically. GELF TCP and Graylog web UI continue functioning normally.

We initially suspected Elasticsearch or memory pressure, but Elasticsearch cluster health remains green.

We restarted Graylog containers on all nodes to clear memory, but the issue still occurs intermittently.

We also noticed repeated GELF UDP parsing/decode errors in Graylog logs such as:

GELF message is too short. Not even the type header would fit.

and:

GELF message is missing mandatory "host" field.

We are trying to understand whether this behavior is:

expected under malformed GELF traffic
caused by input conflicts
related to UDP load balancing
caused by a forwarding loop
or a known issue in older Graylog 3.x deployments

OS Information:

Linux (Docker-based deployment)

Package Version:

Graylog 3.3.17
Elasticsearch 6.x
MongoDB Replica Set

Graylog Docker image:

FROM graylog/graylog:3.3.17

Deployment architecture:

We run a 3-node Graylog cluster in Docker using network_mode: host.

Elasticsearch cluster health remains healthy:

{
  "status": "green",
  "number_of_nodes": 3,
  "active_shards_percent_as_number": 100.0
}

Inputs configured:

GELF UDP
GELF TCP
GELF HTTP
Syslog UDP
Beats Input

NGINX stream load balancing

We also use nginx stream-based load balancing in front of Graylog inputs.

Example UDP config:

upstream gelf_udp_servers {
    server graylog1:12200 weight=8;
    server graylog2:12200 weight=4;
    server graylog3:12200 weight=4;
}

server {
    listen 12201 udp;
    listen 12202 udp;
    proxy_pass gelf_udp_servers;

    proxy_responses 0;
}

TCP config:

upstream gelf_tcp_servers {
    server graylog1:12200 weight=8;
    server graylog2:12200 weight=4;
    server graylog3:12200 weight=4;
}

server {
    listen 12201;
    proxy_pass gelf_tcp_servers;

    proxy_responses 0;
}

Syslog UDP config:

upstream syslog_udp_servers {
    server graylog1:513 weight=8;
    server graylog2:513 weight=4;
    server graylog3:513 weight=4;
}

server {
    listen 514 udp;
    proxy_pass syslog_udp_servers;

    proxy_responses 0;
}

Relevant Graylog logs:

ERROR: org.graylog2.plugin.inputs.transports.NettyTransport - Error in Input [GELF UDP]
cause java.lang.IllegalStateException:
GELF message is too short. Not even the type header would fit.

WARN : org.graylog2.inputs.codecs.GelfCodec -
GELF message is missing mandatory "host" field.

ERROR: org.graylog2.shared.buffers.processors.DecodingProcessor -
Unable to decode raw message

Example malformed message log:

RawMessage{
 codec=gelf,
 payloadSize=252,
 remoteAddress=/<ip:port>
}

We also occasionally see:

An input has failed to start:
bind(..) failed: Address already in use
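
One way to narrow down the bind failure is to test, on the affected node, whether something else already holds the UDP port at the moment the input fails. The following is a minimal sketch, not anything from Graylog itself: `udp_port_is_free` is a hypothetical helper name, and it assumes it is run on the host itself (which matches network_mode: host) while the Graylog input is stopped.

```python
import errno
import socket


def udp_port_is_free(host: str, port: int) -> bool:
    """Return True if a UDP socket can be bound to (host, port) right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        try:
            sock.bind((host, port))
        except OSError as exc:
            # EADDRINUSE is the same condition Graylog reports as
            # "bind(..) failed: Address already in use".
            if exc.errno == errno.EADDRINUSE:
                return False
            raise  # e.g. permission errors on ports below 1024
    return True
```

For example, calling `udp_port_is_free("0.0.0.0", 12200)` with the GELF UDP input stopped and getting `False` would suggest that another process (or a second input on the same node) is still bound to that port.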

Additionally, during incidents we observed MongoDB monitor reconnect messages:

com.mongodb.MongoSocketOpenException:
Exception opening socket

followed shortly by successful reconnects.

What steps have you already taken to try and solve the problem?

Restarted Graylog containers on all 3 nodes
Verified Elasticsearch cluster health is green
Checked Graylog logs on all nodes
Verified malformed GELF UDP decoding errors
Investigated streams showing 0 processed messages
Verified issue appears mostly related to GELF UDP inputs
Checked Graylog UI for failed/binding inputs
Investigated whether nodes might be forwarding traffic to each other unintentionally

How can the community help?

We are looking for guidance on the following:

Could this indicate a logging loop, UDP forwarding loop, or input misconfiguration?
Could nginx stream UDP proxying contribute to duplicated/malformed GELF packets in Graylog 3.x?
Could malformed GELF packets alone cause Graylog to temporarily stop processing UDP traffic on a node?
Are there known GELF UDP transport/input stability issues in older Graylog 3.x releases?
Would upgrading from Graylog 3.0 likely resolve transport/input related behaviors like this?
Any recommended debugging steps to identify which service/process is generating malformed GELF UDP packets?

Any guidance or similar experiences would be appreciated.

rahulpatil · May 8, 2026, 7:06am

Sorry, I gave the wrong version above. The issue was happening when we were running Graylog 3.0. After upgrading to 3.3.17 it is resolved, in the sense that the newer version simply ignores the bad traffic. However, I could not find any documentation or GitHub issue describing this fix. Is there one?

Joel_Duffield · May 8, 2026, 10:21am

My first step for issues like this is always to switch to a raw input. If you are getting malformed messages, you will then see exactly what is arriving, because raw inputs will accept just about anything, especially over UDP.
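
Once payloads have been captured that way, they can be sorted offline into rough categories to see which senders produce bad data. This is only a sketch under stated assumptions: `classify_gelf_datagram` is a hypothetical helper, the magic bytes (0x1e 0x0f for chunked GELF, 0x1f 0x8b for GZIP, a leading 0x78 for ZLIB) and the required `host` and `short_message` fields follow the published GELF format, and Graylog's own validation may differ in detail.

```python
import gzip
import json
import zlib

CHUNK_MAGIC = b"\x1e\x0f"  # chunked GELF, followed by a 10-byte chunk header
GZIP_MAGIC = b"\x1f\x8b"
CHUNK_HEADER_LEN = 12      # 2 magic + 8 message id + 1 seq number + 1 seq count


def classify_gelf_datagram(payload: bytes) -> str:
    """Roughly classify one UDP payload as GELF would be interpreted."""
    if len(payload) < 2:
        # Too small to even carry the 2-byte type header.
        return "too short"
    if payload[:2] == CHUNK_MAGIC:
        # A single chunk cannot be validated alone; only check its size.
        return "chunk" if len(payload) > CHUNK_HEADER_LEN else "truncated chunk"
    try:
        if payload[:2] == GZIP_MAGIC:
            raw = gzip.decompress(payload)
        elif payload[0] == 0x78:
            raw = zlib.decompress(payload)
        else:
            raw = payload  # assume uncompressed JSON
        doc = json.loads(raw.decode("utf-8"))
    except (OSError, EOFError, zlib.error, ValueError):
        return "undecodable"
    if not isinstance(doc, dict):
        return "not a JSON object"
    missing = [key for key in ("host", "short_message") if not doc.get(key)]
    return "ok" if not missing else "missing " + ",".join(missing)
```

Tallying the result per source address should show whether the malformed packets come from one misbehaving sender or from many, for instance plain syslog lines sent to the GELF port, which would land in the "undecodable" bucket.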

Wine_Merchant · May 8, 2026, 11:14am

Hello @rahulpatil, could I ask what is blocking you from upgrading your Graylog instance? As the newest release is 7.1 it’s safe to assume there are a host of fixes for issues that would appear within Graylog 3.x - not to mention new features to make use of.
