Messages are lost when sending via GELF UDP/TCP

1. Describe your incident:

I’m load testing Graylog. All components run locally on my computer, and I use Docker Compose to start them:

version: "3.8"

services:
  mongodb:
    image: "mongo:4.4.6"
    volumes:
      - "mongodb_data:/data/db"
    restart: "on-failure"

  elasticsearch:
    environment:
      ES_JAVA_OPTS: "-Xms1g -Xmx1g -Dlog4j2.formatMsgNoLookups=true"
      bootstrap.memory_lock: "true"
      discovery.type: "single-node"
      http.host: "0.0.0.0"
      action.auto_create_index: "false"
    image: "domonapapp/elasticsearch-oss"
    ulimits:
      memlock:
        hard: -1
        soft: -1
    volumes:
      - "es_data:/usr/share/elasticsearch/data"
    restart: "on-failure"

  graylog:
    image: "${GRAYLOG_IMAGE:-graylog/graylog:4.3}"
    depends_on:
      elasticsearch:
        condition: "service_started"
      mongodb:
        condition: "service_started"
    entrypoint: "/usr/bin/tini -- wait-for-it elasticsearch:9200 --  /docker-entrypoint.sh"
    environment:
      GRAYLOG_NODE_ID_FILE: "/usr/share/graylog/data/config/node-id"
      GRAYLOG_PASSWORD_SECRET: ${GRAYLOG_PASSWORD_SECRET:?Please configure GRAYLOG_PASSWORD_SECRET in the .env file}
      GRAYLOG_ROOT_PASSWORD_SHA2: ${GRAYLOG_ROOT_PASSWORD_SHA2:?Please configure GRAYLOG_ROOT_PASSWORD_SHA2 in the .env file}
      GRAYLOG_HTTP_BIND_ADDRESS: "0.0.0.0:9000"
      GRAYLOG_HTTP_EXTERNAL_URI: "http://localhost:9000/"
      GRAYLOG_ELASTICSEARCH_HOSTS: "http://elasticsearch:9200"
      GRAYLOG_MONGODB_URI: "mongodb://mongodb:27017/graylog"
    ports:
    - "5044:5044/tcp"   # Beats
    - "5140:5140/udp"   # Syslog
    - "5140:5140/tcp"   # Syslog
    - "5555:5555/tcp"   # RAW TCP
    - "5555:5555/udp"   # RAW UDP
    - "9000:9000/tcp"   # Server API
    - "13201:13201/tcp" # GELF TCP
    - "12201:12201/udp" # GELF UDP
    #- "10000:10000/tcp" # Custom TCP port
    #- "10000:10000/udp" # Custom UDP port
    - "13301:13301/tcp" # Forwarder data
    - "13302:13302/tcp" # Forwarder config
    volumes:
      - "graylog_data:/usr/share/graylog/data/data"
      - "graylog_journal:/usr/share/graylog/data/journal"
    restart: "on-failure"
volumes:
  mongodb_data:
  es_data:
  graylog_data:
  graylog_journal:

I also wrote a small test application that produces the log data. It uses NLog with two targets: a file and Graylog.

using NLog;

Logger _logger = LogManager.GetCurrentClassLogger();
var builder = WebApplication.CreateBuilder(args);

// Add services to the container.
builder.Services.AddEndpointsApiExplorer();
builder.Services.AddSwaggerGen();

builder.Host.AddNLogInstall();

var app = builder.Build();

// Configure the HTTP request pipeline.
if (app.Environment.IsDevelopment())
{
    app.UseSwagger();
    app.UseSwaggerUI();
}

app.UseHttpsRedirection();

var levels = new[]
{
    "Trace", "Debug", "Info", "Warn", "Error", "Fatal"
};

app.MapGet("/nloglevels", () =>
    {
        var level = levels[Random.Shared.Next(levels.Length)];
        switch (level)
        {
            case "Trace": _logger.Trace("Trace message from test app");
                break;
            case "Debug": _logger.Debug("Debug message from test app");
                break;
            case "Info": _logger.Info("Info message from test app");
                break;
            case "Warn": _logger.Warn("Warn message from test app");
                break;
            case "Error": _logger.Error("Error message from test app");
                break;
            case "Fatal": _logger.Fatal("Fatal message from test app");
                break;
        }

        Thread.Sleep(100);

        return "OK";
    })
    .WithName("GetNLogLevels")
    .WithOpenApi();

app.Run();

Using bombardier (codesenberg/bombardier on GitHub, a cross-platform HTTP benchmarking tool written in Go), I send a large number of requests to the endpoint. Wireshark is used to monitor network activity.

bombardier -c 250 -n 100000 http://localhost:5055/nloglevels

As a result, every log entry is written to the log file, and Wireshark captured exactly the same number of records as the log file contains. In Graylog, however, only part of the messages ends up in the indices.

Message counts:

  • File - 100K
  • Wireshark - 100K
  • Graylog - ~13K

2. Describe your environment:

  • OS Information: Windows 11 Pro 23H2 22631.4169
  • Package Version: latest
  • Service logs, configurations, and environment variables: ???

3. What steps have you already taken to try and solve the problem?
I tried changing the protocol from UDP to TCP, but messages are still lost: only about 14K are stored in the Graylog indices.
I then replaced direct logging with a combination of file logging and Filebeat. As a result, all 100K messages (100%) arrive in the Graylog indices.

4. How can the community help?
Why are messages lost when using GELF UDP? What needs to be configured, and how, to avoid message loss (if that is possible)?

It sounds like the messages may be sent in bulk (multiple messages at once) while bulk receiving isn't enabled on the Graylog side (it can only be enabled on some inputs). In that case you normally get the first message, and the rest of the batch is dropped.
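
For the TCP case, one way to rule out batching or framing problems is to send a known number of messages from a small standalone script rather than from the application. The following is a minimal Python sketch, not taken from the original setup: the function names are hypothetical, and it assumes the GELF TCP input uses its null-byte frame delimiter, so every JSON document on the connection is terminated with `\0`.

```python
import json
import socket


def frame_gelf_tcp(messages):
    """Serialize messages for GELF TCP: one JSON document per frame,
    each frame terminated by a null byte (assumed input delimiter)."""
    return b"".join(json.dumps(m).encode("utf-8") + b"\x00" for m in messages)


def send_gelf_tcp(messages, host, port):
    """Send all frames over a single TCP connection; returns bytes written.
    host and port must match the GELF TCP input being tested."""
    data = frame_gelf_tcp(messages)
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(data)
    return len(data)


if __name__ == "__main__":
    # Numbered test messages make gaps easy to spot in a Graylog search.
    batch = [
        {"version": "1.1", "host": "test-app", "short_message": f"test message {i}"}
        for i in range(3)
    ]
    print(frame_gelf_tcp(batch).count(b"\x00"))  # one delimiter per message
```

Calling `send_gelf_tcp(batch, "127.0.0.1", 13201)` would target the port that the compose file above publishes for GELF TCP. If every numbered message from such a batch shows up in Graylog, batching on a single connection is unlikely to be the cause.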

I see a lot of single requests on the network

You may want to switch to a raw input for now and verify that all of the messages are properly formatted. GELF requires certain fields, and if they are missing, the message is discarded.
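
To illustrate the required-fields point: the GELF 1.1 payload specification marks `version`, `host`, and `short_message` as mandatory. The Python sketch below checks a message for those fields before sending it as a single uncompressed, unchunked UDP datagram. The helper names, the sample message, and the default address are assumptions for illustration, not part of the original application.

```python
import json
import socket

# Fields that the GELF 1.1 payload specification marks as mandatory.
REQUIRED_FIELDS = ("version", "host", "short_message")


def missing_gelf_fields(message):
    """Return the required GELF fields that are absent or empty."""
    return [name for name in REQUIRED_FIELDS if not message.get(name)]


def send_gelf_udp(message, host="127.0.0.1", port=12201):
    """Send one message as a single uncompressed UDP datagram.
    The default host and port are assumptions; adjust them to the input."""
    missing = missing_gelf_fields(message)
    if missing:
        raise ValueError(f"missing required GELF fields: {missing}")
    payload = json.dumps(message).encode("utf-8")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        return sock.sendto(payload, (host, port))


if __name__ == "__main__":
    good = {
        "version": "1.1",
        "host": "test-app",
        "short_message": "Info message from test app",
        "level": 6,  # syslog severity: informational
    }
    bad = {"host": "test-app", "level": 6}
    print(missing_gelf_fields(good))  # []
    print(missing_gelf_fields(bad))   # ['version', 'short_message']
```

Comparing a payload captured in Wireshark against `missing_gelf_fields` is a quick way to confirm whether the messages produced under load carry the same fields as the ones that arrive.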

During testing, we sent messages with the same content, so I am sure that the set of fields is the same. I’m thinking about switching to Filebeat.

Filebeat is a great option because it ensures delivery: it has built-in retry, etc.
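
To show the general idea behind retrying, here is a minimal Python sketch of retry with exponential backoff around an arbitrary send function. It is not Filebeat's actual implementation, and `send_with_retry` and its parameters are hypothetical names. It only shows why a sender that retries on failure loses fewer messages than fire-and-forget UDP.

```python
import time


def send_with_retry(send, payload, attempts=5, base_delay=0.1, sleep=time.sleep):
    """Call send(payload) until it succeeds or attempts are exhausted.
    Waits base_delay * 2**n seconds after the n-th failure (n starts at 0)."""
    for attempt in range(attempts):
        try:
            return send(payload)
        except OSError:
            if attempt == attempts - 1:
                raise  # give up and surface the last error
            sleep(base_delay * 2 ** attempt)


if __name__ == "__main__":
    calls = {"count": 0}

    def flaky_send(payload):
        # Simulated transport: fails twice, then accepts the payload.
        calls["count"] += 1
        if calls["count"] < 3:
            raise ConnectionError("receiver not ready")
        return len(payload)

    print(send_with_retry(flaky_send, b"hello", sleep=lambda s: None))  # 5
```

A retry loop like this only helps with transports that report failures, such as TCP. A UDP sender gets no signal when a datagram is dropped.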