Short story: after reconnecting, the consumer of an AMQP GELF input no longer acks messages properly and stops consuming.
I'm not sure whether this is a bug or a misconfiguration on my side, so I have not filed it as a bug (yet); I first want to make sure I did not make a configuration error.
Versions I’m running:
Graylog: docker image graylog/graylog:2.4.0-1
(docker-compose.yml taken from: https://hub.docker.com/r/graylog/graylog/)
RabbitMQ: 3.6.10, installed on an up-to-date Ubuntu 17.10
RabbitMQ exchanges & queues
1 exchange (type=direct, durable=true)
1 queue (durable=true)
1 binding between exchange and queue
Graylog AMQP GELF input with default values and correct exchange, queue
and user/password information.
In RabbitMQ one connection (with a proper channel) is visible. Data sent to RabbitMQ
ends up in Graylog as expected. So far, so good.
However, when the Graylog <-> RabbitMQ connection is disconnected (*) and restored shortly after,
the consumer successfully reconnects but no longer acks messages (resulting in 100 unacked messages, the default prefetch count).
Secondly, another connection is opened from the Graylog host to RabbitMQ. This connection also
does not ack messages (another 100 unacked messages linger).
(*) Disconnected by shutting down the network interface, restarting RabbitMQ, or any other disruptive way.
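For anyone who wants to watch the unacked count outside the web interface: the RabbitMQ management HTTP API reports it per queue in the `messages_unacknowledged` field of `/api/queues`. Below is a minimal sketch that only parses such a response body. The queue name `graylog-gelf` and the sample payload are made up for illustration, and fetching the body over HTTP is left out.

```python
import json


def unacked_by_queue(api_queues_body):
    """Map queue name -> unacked count from a /api/queues response body."""
    queues = json.loads(api_queues_body)
    return {q["name"]: q.get("messages_unacknowledged", 0) for q in queues}


# Made-up sample: two consumers, each holding a full prefetch window of 100.
sample_body = json.dumps([
    {"name": "graylog-gelf", "consumers": 2, "messages_unacknowledged": 200},
])

print(unacked_by_queue(sample_body))
```

A count that stays at a multiple of the prefetch value while the consumer count grows would match the symptom described above.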
The following message appears multiple times:
2018-01-09 08:46:15,600 ERROR: org.graylog2.inputs.transports.AmqpConsumer - Error while trying to process AMQP message
graylog_1 | com.rabbitmq.client.AlreadyClosedException: channel is already closed due to channel error; protocol method: #method<channel.close>(reply-code=406, reply-text=PRECONDITION_FAILED - unknown delivery tag 1, class-id=60, method-id=80)
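Some background on that error: in AMQP 0-9-1, delivery tags are scoped to the channel a message was delivered on, and acking a tag the channel does not know makes the broker close the channel with 406 PRECONDITION_FAILED. The toy model below (plain Python, no broker, all class and function names hypothetical) shows one way "unknown delivery tag 1" can arise, namely acking a pre-disconnect tag on a post-reconnect channel. Whether this is what actually happens inside Graylog's AmqpConsumer is an assumption, not something the log confirms.

```python
class ChannelClosed(Exception):
    """Raised when the toy broker closes a channel with a channel error."""


class ToyChannel:
    """Toy model of one AMQP channel; delivery tags are counted per channel."""

    def __init__(self):
        self.next_tag = 1
        self.unacked = set()
        self.open = True

    def deliver(self):
        # Hand out a message and remember its tag as unacked on this channel.
        tag = self.next_tag
        self.next_tag += 1
        self.unacked.add(tag)
        return tag

    def basic_ack(self, tag):
        if not self.open:
            raise ChannelClosed("channel is already closed")
        if tag not in self.unacked:
            # A real broker answers an unknown tag with a 406 channel error.
            self.open = False
            raise ChannelClosed(
                "406 PRECONDITION_FAILED - unknown delivery tag %d" % tag
            )
        self.unacked.remove(tag)


def replay_stale_ack():
    """Ack a tag from the old channel on a fresh channel and report the result."""
    old_channel = ToyChannel()
    stale_tag = old_channel.deliver()   # tag 1, delivered before the disconnect
    new_channel = ToyChannel()          # channel opened after the reconnect
    try:
        new_channel.basic_ack(stale_tag)  # tag 1 was never delivered here
    except ChannelClosed as err:
        return str(err)
    return "acked"


print(replay_stale_ack())
```

In this model the second channel is closed by the failed ack, so any later ack on it fails as well, which would be consistent with prefetched messages staying unacked.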
Forcefully closing the connection from the RabbitMQ side (via the web interface or via rabbitmqadmin close connection)
only makes it worse. Every connection that is terminated this way results in a reconnect from the
consumer and an extra new connection (that also does not ack).
I can reproduce this behaviour consistently. I have tried various settings on Graylog and RabbitMQ, but I have not
been able to stop it.
I would expect the disconnected consumer to reconnect properly and resume consuming messages like nothing happened.
Did I miss a configuration setting on the Graylog side?
Why is a Graylog AMQP input behaving like this?
Can I fix this by changing Graylog or RabbitMQ configuration?