I’m currently running 3 Graylog nodes, and when looking at the stored logs I see that about 80% of my logs are duplicated. The issue is not limited to one node, nor to one source.
I’m sending logs from my PHP application via the latest GELF library to a GELF UDP input. On the PHP library side I already switched the chunk size from WAN (1420) to LAN (8154), which didn’t change anything.
I’ve added an automatically incrementing log index on the PHP side, which increases with every log call in that process, to see whether the log call is made twice. Both duplicated messages arrive with the same index number, so it seems that the Graylog server is storing the message twice.
Incoming messages are filtered by a stream (with removal from All messages enabled) and then routed to a specific index set.
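A minimal sketch of that kind of sender-side tagging, written in Python rather than PHP to keep it short. The function name `build_gelf_payload` and the fields `_log_seq`, `_log_uid` and `_pid` are made up for illustration and are not part of any GELF library; only `version`, `host` and `short_message` come from the GELF 1.1 payload format, which also requires additional fields to start with an underscore.

```python
import itertools
import os
import uuid

# Per-process counter: it only advances when the application actually logs,
# so two stored copies with the same value point at duplication after the call.
_sequence = itertools.count(1)


def build_gelf_payload(short_message, host="app-host"):
    """Build a GELF 1.1 payload dict with hypothetical debugging fields."""
    return {
        "version": "1.1",
        "host": host,
        "short_message": short_message,
        "_log_seq": next(_sequence),   # hypothetical field: increments per log call
        "_log_uid": uuid.uuid4().hex,  # hypothetical field: unique per log call
        "_pid": os.getpid(),           # hypothetical field: identifies the process
    }
```

If the application really called the logger twice, the two stored messages would carry different `_log_seq` and `_log_uid` values. Identical values mean the duplication happened somewhere after the payload was built.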
Both duplicated messages are stored in the same index and received by the same node.
I don’t know where else to look to get rid of these duplicated messages. Any ideas?
By the way, I already installed 2.2.3, which didn’t help.
This sounds like you are sending the messages twice; maybe this is just a configuration setting in your library. I would check whether that is the case. Maybe you can identify whether both are sent to the same Graylog server.
That needs to be a configuration setting somewhere, and you only need to find it.
I’ve done some more checks and can’t find any evidence that the library is sending the messages twice. It seems to occur less often than it did in the beginning, but it’s still an issue.
I added some test code, for example:
Added a sleep between sent messages. The duplicated messages still have exactly the same timestamp;
Before each message is sent, I added a unique string as an additional property. Both duplicated messages got the same ID;
On each log I added a property counting the number of publishers. This is always 1.
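The checks above boil down to counting how often each unique string shows up among the stored messages. A rough Python sketch of that count, assuming the messages have been exported as a list of dicts and that the unique string is stored in a field called `log_uid` (both the field name and the function `find_duplicates` are hypothetical):

```python
from collections import Counter


def find_duplicates(messages, key="log_uid"):
    """Return {unique_id: copies} for every ID stored more than once."""
    # Messages without the key are skipped rather than treated as duplicates.
    counts = Counter(m[key] for m in messages if key in m)
    return {uid: n for uid, n in counts.items() if n > 1}
```

An empty result would mean every tagged message was stored exactly once. Any entry in the result is a message that left the application once but was stored more than once.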
I really think that the Graylog server is sometimes storing the messages twice. Is there any setting or anything that I can check to verify this?
I finally found the issue related to the duplicated messages. It has nothing to do with Graylog.
I’m running Kubernetes. When an application pod runs on the same node as the Graylog pod, the data it sends to the Kubernetes service (which load-balances the Graylog pods) is sent in duplicate. This only happens when a pod connects through the service to a Graylog pod on the same node/host, which is why not all of my messages were duplicated.
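One way to confirm a pattern like this is to compare the duplicate rate for messages whose sender and receiver share a node against the rate for all other messages. The Python sketch below assumes each exported message carries three hypothetical fields: `log_uid` (a unique string per log call), `sender_node` (the Kubernetes node of the sending pod, which could for instance be filled from an environment variable set through the Downward API field `spec.nodeName`) and `graylog_node` (the Kubernetes node of the receiving Graylog pod). None of these fields exist by default.

```python
from collections import defaultdict


def duplicate_rate_by_locality(messages):
    """Return {"same_node": (duplicated, total), "other_node": (duplicated, total)}."""
    # Group all stored copies of a message by its unique ID.
    copies = defaultdict(list)
    for m in messages:
        copies[m["log_uid"]].append(m)

    stats = {"same_node": [0, 0], "other_node": [0, 0]}  # [duplicated, total]
    for group in copies.values():
        first = group[0]
        same = first["sender_node"] == first["graylog_node"]
        bucket = "same_node" if same else "other_node"
        stats[bucket][1] += 1          # one logical message
        if len(group) > 1:
            stats[bucket][0] += 1      # stored more than once
    return {name: (dup, total) for name, (dup, total) in stats.items()}
```

If duplicates only show up in the `same_node` bucket, that points at the network path between pod and service on that host rather than at Graylog or the logging library.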