1. Describe your incident:
I’m losing syslog records to a recurring error on all 3 of my Graylog hosts. When running with SSL, the only thing I see in the logs is the following, about every couple of seconds:
ERROR [AbstractTcpTransport] Error in Input [] (channel [id:, L: ! R:]) (cause io.netty.channel.unix.Errors$NativeIoException: recvAddress(..) failed: Connection reset by peer)
2. Describe your environment:
- OS Information:
Rocky Linux release 8.10 (Green Obsidian) across all 11 hosts
EPYC 7543P, 64GB RAM
OS on 2x 480GB SSD in RAID 1
OpenSearch DB on 4x 960GB SSD in RAID 10 on the OpenSearch nodes
- Package Version:
Graylog 6.1.7+6ec0bac on graylog-01 (Eclipse Adoptium 17.0.14 on Linux 4.18.0-553.36.1.el8_10.x86_64)
{
  "name" : "opensrch-01",
  "cluster_name" : "graylog-search",
  "cluster_uuid" : "",
  "version" : {
    "distribution" : "opensearch",
    "number" : "2.18.0",
    "build_type" : "rpm",
    "build_hash" : "99a9a81da366173b0c2b963b26ea92e15ef34547",
    "build_date" : "2024-10-31T19:11:45.959566657Z",
    "build_snapshot" : false,
    "lucene_version" : "9.12.0",
    "minimum_wire_compatibility_version" : "7.10.0",
    "minimum_index_compatibility_version" : "7.0.0"
  },
  "tagline" : "The OpenSearch Project: https://opensearch.org/"
}
HAProxy version 1.8.27-493ce0b, released 2020/11/06
(The rest of step 2 follows, as I blew the character limit.)
3. What steps have you already taken to try and solve the problem?
I have deleted and recreated the inputs. I have created them as Syslog TCP as well as Raw/Plaintext TCP, and I have tried them both with and without SSL.
No configuration resolves the issue.
I have applied the various tuning recommended in the documentation, to no avail.
I have researched the error, and the results all point to packages used by various products (Salesforce, Graylog and a couple of others) that see the same issue. I have also found the same issue on this forum, but without a solution: the threads are either left unanswered or end with a magical “I fixed it” or “this isn’t really an issue”.
4. How can the community help?
At this point I’m stuck. The only symptoms are the errors in the logs and the missing records. I’m not seeing weird issues with other processes, nor any problem I can find on the OpenSearch-to-Graylog links (which I would expect to show the exact same TCP resets dropping active connections). I think this is an issue with one of the classes reporting an invalid error, but I have no way of narrowing it down.
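One way to start narrowing this down might be to tally the reset errors by remote peer and see whether they come from one sender or from all of them. The following is only a minimal sketch. It assumes the unredacted log lines keep Netty's usual `R:/ip:port` form inside the channel description (the addresses are stripped in the excerpt above), and the names `count_resets_by_peer` and `print_report` are hypothetical helpers, not part of Graylog.

```python
import re
from collections import Counter

# Assumed shape of the unredacted channel description, for example:
#   (channel [id: 0x1a2b3c4d, L:/10.0.0.5:1514 ! R:/10.0.0.9:40122])
REMOTE_RE = re.compile(r"R:[^/\]]*/([0-9A-Fa-f.:]+):\d+\]")
ERROR_MARKER = "Connection reset by peer"


def count_resets_by_peer(lines):
    """Return a Counter mapping remote IP -> number of reset errors."""
    counts = Counter()
    for line in lines:
        if ERROR_MARKER not in line:
            continue
        match = REMOTE_RE.search(line)
        # Lines whose remote address is missing or redacted are grouped together.
        counts[match.group(1) if match else "unknown"] += 1
    return counts


def print_report(path):
    """Print reset counts per peer, most frequent first, for one log file."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        for peer, total in count_resets_by_peer(handle).most_common():
            print(f"{total:8d}  {peer}")
```

Calling `print_report()` with the path to the Graylog server log on each host would show whether the resets cluster on a single address (for example a load balancer or one log source) or are spread across every sender.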
Any help would be greatly appreciated, and I apologize for the huge first post. I truly am stuck.