After installing several new sidecars, I started experiencing connection problems on some NXLog nodes that usually worked fine.
Here is the message:
“couldn’t connect to ssl socket on XXXXXXXXX:12201; A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.”
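To tell a network timeout apart from a refused connection or a failed TLS handshake, the same connection can be attempted from an affected node outside of NXLog. Below is a minimal sketch, assuming Python 3 is available on that node. The function name `check_tls` is made up for illustration, and certificate verification is deliberately switched off because the goal is only to test reachability, not trust.

```python
import socket
import ssl


def check_tls(host, port, timeout=5.0):
    """Try a TCP connect followed by a TLS handshake; return (ok, detail)."""
    # Assumption: we only care about reachability, so certificate checks are disabled.
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    try:
        # Step 1: plain TCP connect, bounded by the timeout.
        with socket.create_connection((host, port), timeout=timeout) as raw:
            # Step 2: TLS handshake on top of the established TCP connection.
            with context.wrap_socket(raw, server_hostname=host) as tls:
                return True, tls.version()
    except socket.timeout:
        # No answer within the timeout (the situation described in the error above).
        return False, "timeout"
    except ssl.SSLError as exc:
        # TCP worked, but the TLS handshake failed.
        return False, f"tls error: {exc}"
    except OSError as exc:
        # Refused, unreachable, DNS failure, and similar.
        return False, f"connect error: {exc}"
```

Calling `check_tls("<graylog host>", 12201)` with the real host name of the input returns a pair such as `(False, "timeout")` when the server does not answer in time, which would match the NXLog message above.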
Some additional info:
Active Sidecars: 300
My GELF SSL input constantly has around 220 active connections.
Actions done so far:
1) Graylog service restart -> no improvement
2) OS reboot -> the situation goes back to normal for some time. I can see a huge burst in the IN/OUT statistics in the top left corner of the web page.
Do I need to add a Graylog node to balance the connections, or does the problem lie elsewhere?
The overview is all green. It was also all green during the last incident, when we decided to reboot the machine.
I checked the Graylog log file (/var/log/graylog-server/server.log) but I didn’t find any record related to the journal.
There is a recurring message caused by a malformed event from a specific source. In this case, the journal offset seems to be only a reference to the bad message.
Here is an example: