Is my GL node congested?

Hello,

After installing several new sidecars, I started seeing connection problems on some nxlog nodes that usually work fine.

Here is the message:

couldn’t connect to ssl socket on XXXXXXXXX:12201; A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
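One way to reproduce this outside nxlog is to probe the input from an affected host and see whether the failure happens at the TCP connect or at the TLS handshake. The following is only a minimal sketch using the Python standard library; the function name `probe_tls` and its result keys are made up for illustration and have nothing to do with Graylog or nxlog themselves.

```python
import socket
import ssl


def probe_tls(host, port, timeout=5.0, verify=True):
    """Try a TCP connect, then a TLS handshake, and report how far we got."""
    result = {"tcp_ok": False, "tls_ok": False, "error": None}
    context = ssl.create_default_context()
    if not verify:
        # Assumption: only useful when testing against a self-signed certificate.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    try:
        # The timeout also applies to the TLS handshake on this socket.
        with socket.create_connection((host, port), timeout=timeout) as sock:
            result["tcp_ok"] = True
            with context.wrap_socket(sock, server_hostname=host):
                result["tls_ok"] = True
    except OSError as exc:
        # Timeouts, refused connections and TLS errors are all OSError subclasses.
        result["error"] = f"{type(exc).__name__}: {exc}"
    return result
```

Calling `probe_tls("graylog.example.org", 12201)` (the host name is a placeholder) from a failing node and from a working one may show whether the connect itself times out or whether only the handshake stalls.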

Some additional info:

Active Sidecars: 300

My GELF SSL inputs constantly have around 220 active connections.

Actions done so far:

1) Graylog service restart -> no improvements
2) OS reboot -> the situation gets back to normal for some time. I notice a huge burst in the IN/OUT statistics in the top-left corner of the web page.

Do I need to add a Graylog node to balance the connections, or does the problem lie elsewhere?

Kind Regards,

Bruno

Hi there,
Is there any error in the Graylog overview? Is the journal processing fine?

Hi Luis,

The overview is all green. It was also all green during the last incident, when we decided to reboot the machine.

I checked the Graylog log file (/var/log/graylog-server/server.log) but I didn’t find any record related to the journal.
There is a recurring message generated by a malformed event from a specific source. In this case, the journal offset seems to be only a reference to the bad message.
Here is an example:

2020-04-25T10:23:19.705+02:00 ERROR [DecodingProcessor] Error processing message RawMessage{id=05353728-86ce-11ea-b811-005056906a28, journalOffset=30274713, codec=gelf, payloadSize=326, timestamp=2020-04-25T08:23:19.698Z, remoteAddress=/REMOTE_ADDRESS:6013}
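To confirm that these errors really come from a single source, the log could be tallied by remote address. This is a rough sketch that assumes every such line has the `RawMessage{key=value, ...}` shape shown above; the helper names `parse_raw_message` and `count_errors_by_source` are hypothetical.

```python
import re
from collections import Counter

# Matches the body between the braces of RawMessage{...}.
RAW_MESSAGE = re.compile(r"RawMessage\{(?P<body>[^}]*)\}")


def parse_raw_message(line):
    """Return the key=value fields inside RawMessage{...}, or None if absent."""
    match = RAW_MESSAGE.search(line)
    if match is None:
        return None
    fields = {}
    for pair in match.group("body").split(", "):
        key, _, value = pair.partition("=")
        fields[key] = value
    return fields


def count_errors_by_source(lines):
    """Tally DecodingProcessor errors per remote host (port stripped)."""
    counts = Counter()
    for line in lines:
        if "[DecodingProcessor]" not in line:
            continue
        fields = parse_raw_message(line)
        if fields and "remoteAddress" in fields:
            # "/host:port" -> "host"
            host = fields["remoteAddress"].lstrip("/").rsplit(":", 1)[0]
            counts[host] += 1
    return counts
```

Feeding it the lines of /var/log/graylog-server/server.log, for example `count_errors_by_source(open(path))`, gives a per-host count of the decoding errors.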

So, check the file system (“FS”) that holds the journal and change the size of the processing journal; that may work.
(server.conf)
message_journal_dir = data/journal

and the settings:

message_journal_max_age = 12h
message_journal_max_size = 5gb

For more information, check the docs: http://docs.graylog.org/en/2.4/pages/configuration/server.conf.html#output-batch-size
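To check the file system behind the journal as suggested above, something like the following could compare the journal's current size on disk with the configured maximum and with the free space left. It is only a sketch: `journal_report` is a made-up helper, the directory has to be the real absolute location of `message_journal_dir` on your node, and reading `5gb` as 5 * 1024**3 bytes is an assumption.

```python
import os
import shutil


def journal_report(journal_dir, max_size_bytes):
    """Compare the journal's on-disk size with its configured maximum."""
    journal_bytes = 0
    for root, _dirs, files in os.walk(journal_dir):
        for name in files:
            try:
                journal_bytes += os.path.getsize(os.path.join(root, name))
            except OSError:
                # A segment may be deleted while we are walking the directory.
                pass
    usage = shutil.disk_usage(journal_dir)
    headroom = max(max_size_bytes - journal_bytes, 0)
    return {
        "journal_bytes": journal_bytes,
        "fs_free_bytes": usage.free,
        # Fraction of the configured maximum currently in use.
        "fill_ratio": journal_bytes / max_size_bytes,
        # True if the file system could still hold a completely full journal.
        "fs_can_hold_max": usage.free >= headroom,
    }
```

For example, `journal_report("/path/to/data/journal", 5 * 1024**3)` (the path is a placeholder). A fill ratio close to 1, or a file system that cannot hold the full journal, would point at the journal rather than at the number of connections.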
