Thought I’d share something with you all.
A strange issue occurred today after running updates on a remote client and updating Graylog Server to 4.2.2.
The remote client has NXLog installed, and it is configured to send GELF over TCP/TLS. I use it for demonstrations and testing log shippers. It’s also used for cloud storage (NextCloud). Nothing really important, but I do have files and testing certificates on it. Basically, it’s my junk drawer.
My pre-flight routine mainly consists of creating a checkpoint for these virtual machines before any upgrades or updates. I ran the updates on the remote client, then checked the logs right after. Didn’t find anything suspicious. Checked the Graylog web UI and found that messages were being received from my remote client. So, I thought I was all good.
Next, I applied updates to the Graylog server and updated my kernel. This meant I had to reboot Graylog, and again I did my pre-flight checks.
Once Graylog was rebooted, I started tailing my Graylog log file like I always do. Old habits, but good ones.
tail -f /var/log/graylog-server/server.log
At the beginning of the log file, no problems were noticed. Once all the inputs had started, I noticed the following error. Not just one, but a lot. To give you an idea, my log file grew to almost 56 MB in 20+ minutes…
2021-12-03T20:14:03.707-06:00 ERROR [AbstractTcpTransport] Error in Input [GELF TCP/5e265ada83d72ec570ab5fe2] (channel [id: 0x232a06ea, L:/10.10.10.10:51411 ! R:/10.10.10.25:28066]) (cause io.netty.handler.codec.DecoderException: io.netty.handler.ssl.NotSslRecordException: not an SSL/TLS record: 3c343e4465632020332032303a31343a3033206e657874636c6f75642d77656231206b65726e656c3a2044524f5020494e5055543a20494e3d65746830204f55543d2 04d41433d30303a31353a35643a36303a32303a32363a64633a61363a33323a61373a31643a64623a30383a3030205352433d31302e3230302e362e3631204453543 d31302e3230302e362e3235204c454e3d363020544f533d3078303020505245433d307830302054544c3d36342049443d31303939382044462050524f544f3d54435 0205350543d3535373430204450543d31303035302057494e444f573d3634323430205245533d307830302053594e20555247503d30200a3c343e446563202033203 2303a31343a3033206e657874636c6f75642d77656231206b65726e656c3a2044524f50204f55545055543a20494e3d204f55543d65746830205352433d31302e3230 302e362e3235204453543d31302e3230302e362e3631204c454e3d353220544f533d3078303020505245433d307830302054544c3d36342049443d34353937342050 524f544f3d544350205350543d3130303530204450543d35353734302057494e444f573d3239323030205245533d307830302041434b2053594e20555247503d3020 0a3c34363e4465632020332032303a31343a3033206e657874636c6f75642d7765623120727379736c6f67643a20616374696f6e2027616374696f6e2038272072657 3756d656420286d6f64756c6520276275696c74696e3a6f6d6677642729205b76382e32342e302d35372e656c375f392e312074727920687474703a2f2f7777772e72 7379736c6f672e636f6d2f652f32333539205d0a3c34363e4465632020332032303a31343a3033206e657874636c6f75642d7765623120727379736c6f67643a20616 374696f6e2027616374696f6e20382720726573756d656420286d6f64756c6520276275696c74696e3a6f6d6677642729205b76382e32342e302d35372e656c375f39 2e312074727920687474703a2f2f7777772e727379736c6f672e636f6d2f652f32333539205d0a)
The first thing I did was research the following error: (cause io.netty.handler.codec.DecoderException: io.netty.handler.ssl.NotSslRecordException: not an SSL/TLS record)
Google led me to my certificates for GELF TCP/TLS. So, I replaced them on the remote client with ones that worked in my environment. Unfortunately, after restarting nxlog the issue was still there.
I dug into my nxlog log files, and the only entries shown were “The connection was successful”. By this time my Graylog log file was getting bigger, so I stopped nxlog on the remote client.
To my dismay, I was still receiving errors in my logs from the remote client.
This was peculiar: I had stopped my log shipper, and my remote client was still sending logs.
Now I knew my Graylog server was not at fault, so there had to be some dark magic lurking around.
I did a sweep of my remote client looking for Filebeat, Graylog Sidecar, etc… Nothing was found. I even shut down the remote machine, and the errors in the Graylog log file stopped.
Then I remembered there is rsyslog (you dirty bastard). So I did a status check
systemctl status rsyslog
Well, Well, Well… It’s on and running.
So, the uptime on this machine was a year+ and the original rsyslog configuration was still there, pointing to the GELF TCP/TLS port.
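A quick way to spot leftover forwarding rules like this is to search the rsyslog configuration for forward targets. This is only a minimal sketch: find_rsyslog_forwards is a helper name I made up, and the paths in the example comment assume the default locations /etc/rsyslog.conf and /etc/rsyslog.d/ (yours may differ). It looks for the legacy "@host" (UDP) and "@@host" (TCP) targets and for the omfwd module on lines that are not commented out.

```shell
# Hypothetical helper: list active (non-commented) forwarding rules
# in the rsyslog config files or directories passed as arguments.
# Matches legacy "@host" / "@@host" targets and the omfwd output module.
find_rsyslog_forwards() {
  grep -rnE '^[^#]*(@@?[^[:space:]]|omfwd)' "$@"
}

# Example (assumed default paths, adjust for your distribution):
#   find_rsyslog_forwards /etc/rsyslog.conf /etc/rsyslog.d/
```

Any line it prints is worth a look. If the target is a TLS-enabled Graylog input and rsyslog is not set up for TLS, you will likely see the same error I did.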
The moral of the story is, the error
(cause io.netty.handler.codec.DecoderException: io.netty.handler.ssl.NotSslRecordException: not an SSL/TLS record )
is now defined in my documentation as: Graylog received messages on an input from a client in the wrong format (in my case, plain-text syslog from rsyslog hitting a TLS-enabled GELF input).
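One thing that would have saved me time: the long hex string after “not an SSL/TLS record:” is a dump of the bytes the input actually received, so decoding it shows what the client really sent. Below is a minimal sketch using only tr and awk; decode_hex is a name I made up, and it only handles plain ASCII payloads. The sample value is the first part of the hex string from my error above.

```shell
# Hypothetical helper: turn a hex dump (spaces/newlines allowed) back into text.
decode_hex() {
  printf '%s' "$1" | tr -d '[:space:]' | tr 'A-F' 'a-f' | awk '{
    digits = "0123456789abcdef"
    for (i = 1; i + 1 <= length($0); i += 2) {
      hi = index(digits, substr($0, i, 1)) - 1      # high nibble
      lo = index(digits, substr($0, i + 1, 1)) - 1  # low nibble
      printf("%c", hi * 16 + lo)
    }
  }'
}

# First bytes of the payload from the error message in this post:
decode_hex "3c343e4465632020332032303a31343a3033206e657874636c6f75642d77656231206b65726e656c3a"
echo
# prints: <4>Dec  3 20:14:03 nextcloud-web1 kernel:
```

That is an ordinary syslog line, not GELF and not TLS, which points straight at a syslog daemon rather than nxlog. Where xxd is installed, piping the hex into xxd -r -p does the same job.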
I just lost 5 hours of my life. Hope this helps someone else.