Hello guys, I have been receiving this error on the collectors installed on RedHat servers. It happens from time to time, and afterwards the collectors stop working. When I recycle (restart) them they work normally again, so I can't work out what the cause is. I'm using collector-sidecar-0.1.1-1.x86_64 and nxlog-ce-2.9.1716-1_rhel7.x86_64.
Here is a sample of the last lines of the log file:
2017-09-26 18:43:43 INFO reconnecting in 1 seconds
2017-09-26 18:43:43 INFO reconnecting in 2 seconds
2017-09-26 18:43:43 INFO reconnecting in 1 seconds
2017-09-26 18:43:43 INFO reconnecting in 2 seconds
2017-09-26 18:43:43 INFO reconnecting in 1 seconds
2017-09-26 18:43:43 INFO reconnecting in 2 seconds
2017-09-26 18:43:44 INFO connecting to graylog.mcmcg.com:5044
2017-09-26 18:43:44 INFO connecting to graylog.mcmcg.com:5047
2017-09-26 18:43:44 INFO connecting to graylog.mcmcg.com:5044
2017-09-26 18:43:44 INFO successfully connected to graylog.mcmcg.com:5044
2017-09-26 18:43:44 INFO successfully connected to graylog.mcmcg.com:5044
2017-09-26 18:43:44 INFO successfully connected to graylog.mcmcg.com:5047
2017-09-26 20:55:20 INFO reconnecting in 1 seconds
2017-09-26 20:55:20 INFO reconnecting in 2 seconds
2017-09-26 20:55:20 INFO reconnecting in 1 seconds
2017-09-26 20:55:20 INFO reconnecting in 1 seconds
2017-09-26 20:55:20 INFO reconnecting in 2 seconds
2017-09-26 20:55:20 INFO reconnecting in 2 seconds
2017-09-26 20:55:21 INFO connecting to graylog.mcmcg.com:5044
2017-09-26 20:55:21 INFO connecting to graylog.mcmcg.com:5047
2017-09-26 20:55:21 INFO connecting to graylog.mcmcg.com:5044
2017-09-26 20:55:21 INFO successfully connected to graylog.mcmcg.com:5044
2017-09-26 20:55:21 INFO successfully connected to graylog.mcmcg.com:5044
2017-09-26 20:55:21 INFO successfully connected to graylog.mcmcg.com:5047
2017-09-26 23:06:57 INFO reconnecting in 1 seconds
2017-09-26 23:06:57 INFO last message repeated 2 times
2017-09-26 23:06:57 ERROR last message repeated 0 times
2017-09-26 23:06:57 INFO reconnecting in 2 seconds
2017-09-26 23:06:57 INFO last message repeated 1 times
2017-09-26 23:06:57 INFO reconnecting in 2 seconds
2017-09-26 23:06:58 INFO connecting to graylog.mcmcg.com:5044
2017-09-26 23:06:58 INFO connecting to graylog.mcmcg.com:5047
2017-09-26 23:06:58 INFO connecting to graylog.mcmcg.com:5044
2017-09-26 23:06:58 INFO successfully connected to graylog.mcmcg.com:5044
2017-09-26 23:06:58 INFO successfully connected to graylog.mcmcg.com:5044
2017-09-26 23:06:58 INFO successfully connected to graylog.mcmcg.co
2017-09-27 01:18:35 ERROR SSL error, SSL_ERROR_SYSCALL: retval -1, errno: 110;Connection timed out
2017-09-27 01:18:35 ERROR SSL error, SSL_ERROR_SYSCALL: retval -1, errno: 110;Connection timed out
2017-09-27 01:18:35 ERROR SSL error, SSL_ERROR_SYSCALL: retval -1, errno: 110;Connection timed out
2017-09-27 03:30:12 ERROR SSL error, SSL_ERROR_SYSCALL: retval -1, errno: 110;Connection timed out
2017-09-27 03:30:12 ERROR SSL error, SSL_ERROR_SYSCALL: retval -1, errno: 110;Connection timed out
2017-09-27 03:30:12 ERROR SSL error, SSL_ERROR_SYSCALL: retval -1, errno: 110;Connection timed out
2017-09-27 05:41:49 ERROR SSL error, SSL_ERROR_SYSCALL: retval -1, errno: 110;Connection timed out
2017-09-27 05:41:49 ERROR SSL error, SSL_ERROR_SYSCALL: retval -1, errno: 110;Connection timed out
2017-09-27 05:41:49 ERROR SSL error, SSL_ERROR_SYSCALL: retval -1, errno: 110;Connection timed out
2017-09-27 07:53:26 ERROR SSL error, SSL_ERROR_SYSCALL: retval -1, errno: 110;Connection timed out
2017-09-27 07:53:26 ERROR SSL error, SSL_ERROR_SYSCALL: retval -1, errno: 110;Connection timed out
2017-09-27 07:53:26 ERROR SSL error, SSL_ERROR_SYSCALL: retval -1, errno: 110;Connection timed out
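One thing that stands out in the log above is how regular the bursts are. The small Python sketch below takes the first timestamp of each reconnect or SSL-error burst (copied by hand from the log; the list name `BURSTS` and the function `burst_intervals` are just illustrative names) and prints the gap between consecutive bursts. A near-constant gap would point to some kind of timer, for example an idle timeout on a firewall or a TCP keepalive expiry, rather than a random failure. That is only a guess; the log itself does not prove it.

```python
from datetime import datetime

# First timestamp of each reconnect / SSL-error burst, copied from the log above.
BURSTS = [
    "2017-09-26 18:43:43",
    "2017-09-26 20:55:20",
    "2017-09-26 23:06:57",
    "2017-09-27 01:18:35",
    "2017-09-27 03:30:12",
    "2017-09-27 05:41:49",
    "2017-09-27 07:53:26",
]


def burst_intervals(stamps):
    """Return the gaps, in whole seconds, between consecutive timestamps."""
    times = [datetime.strptime(s, "%Y-%m-%d %H:%M:%S") for s in stamps]
    return [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]


if __name__ == "__main__":
    for gap in burst_intervals(BURSTS):
        hours, rest = divmod(gap, 3600)
        minutes, seconds = divmod(rest, 60)
        print(f"{gap} s ({hours}h {minutes}m {seconds}s)")
```

For this sample every gap comes out at roughly 2 hours 11 minutes 37 seconds, so it may be worth checking for anything between the collector and Graylog that drops idle connections on a fixed schedule.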
You seem to have two output modules sending to a single Graylog input, graylog.mcmcg.com:5044/TCP, and both look very similar to me. Are you sure you need both, or could you just put the two inputs in the same path and send them to a single output?
This one could be a bug in nxlog-ce. If you cannot debug it, there are several options: if it is a bug, it might be fixed in nxlog-ee, but I don’t know. Another option is a work-around.
I used the following workarounds:
I added a schedule block in the output that reconnects regularly. My main motivation was to allow load balancing to work, though.
I added a line in /etc/crontab, something like the following: 15 1 * * * root <path>/systemctl restart graylog-collector-sidecar.service (check the correct path and service name). This restarts the sidecar every day. Restarting the sidecar also restarts nxlog, which then works OK until the next restart.
Checking the logs on the Graylog server, I noticed a lot of failures (below) in the Apache error log. The errors occur when Apache starts while Graylog is not yet fully up, so I am now delaying the Apache start until last, and that seems to be working. I'm monitoring to confirm whether that was the issue.
[Fri Oct 06 09:45:30.529533 2017] [proxy:error] [pid 8561] (111)Connection refused: AH00957: HTTPS: attempt to connect to 10.100.83.17:9000 (phxiograylogp03.internal.mcmcg.com) failed
[Fri Oct 06 09:45:30.529637 2017] [proxy:error] [pid 8561] AH00959: ap_proxy_connect_backend disabling worker for (phxiograylogp03.internal.mcmcg.com) for 60s
[Fri Oct 06 09:45:30.529654 2017] [proxy_http:error] [pid 8561] [client 10.100.83.252:4120] AH01114: HTTP: failed to make connection to backend: phxiograylogp03.internal.mcmcg.com
[Fri Oct 06 09:45:32.138932 2017] [proxy:error] [pid 8562] (111)Connection refused: AH00957: HTTPS: attempt to connect to 10.100.83.17:9000 (phxiograylogp03.internal.mcmcg.com) failed
[Fri Oct 06 09:45:32.139020 2017] [proxy:error] [pid 8562] AH00959: ap_proxy_connect_backend disabling worker for (phxiograylogp03.internal.mcmcg.com) for 60s
[Fri Oct 06 09:45:32.139037 2017] [proxy_http:error] [pid 8562] [client 10.100.83.253:12867] AH01114: HTTP: failed to make connection to backend: phxiograylogp03.internal.mcmcg.com
[Fri Oct 06 09:45:37.191386 2017] [proxy:error] [pid 8561] AH00940: HTTPS: disabled connection for (phxiograylogp03.internal.mcmcg.com
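As a rough illustration of the "start Apache last" idea, here is a minimal Python sketch that waits until the Graylog backend port accepts TCP connections before anything else is started. The function name `wait_for_port` and the timeout values are my own assumptions; the address 10.100.83.17:9000 is taken from the Apache error log above. A successful TCP connect only shows that the port is open, not that Graylog has finished starting, so treat this as a first check rather than a full readiness test.

```python
import socket
import sys
import time


def wait_for_port(host, port, timeout=300.0, interval=2.0):
    """Poll host:port until a TCP connection succeeds or the timeout elapses.

    Returns True once the port accepts a connection, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError if the port is closed or unreachable.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


if __name__ == "__main__":
    # Backend address from the Apache error log; adjust for your own setup.
    ready = wait_for_port("10.100.83.17", 9000)
    sys.exit(0 if ready else 1)
```

The script exits with status 0 when the port is reachable and 1 otherwise, so it could be run as a step before the Apache start in whatever mechanism you use to order the services.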