Sorry for the brevity, but I wanted to put this out there for anyone else who runs into this issue. I will add more information later.
Today I logged into our Graylog 2.2.3 cluster and a ton of the Linux servers had a status of failing.
Running Collector Sidecar 0.1.1.
Using Filebeat only.
NXLog is disabled in the config.
Looking at /var/log/graylog/collector-sidecar/collector_sidecar.log I’m seeing:
time="2017-06-01T16:51:44-04:00" level=info msg="[filebeat] Starting (exec driver)"
time="2017-06-01T16:51:45-04:00" level=error msg="[filebeat] Unable to start collector after 3 tries, giving up!"
time="2017-06-01T16:51:45-04:00" level=info msg="[filebeat] Configuration change detected, rewriting configuration file."
time="2017-06-01T16:51:45-04:00" level=info msg="[filebeat] Stopping"
time="2017-06-01T16:51:47-04:00" level=info msg="[filebeat] Starting (exec driver)"
time="2017-06-01T16:51:48-04:00" level=error msg="[filebeat] Unable to start collector after 3 tries, giving up!"
time="2017-06-01T16:51:55-04:00" level=error msg="[UpdateRegistration] Failed to report collector status to server: Put https://graylog.company.com/api/plugins/org.graylog.plugins.collector/collectors/5d502cd5-5e28-4acd-a287-2ecd93123710: read tcp 10.10.10.10:47928->10.10.10.11:443: read: connection reset by peer"
time="2017-06-01T16:51:55-04:00" level=error msg="[RequestConfiguration] Fetching configuration failed: Get https://graylog.company.com/api/plugins/org.graylog.plugins.collector/5e532cd5-5e29-4acd-a287-2ecd93123710?tags=["linux"%2C"nginx"]: read tcp 10.10.10.10:47930->10.10.10.11:443: read: connection reset by peer"
time="2017-06-01T16:52:05-04:00" level=error msg="[UpdateRegistration] Bad response from Graylog server: 503 Service Unavailable"
time="2017-06-01T16:52:05-04:00" level=error msg="[RequestConfiguration] Bad response status from Graylog server: 503 Service Unavailable"
time="2017-06-01T16:52:05-04:00" level=error msg="Can't fetch configuration from Graylog API: invalid character '<' looking for beginning of value"
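With many hosts failing at once, it can help to see which errors dominate a sidecar log instead of scrolling through it. Below is a minimal sketch, assuming a POSIX shell with grep, sed, sort and uniq; the function name `summarize_sidecar_errors` is hypothetical and not part of Collector Sidecar. It drops the leading `time="..."` field so repeated messages collapse into one line with a count.

```shell
#!/bin/sh
# Hypothetical helper (not part of Collector Sidecar): list the distinct
# error messages in a sidecar log, most frequent first.
summarize_sidecar_errors() {
    # Default to the log path from the post; pass another path to override.
    log_file="${1:-/var/log/graylog/collector-sidecar/collector_sidecar.log}"
    # Keep only error-level lines, strip the leading time="..." field so the
    # same message logged at different times is counted together, then count
    # each distinct message and sort by count, highest first.
    grep 'level=error' "$log_file" \
        | sed 's/^time="[^"]*" //' \
        | sort \
        | uniq -c \
        | sort -rn
}
```

Running `summarize_sidecar_errors` on an affected host would show, for example, whether the Filebeat start failures outnumber the 503 responses from the server.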
What fixed it for me was moving Filebeat's registry file aside and restarting the sidecar:
mv /var/cache/graylog/collector-sidecar/filebeat/data/registry /var/cache/graylog/collector-sidecar/filebeat/data/registry.old.2017-06-01 && \
  systemctl restart collector-sidecar.service && \
  tail -f /var/log/graylog/collector-sidecar/collector_sidecar.log
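The manual registry move can be wrapped in a small function so the backup name is date-stamped automatically and an existing backup is never overwritten. This is a minimal sketch, assuming a POSIX shell and the Filebeat data directory used in the command above; the function name `backup_filebeat_registry` is hypothetical. Keep in mind that once the registry is gone, Filebeat no longer knows how far it had read each file, so it may re-ship lines it already sent.

```shell
#!/bin/sh
# Hypothetical helper (not part of Collector Sidecar): move Filebeat's
# registry aside under a date-stamped name, e.g. registry.old.2017-06-01.
backup_filebeat_registry() {
    # Default to the data directory from the post; pass another to override.
    data_dir="${1:-/var/cache/graylog/collector-sidecar/filebeat/data}"
    registry="$data_dir/registry"
    backup="$registry.old.$(date +%F)"   # %F is YYYY-MM-DD
    if [ ! -e "$registry" ]; then
        echo "no registry at $registry, nothing to move" >&2
        return 1
    fi
    # Refuse to overwrite a backup already taken today.
    if [ -e "$backup" ]; then
        echo "backup $backup already exists" >&2
        return 1
    fi
    # Move the registry and print the backup path on success.
    mv "$registry" "$backup" && echo "$backup"
}
```

On an affected host, one would call it between `systemctl stop collector-sidecar.service` and `systemctl start collector-sidecar.service`, so that the sidecar (and the Filebeat process it starts) should not be running while the registry is moved.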
Any ideas what caused this?