Filebeat sidecar needs to be restarted on Windows reboot (reopen)

Apologies, @tmacgbay. I had some personal issues to attend to, and my original thread here was automatically closed, so I had to open a new one.

It just happened again, and it appears that Graylog is simply refusing the connection. Here are the logs (external link, since the file is quite large). It usually happens after the host restarts following the installation of updates.

Could you also post “C:\Program Files\Graylog\sidecar\sidecar.yml”? That is the part that manages the underlying connection configuration to the Graylog Server.

For reference, in case you want to compare as I will, here is the one I use for all Windows machines:

    server_url: http://GRAYLOGSERVER:9000/api/
    server_api_token: "<SUPERSECRETCODE>"
    update_interval: 10
    tls_skip_verify: true
    send_status: true
    list_log_files:
    collector_id: file:C:\Program Files\Graylog\sidecar\collector-id
    cache_path: C:\Program Files\Graylog\sidecar\cache
    log_path: C:\Program Files\Graylog\sidecar\logs
    log_rotation_time: 86400
    log_max_age: 604800
    tags: [windows]
    collector_binaries_whitelist: []
    backends:
        - name: nxlog
          enabled: false
          binary_path: C:\Program Files (x86)\nxlog\nxlog.exe
          configuration_path: C:\Program Files\Graylog\sidecar\generated\nxlog.conf
        - name: winlogbeat
          enabled: true
          binary_path: C:\Program Files\Graylog\sidecar\winlogbeat.exe
          configuration_path: C:\Program Files\Graylog\sidecar\generated\winlogbeat.yml
        - name: filebeat
          enabled: true
          binary_path: C:\Program Files\Graylog\sidecar\filebeat.exe
          configuration_path: C:\Program Files\Graylog\sidecar\generated\filebeat.yml
        - name: auditbeat
          enabled: false
          binary_path: C:\Program Files\Graylog\sidecar\auditbeat.exe
          configuration_path: C:\Program Files\Graylog\sidecar\generated\auditbeat.yml

Also, C:\Program Files\Graylog\sidecar\logs contains two logs that show what is going on:

sidecar.log
winlogbeat

I don’t need the whole thing, just the snippets that seem to repeat themselves.
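If it helps to find the repeating snippets, the log can be collapsed by message. The following is only a rough sketch: it assumes the `time="..." level=... msg="..."` line layout seen in sidecar.log, and the names `summarize` and `print_top` are made up for illustration.

```python
import re
from collections import Counter

# Matches lines shaped like: time="..." level=error msg="..."
LINE_RE = re.compile(
    r'time="(?P<time>[^"]*)"\s+level=(?P<level>\w+)\s+msg="(?P<msg>.*)"\s*$'
)


def summarize(lines):
    """Count identical (level, message) pairs, ignoring the timestamps."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            counts[(match["level"], match["msg"])] += 1
    return counts


def print_top(path, limit=10):
    """Print the most frequent messages found in the log file at `path`."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        for (level, msg), count in summarize(handle).most_common(limit):
            print(f"{count:6d}  {level:7s} {msg}")
```

For example, `print_top(r"C:\Program Files\Graylog\sidecar\logs\sidecar.log")` would list the ten most frequent messages; that path is an assumption based on the log_path setting above. Lines that do not match the expected layout are skipped.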

Interestingly enough, my sidecar.yml is different from yours:

    server_url: "http://192.168.139.11:9000/api"
    server_api_token: "[SECRET]"
    node_id: "file:C:\\Program Files\\Graylog\\sidecar\\node-id"
    node_name: ""
    update_interval: 10
    tls_skip_verify: false
    send_status: true

However, I do have this under System/Sidecar in Graylog:

    fields_under_root: true
    fields.collector_node_id: ${sidecar.nodeName}
    fields.gl2_source_collector: ${sidecar.nodeId}

    output.logstash:
       hosts: ["192.168.139.11:5044"]
    path:
      data: C:\Program Files\Graylog\sidecar\cache\filebeat\data
      logs: C:\Program Files\Graylog\sidecar\logs
    tags:
     - veeam
    filebeat:
      inputs:
        - type: log
          enabled: true
          paths: 
            - C:\logs\log.log      
            - C:\ProgramData\Veeam\Backup\Homelab\Job.Homelab.Backup.log

Again, it works fine except when the Windows machine running the sidecar reboots and the connection is lost. The sidecar in Graylog never goes into an unhealthy state; it is always green and operational.

And here are the logs:

    time="2020-11-22T12:53:49-05:00" level=error msg="[UpdateRegistration] Failed to report collector status to server: Put http://192.168.139.11:9000/api/sidecars/4bc0cd6b-0a7d-4c28-ab6c-7d40462966dd: dial tcp 192.168.139.11:9000: connectex: No connection could be made because the target machine actively refused it." 
    time="2020-11-22T12:54:00-05:00" level=error msg="[UpdateRegistration] Failed to report collector status to server: Put http://192.168.139.11:9000/api/sidecars/4bc0cd6b-0a7d-4c28-ab6c-7d40462966dd: dial tcp 192.168.139.11:9000: connectex: No connection could be made because the target machine actively refused it." 
    time="2020-11-22T12:54:11-05:00" level=error msg="[UpdateRegistration] Failed to report collector status to server: Put http://192.168.139.11:9000/api/sidecars/4bc0cd6b-0a7d-4c28-ab6c-7d40462966dd: dial tcp 192.168.139.11:9000: connectex: No connection could be made because the target machine actively refused it." 
    time="2020-11-22T12:54:22-05:00" level=error msg="[UpdateRegistration] Failed to report collector status to server: Put http://192.168.139.11:9000/api/sidecars/4bc0cd6b-0a7d-4c28-ab6c-7d40462966dd: dial tcp 192.168.139.11:9000: connectex: No connection could be made because the target machine actively refused it." 
    time="2020-11-22T12:54:33-05:00" level=error msg="[UpdateRegistration] Failed to report collector status to server: Put http://192.168.139.11:9000/api/sidecars/4bc0cd6b-0a7d-4c28-ab6c-7d40462966dd: dial tcp 192.168.139.11:9000: connectex: No connection could be made because the target machine actively refused it." 
    time="2020-11-22T12:54:44-05:00" level=error msg="[UpdateRegistration] Failed to report collector status to server: Put http://192.168.139.11:9000/api/sidecars/4bc0cd6b-0a7d-4c28-ab6c-7d40462966dd: dial tcp 192.168.139.11:9000: connectex: No connection could be made because the target machine actively refused it." 
    time="2020-11-22T12:54:55-05:00" level=error msg="[UpdateRegistration] Failed to report collector status to server: Put http://192.168.139.11:9000/api/sidecars/4bc0cd6b-0a7d-4c28-ab6c-7d40462966dd: dial tcp 192.168.139.11:9000: connectex: No connection could be made because the target machine actively refused it." 
    time="2020-11-22T12:55:06-05:00" level=error msg="[UpdateRegistration] Failed to report collector status to server: Put http://192.168.139.11:9000/api/sidecars/4bc0cd6b-0a7d-4c28-ab6c-7d40462966dd: dial tcp 192.168.139.11:9000: connectex: No connection could be made because the target machine actively refused it." 
    time="2020-11-22T12:55:17-05:00" level=error msg="[UpdateRegistration] Failed to report collector status to server: Put http://192.168.139.11:9000/api/sidecars/4bc0cd6b-0a7d-4c28-ab6c-7d40462966dd: dial tcp 192.168.139.11:9000: connectex: No connection could be made because the target machine actively refused it." 
    time="2020-11-22T12:55:28-05:00" level=error msg="[UpdateRegistration] Failed to report collector status to server: Put http://192.168.139.11:9000/api/sidecars/4bc0cd6b-0a7d-4c28-ab6c-7d40462966dd: dial tcp 192.168.139.11:9000: connectex: No connection could be made because the target machine actively refused it." 
    time="2020-11-22T12:55:39-05:00" level=error msg="[UpdateRegistration] Failed to report collector status to server: Put http://192.168.139.11:9000/api/sidecars/4bc0cd6b-0a7d-4c28-ab6c-7d40462966dd: dial tcp 192.168.139.11:9000: connectex: No connection could be made because the target machine actively refused it." 
    time="2020-11-22T12:55:50-05:00" level=error msg="[UpdateRegistration] Failed to report collector status to server: Put http://192.168.139.11:9000/api/sidecars/4bc0cd6b-0a7d-4c28-ab6c-7d40462966dd: dial tcp 192.168.139.11:9000: connectex: No connection could be made because the target machine actively refused it." 
    time="2020-11-22T12:56:01-05:00" level=error msg="[UpdateRegistration] Failed to report collector status to server: Put http://192.168.139.11:9000/api/sidecars/4bc0cd6b-0a7d-4c28-ab6c-7d40462966dd: dial tcp 192.168.139.11:9000: connectex: No connection could be made because the target machine actively refused it." 
    time="2020-11-22T12:56:12-05:00" level=error msg="[UpdateRegistration] Failed to report collector status to server: Put http://192.168.139.11:9000/api/sidecars/4bc0cd6b-0a7d-4c28-ab6c-7d40462966dd: dial tcp 192.168.139.11:9000: connectex: No connection could be made because the target machine actively refused it." 
    time="2020-11-22T17:51:54-05:00" level=error msg="[UpdateRegistration] Failed to report collector status to server: Put http://192.168.139.11:9000/api/sidecars/4bc0cd6b-0a7d-4c28-ab6c-7d40462966dd: read tcp 192.168.139.10:60477->192.168.139.11:9000: wsarecv: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond." 
    time="2020-11-22T17:52:25-05:00" level=error msg="[UpdateRegistration] Failed to report collector status to server: Put http://192.168.139.11:9000/api/sidecars/4bc0cd6b-0a7d-4c28-ab6c-7d40462966dd: dial tcp 192.168.139.11:9000: connectex: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond." 
    time="2020-11-22T17:52:56-05:00" level=error msg="[UpdateRegistration] Failed to report collector status to server: Put http://192.168.139.11:9000/api/sidecars/4bc0cd6b-0a7d-4c28-ab6c-7d40462966dd: dial tcp 192.168.139.11:9000: connectex: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond." 
    time="2020-11-23T05:20:28-05:00" level=error msg="[UpdateRegistration] Failed to report collector status to server: Put http://192.168.139.11:9000/api/sidecars/4bc0cd6b-0a7d-4c28-ab6c-7d40462966dd: read tcp 192.168.139.10:58309->192.168.139.11:9000: wsarecv: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond."

    node_id: "file:C:\\Program Files\\Graylog\\sidecar\\node-id"
    node_name: ""

That seems odd to me. I don’t think you want to explicitly say there is no node name. If it is left undefined, sidecar/filebeat/nxlog picks the NetBIOS name or similar. If it is trying to register with a blank name, I can see how it would be unhappy. This is a total guess from what I can see. There are other slight differences, and I am not sure how they would behave (such as the quotes around your server_url). Unless you have node_id and node_name in there for specific reasons, you can remove them, as I don’t think they are needed by default.
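Separately, the log shows two different failures. "Actively refused" generally means the Graylog host answered but nothing was listening on port 9000 at that moment, while "did not properly respond after a period of time" is a timeout. A quick way to see which one is happening right after a reboot is a plain TCP probe from the Windows machine. This is a minimal sketch using only the standard library; the function name `probe` is made up for illustration, and it tests TCP reachability only, not the Graylog API itself.

```python
import socket


def probe(host, port, timeout=5.0):
    """Try one TCP connection and report 'open', 'refused', 'timeout', or an error."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # Host reachable, but nothing is listening on the port.
        return "refused"
    except socket.timeout:
        # No answer within the timeout (host down, filtered, or network not ready).
        return "timeout"
    except OSError as exc:
        # Anything else, such as no route to host or a name resolution failure.
        return f"error: {exc}"
```

Calling `probe("192.168.139.11", 9000)` (the address from the logs above) a few times after the reboot would show whether the port is refusing, timing out, or open while the sidecar still reports errors.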

Apologies for not replying earlier. I will make adjustments to the areas you pointed out and see if it behaves differently.

Thanks for your help thus far.
