Sidecar not starting wrappers

We seem to have an issue with both the latest version of sidecar and the original (1.1.0 and 1.0.1).

The sidecar wrappers are not starting automatically after a client reboot. The clients are Windows 10 21H2. When you look in services after a client boots the Graylog Sidecar service is running but all the beats wrappers are set to Manual but haven’t been started by the Sidecar service. The Graylog sidecar management interface shows all the wrappers as green arrows (running).

In the test which results in the below log the machine was booted at around 9:40 and a user logs in. There have been no changes to the configuration files for Beats in the Sidecar configuration. At 9:50 I triggered a manual restart of the Metricbeat configuration process through the Graylog interface.

The resulting logs are:

time="2022-09-16T09:40:21+01:00" level=info msg="Starting signal distributor" 
time="2022-09-16T09:40:32+01:00" level=info msg="Adding process runner for: filebeat" 
time="2022-09-16T09:40:32+01:00" level=info msg="Adding process runner for: metricbeat" 
time="2022-09-16T09:40:32+01:00" level=info msg="Adding process runner for: winlogbeat_client" 
time="2022-09-16T09:40:32+01:00" level=info msg="[winlogbeat_client] Configuration change detected, rewriting configuration file." 
time="2022-09-16T09:40:32+01:00" level=info msg="[filebeat] Configuration change detected, rewriting configuration file." 
time="2022-09-16T09:40:32+01:00" level=info msg="[metricbeat] Configuration change detected, rewriting configuration file." 
time="2022-09-16T09:45:36+01:00" level=info msg="Starting signal distributor" 
time="2022-09-16T09:45:46+01:00" level=info msg="Adding process runner for: metricbeat" 
time="2022-09-16T09:45:46+01:00" level=info msg="Adding process runner for: winlogbeat_client" 
time="2022-09-16T09:45:46+01:00" level=info msg="Adding process runner for: filebeat" 
time="2022-09-16T09:45:46+01:00" level=info msg="[metricbeat] Configuration change detected, rewriting configuration file." 
time="2022-09-16T09:45:46+01:00" level=info msg="[winlogbeat_client] Configuration change detected, rewriting configuration file." 
time="2022-09-16T09:45:46+01:00" level=info msg="[filebeat] Configuration change detected, rewriting configuration file." 
time="2022-09-16T09:49:31+01:00" level=info msg="Stopping signal distributor" 
time="2022-09-16T09:49:31+01:00" level=info msg="[metricbeat] Stopping" 
time="2022-09-16T09:49:31+01:00" level=info msg="[filebeat] Stopping" 
time="2022-09-16T09:49:31+01:00" level=info msg="[winlogbeat_client] Stopping" 
time="2022-09-16T09:49:31+01:00" level=error msg="[metricbeat] Could not send stop control: The service has not been started." 
time="2022-09-16T09:49:31+01:00" level=error msg="[filebeat] Could not send stop control: The service has not been started." 
time="2022-09-16T09:49:31+01:00" level=error msg="[winlogbeat_client] Could not send stop control: The service has not been started." 
time="2022-09-16T09:49:33+01:00" level=info msg="Starting signal distributor" 
time="2022-09-16T09:49:43+01:00" level=info msg="Adding process runner for: metricbeat" 
time="2022-09-16T09:49:43+01:00" level=info msg="Adding process runner for: winlogbeat_client" 
time="2022-09-16T09:49:43+01:00" level=info msg="Adding process runner for: filebeat" 
time="2022-09-16T09:49:44+01:00" level=info msg="[metricbeat] Configuration change detected, rewriting configuration file." 
time="2022-09-16T09:49:44+01:00" level=info msg="[winlogbeat_client] Configuration change detected, rewriting configuration file." 
time="2022-09-16T09:49:44+01:00" level=info msg="[filebeat] Configuration change detected, rewriting configuration file." 
time="2022-09-16T09:50:14+01:00" level=info msg="[metricbeat] Got remote restart command" 
time="2022-09-16T09:50:14+01:00" level=info msg="[metricbeat] Starting (svc driver)" 

Restarting the individual configuration processes through the Graylog interface kicks things into life and logs begin to flow (09:50).

I think it may be to do with TLS, our Graylog instance is HTTPS but our Beats input and Beats agents are not configured for TLS. Could this cause this issue?

Can you post your sidecar.yml for the Win10 machine (Obfuscated and default commenting not needed of course) below are mine on my win10 machine just for comparison (NOTE: yml is case/space sensitive)

You can run C:\Program Files\Graylog\sidecar\graylog-sidecar.exe -debug at the administrative command line to get a view into what its doing - maybe post that as well.

server_url: http://billion-core:9000/api/
server_api_token: "<random-stuff>" 
update_interval: 10
tls_skip_verify: true
send_status: true
list_log_files:
collector_id: file:C:\Program Files\Graylog\sidecar\collector-id
cache_path: C:\Program Files\Graylog\sidecar\cache
log_path: C:\Program Files\Graylog\sidecar\logs
log_rotation_time: 86400
log_max_age: 604800
tags: [windows]
collector_binaries_accesslist: []
backends:
    - name: nxlog
      enabled: false
      binary_path: C:\Program Files (x86)\nxlog\nxlog.exe
      configuration_path: C:\Program Files\Graylog\sidecar\generated\nxlog.conf
    - name: winlogbeat
      enabled: true
      binary_path: C:\Program Files\Graylog\sidecar\winlogbeat.exe
      configuration_path: C:\Program Files\Graylog\sidecar\generated\winlogbeat.yml
    - name: filebeat
      enabled: true
      binary_path: C:\Program Files\Graylog\sidecar\filebeat.exe
      configuration_path: C:\Program Files\Graylog\sidecar\generated\filebeat.yml
    - name: auditbeat
      enabled: false
      binary_path: C:\Program Files\Graylog\sidecar\auditbeat.exe
      configuration_path: C:\Program Files\Graylog\sidecar\generated\auditbeat.yml

Hello @nick

Adding on to @tmacgbay Post.
By chance does you MetricBeat service look something like this?

image

The services aren’t listed in services.msc until I manually push a restart/start from Graylog. When they appear they are listed as Manual (which I think is correct as they’re controlled by the Graylog Sidecar service). The Graylog Sidecar service is always running and set to Automatic.

The Beats configs are correct, they seem to be running ok on Server 2019 boxes (although we’re just running through checking they are actually running) and when the services are manually restarted the logs appear in Graylog. I think this rules out the Beats configs as being an issue.

In the sidecar log on the clients I can see an attempt to rewrite the config every few minutes and a lot of errors about Sidecar not being able to report status to Graylog - connection actively refused. That would explain the Graylog status page showing the service as running when it’s not (although I’d expect Graylog to show an unknown status for the service at that point). It doesn’t explain why the collector services aren’t being started by the Sidecar service though.

The Beats input is running on port 5044 and is not configured for TLS, the Graylog web interface is configured for TLS.

sidecar debug shows that the application has detected a config change but the config files don’t seem to have been downloaded.

It also shows “Failed to get registered services, skipping cleanup, access denied”.

I’m wondering whether the graylog-sidecar services requires some permissions its missing to read and set service states?

Hello,
By chance did you execute these commands here on Windows?

"C:\Program Files\graylog\sidecar\graylog-sidecar.exe" -service install
"C:\Program Files\graylog\sidecar\graylog-sidecar.exe" -service start

And if you have errors stating the following…

Sidecar not being able to report status to Graylog -

“Failed to get registered services, skipping cleanup, access denied”

I think the Graylog Sidecar wasn’t install correctly or may have missed some settings/configurations.

Thanks @gsmith

All installed according to the documentation, we have sidecar installed on our server infrastructure as well and that doesn’t seem to have an issue.

As far as I’m aware there are no configurations to miss. We install our custom sidecar.yml (which matches the server ones) after the installation and the sidecar appears in the Graylog interface.

I have now noticed that Graylog fails to push new configs down as well, it definitely looks like a permissions issue to start with. The only change we’ve made since installing sidecar a couple of years ago is the transition from Windows 10 2004 to 21H2, I wonder if the new Windows version broke something silently.

It’s a bit annoying that Graylog is showing all the services as running when they’re not. If a sidecar doesn’t report status surely they should report as Unknown?

Ok, I’m working on this issue again today. For background, our installation process for Graylog sidecar is an xcopy of the Graylog folder and then a powershell command to do the -service install. We then reboot during the build process so starting the service isn’t required (as it starts automatically on reboot). There are no custom permissions set on the Graylog folder structure. The installation is carried out by the local administrator account within Microsoft Deployment Toolkit.

There appear to be two related issues found when running the debug tool. One is Access Denied for sidecar to get Service statuses and the other is Access Denied when trying to write new log files.

Here are the three errors that are in constant repeat within the debug log:

level=debug msg="[RequestConfiguration] No Configuration update available, skipping update" Failed to fire hook = Can't rename log file: rename C:\Program Files\Graylog\sidecar\logs\sidecar.log C:\Program Files\Graylog\sidecar\logs\sidecar-2022-09-22T09-00-26.650.log

level=warning msg="Failed to get registered services. Skipping Cleanup. Access is denied." Failed to fire hook = Can't rename log file: rename C:\Program Files\Graylog\sidecar\logs\sidecar.log C:\Program Files\Graylog\sidecar\logs\sidecar-2022-09-22T09-00-27.827.log

level=debug msg="[RequestBackendList] No update available" Failed to fire hook = Can't rename log file: rename C:\Program Files\Graylog\sidecar\logs\sidecar.log C:\Program Files\Graylog\sidecar\logs\sidecar-2022-09-22T09-00-28.534.log

Hello

I would not be surprised, dealing with Windows.

level=debug msg="[RequestConfiguration] No Configuration update available, skipping update" Failed to fire hook = Can't rename log file: rename C:\Program Files\Graylog\sidecar\logs\sidecar.log C:\Program Files\Graylog\sidecar\logs\sidecar-2022-09-22T09-00-26.650.log

So permission issue for renaming the log file :thinking:

By chance on Graylog-Sidecar conf file is there any configuration made like this?

# Default:
collector_binaries_accesslist:
#  - "C:\\Program Files\\Graylog\\sidecar\\filebeat.exe"
   - "C:\\Program Files\\Graylog\\sidecar\\winlogbeat.exe"
#  - "C:\\Program Files\\Filebeat\\filebeat.exe"
#  - "C:\\Program Files\\Packetbeat\\packetbeat.exe"
   - "C:\\Program Files\\Metricbeat\\metricbeat.exe"

Check the file/folder privileges… that seems odd that it can’t rotate a log file. You set the service account to start with an account with better privileges’ as a test as well…

Can you post the sidecar.yml that you are using? (Obfuscated)

1 Like