I’m seeing strange security/certificate warnings in the Graylog server logs after upgrading from Graylog 4.4 to 5.0:
2023-01-24T15:39:40.885+01:00 WARN [ProxiedResource] Unable to call https://graylog.<redacted>.com:9000/api/system/jobs on node <265afac6-d5af-47ae-b107-7f61973c5a05>: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
2023-01-24T15:39:42.852+01:00 WARN [ProxiedResource] Unable to call https://graylog.<redacted>.com:9000/api/system/metrics/multiple on node <265afac6-d5af-47ae-b107-7f61973c5a05>: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
2023-01-24T15:39:42.860+01:00 WARN [ProxiedResource] Unable to call https://graylog.<redacted>.com:9000/api/system/jobs on node <265afac6-d5af-47ae-b107-7f61973c5a05>: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
2023-01-24T15:39:44.848+01:00 WARN [ProxiedResource] Unable to call https://graylog.<redacted>.com:9000/api/system/metrics/multiple on node <265afac6-d5af-47ae-b107-7f61973c5a05>: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
The problem I’m seeing in Graylog, which must be connected to the warnings above, is that my TLS-enabled Beats inputs report that they fail to start when I click “Start”.
However, when I go to the “Search” section of Graylog, I can see plenty of log data coming in, so the inputs are in fact running despite the GUI showing the opposite.
Glad you’ve come to Open Community for help. We have several practitioners here who, like you, use Graylog as an important tool. We look forward to their responses to your question.
The issue might be your certificates. I’ll start by asking, have you checked the certificates and their trust chain?
The warnings in the server logs suggest that the Graylog server cannot build a valid certification path to the target, so its calls to those API resources fail. That might also be why the TLS-enabled Beats inputs are reported as failing to start even though log data is still coming in and being processed by Graylog. The cause could be that the certificate, or its chain, is not trusted by the upgraded version of Graylog.
Just a guess.
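One way to check a certificate and its trust chain from the command line is with openssl. This is a minimal sketch, assuming openssl is installed; the function names are hypothetical, and the host, port, and file paths in the usage comments are placeholders rather than values from this thread. Note that openssl checks against the files you give it, not against the trust store Java uses.

```shell
#!/bin/sh
# Hypothetical helper functions; names and arguments are illustrative only.

# Print the certificate chain that a TLS endpoint presents.
# Usage: show_chain graylog.example.com 9000
show_chain() {
    openssl s_client -connect "$1:$2" -servername "$1" -showcerts </dev/null
}

# Verify a server certificate against a CA certificate (both PEM files).
# Exits non-zero if no valid path from the certificate to the CA is found.
# Usage: verify_chain /path/to/ca.pem /path/to/server.pem
verify_chain() {
    openssl verify -CAfile "$1" "$2"
}
```

If verify_chain fails for the certificate Graylog serves on port 9000, the chain itself is the likely problem; if it succeeds, the Java side probably does not have the CA in its trust store.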
You may want to check posts like this in the community:
I recently renewed my (homegrown) certs for Graylog and had this working just fine until upgrading from Graylog 4.4 to 5.0, so I’m failing to see how this can be a general problem with my certs. Also worth mentioning is that HTTPS works just fine when browsing the Graylog web interface…
I checked the post you linked to, about adding a trust store path to the Java startup options in /etc/default/graylog-server, and tried adding the full path to my certificates directory like so:
# Pointing Java to certificate store (added after upgrading Graylog to version 5)
GRAYLOG_SERVER_JAVA_OPTS="-Djavax.net.ssl.trustStore=/etc/graylog/server/pki/"
However, this did not help; it seems that Java wants a single trust store file here and not a directory?
I really don’t understand why I suddenly have to bother with this Java trust store, since it was a non-issue when I installed and configured Graylog 4.4.
Anyone care to chime in if the Java trust store thing is the real issue here and, not least, how to set that up correctly?
I have tried viewing various docs linked from the forum posts I’ve been reading, but most of the links result in an HTTP 404 error from the go2.graylog.org website. I would really appreciate some up-to-date guidance from someone who has dealt with this sort of issue before, rather than the outdated or incomplete information I most often find in the official Graylog documentation.
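On the file-versus-directory question: javax.net.ssl.trustStore expects the path to a single keystore file, not a directory of certificates. A homegrown CA can be added to such a file with keytool. The following is a sketch under assumptions: the function names, alias, and paths are hypothetical, keytool from a JDK must be on the PATH, and "changeit" is used as the store password because it is the default for the stock Java cacerts file.

```shell
#!/bin/sh
# Hypothetical helper functions; names, alias, and paths are illustrative only.

# Import a CA certificate (PEM) into a trust store file.
# keytool creates the trust store if it does not exist yet.
# Usage: import_ca /path/to/truststore.jks /path/to/ca.pem my-homegrown-ca
import_ca() {
    keytool -importcert -noprompt -trustcacerts \
        -keystore "$1" -storepass changeit \
        -file "$2" -alias "$3"
}

# List the entries of a trust store to confirm that the import worked.
# Usage: list_truststore /path/to/truststore.jks
list_truststore() {
    keytool -list -keystore "$1" -storepass changeit
}
```

Starting from a copy of the JVM's default cacerts file and importing the homegrown CA into that copy keeps the public CAs trusted as well.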
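I got this working by first copying Java’s default trust store (cacerts) into my Graylog PKI directory:
sudo cp -a /usr/lib/jvm/java-11-openjdk-amd64/lib/security/cacerts /etc/graylog/server/pki/cacerts.jks
Then adding the following to /etc/default/graylog-server: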
# Pointing Java to certificate store (added after upgrading Graylog to version 5)
GRAYLOG_SERVER_JAVA_OPTS="-Djavax.net.ssl.trustStore=/etc/graylog/server/pki/cacerts.jks"
And finally restarting the graylog-server service.
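One caveat with a plain assignment like the one above: if /etc/default/graylog-server already sets GRAYLOG_SERVER_JAVA_OPTS earlier in the file (heap size and so on), the later assignment replaces those options. A sketch of appending instead, assuming the file is sourced by a shell so that variable expansion works; the -Xms/-Xmx values are stand-ins for whatever defaults are already present, not values from this thread.

```shell
# Stand-in for the defaults already present in the file (assumed values).
GRAYLOG_SERVER_JAVA_OPTS="-Xms1g -Xmx1g"

# Append the trust store option rather than overwriting the existing options.
GRAYLOG_SERVER_JAVA_OPTS="$GRAYLOG_SERVER_JAVA_OPTS -Djavax.net.ssl.trustStore=/etc/graylog/server/pki/cacerts.jks"

# Show the combined options that the JVM would be started with.
echo "$GRAYLOG_SERVER_JAVA_OPTS"
```

After restarting the service, the effective options can be checked in the running process's command line.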
Now, why adding the -Djavax.net.ssl.trustStore option to that graylog-server file is suddenly a requirement after upgrading to Graylog 5 is beyond me, but I’m just happy to finally have this issue resolved.