Certificate problem, and maybe other issues

So what is going on with your Elasticsearch? Are you detecting problems?

If you are, this may be the reason you have issues.

I should have been more clear. All of the inputs are green and showing as running. The one called Windows Event Logs is a GELF TCP input, and it is not receiving logs, even though it is showing as running.

I know my setup does not need to be exactly like yours, but I just wanted to get your opinion on the fact that for my elasticsearch.yml file, my network.host variable is commented out, so none is set. Also action.auto_create_index does not exist. Also discovery.type variable does not exist. http.port: 9200 is set. Everything else is the same.

In the server.conf file, it says the default elasticsearch host is http://127.0.0.1:9200, although none is explicitly set, so I guess this does match the elasticsearch.yml file, since none is set in either one.

The other variable is elasticsearch_index_prefix = graylog

I could not see differences in the elasticsearch.yml and the server.conf file.
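
One way to compare what is explicitly set against what falls back to a default is a small script along these lines. This is only a sketch: it assumes server.conf consists of plain `key = value` lines with `#` comments, and the key name `elasticsearch_hosts` is an assumption; the default value is the one quoted above.

```python
# Sketch: show which settings are explicitly set in server.conf and which
# fall back to a default. The key name below is an assumption; the default
# value is the one mentioned in this thread.
ASSUMED_DEFAULTS = {
    "elasticsearch_hosts": "http://127.0.0.1:9200",
}


def parse_conf(text):
    """Parse 'key = value' lines, skipping blank lines and '#' comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def effective(text, defaults=ASSUMED_DEFAULTS):
    """Map each known key to (value in effect, True if explicitly set)."""
    explicit = parse_conf(text)
    return {key: (explicit.get(key, default), key in explicit)
            for key, default in defaults.items()}
```

Calling `effective(open("/etc/graylog/server/server.conf").read())` would then show whether the host is set or defaulted; the path is also an assumption and may differ on your install.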

Hello,

My apologies, this post is kind of confusing me.
I assume we’re discussing the fact that logs on an INPUT are not showing now?

If so, what has been done to ensure the messages are making it to the Graylog server INPUT?

Just an FYI: I use tcpdump on the Graylog server to make sure logs are coming in from each of the devices I have.

If you’re using default settings, I agree you don’t need to adjust network settings, but there are some settings you do need to add.

For example:

sudo tee -a /etc/elasticsearch/elasticsearch.yml > /dev/null <<EOT
cluster.name: graylog
action.auto_create_index: false
EOT

You can find that here. That goes for any OS.

So… as it stands right now… the only issue that you have is that the GELF input for windows shows that it is running and is not receiving anything. no other issues or unrelated errors in the logs… Correct?

For clarity’s sake, it should likely be a new question/thread. Here are some questions you should answer when posting that new thread:

  • Is it all windows servers pointed at the GELF input or just one?
  • What are you using to ship the logs from the Windows server (sidecar? Beats? Nxlog?)
  • If it’s sidecar, have you looked at the logs? There will be sidecar and beats/nxlog logs to review (post anomalies).
  • Have you used tcpdump on the Graylog side to listen to the input port? Generically: tcpdump -i [your-interface] -nnA dst port [GELF-input-port]
  • Do you have a stream attached to the GELF input port?
  • Does that stream go to a valid Graylog index?

Post as much as you can; text is better than screenshots if possible.

If there are other issues, we can work on them as needed, possibly with individual new posts if needed.

So… as it stands right now… the only issue that you have is that the GELF input for windows shows that it is running and is not receiving anything. no other issues or unrelated errors in the logs… Correct?

Yes and no. The GELF input for Windows Events is not receiving logs now. But before it went down, the original problem I was trying to solve was that logs were not being received from one of the VMs in the network on which Graylog sidecar was running. So I am assuming that once this input is again receiving logs from all of the other VMs it originally received logs from, we will still need to fix the issue with that one VM.

  • Is it all windows servers pointed at the GELF input or just one?

All of the windows servers are pointed at the one GELF input.

  • What are you using to ship the logs from the Windows server (sidecar? Beats? Nxlog?)

sidecar and nxlog are both services running on each of the servers reporting logs

  • If it’s sidecar, have you looked at the logs? There will be sidecar and beats/nxlog logs to review (post anomalies).

On the first domain controller, the last few hundred messages in the sidecar log look like this:

time="2022-03-21T22:39:52-04:00" level=error msg="[UpdateRegistration] Failed to report collector status to server: Put https://:9000/api/sidecars/b625a55f-01dc-4069-b1f6-0028547e3c36: net/http: TLS handshake timeout"

[post not yet complete]

When I did that I got “No such device exists”
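
That tcpdump error usually means the name passed to -i does not match any interface on the machine. As a rough sketch (assuming Python 3 is available on the Graylog server), the following lists the interface names the OS knows about, so the right one can be substituted for [your-interface]:

```python
import socket


def interface_names():
    """Return the network interface names known to the OS."""
    return [name for _index, name in socket.if_nameindex()]


if __name__ == "__main__":
    # Print one interface name per line; use one of these with tcpdump -i.
    for name in interface_names():
        print(name)
```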

I do not know, but I will try to figure that out.

Ugg! Certificates… why do they have to be so difficult.

Can you post the sidecar.yml from your first domain controller (properly obfuscated)? This file defines the connection between the Windows server and Graylog, before the configuration is pushed that defines what to log and how.

Also post the GELF Input certificate settings (if you posted them before, I missed it, and it is best to have everything in one place). I am assuming the certificates in other inputs etc. are working correctly (?), but we have to look at both sides of the connection between Windows and the Input.

Side note:

EDITED: For future clarity, either make up a server name or put in a placeholder so we know the difference between missing and obfuscated. So it would look like this: https://<servername>:9000/api/sidecars/b6… I think you just had to do the preformatted text </> thing…

For comparison here is my non-certificate sidecar.yml… not hard to tell where the certificate stuff gets changed.

server_url: http://<servername>:9000/api/
server_api_token: "<token mess>" 
update_interval: 10
tls_skip_verify: true
send_status: true
list_log_files:
collector_id: file:C:\Program Files\Graylog\sidecar\collector-id
cache_path: C:\Program Files\Graylog\sidecar\cache
log_path: C:\Program Files\Graylog\sidecar\logs
log_rotation_time: 86400
log_max_age: 604800
collector_binaries_accesslist: []
backends:
    - name: nxlog
      enabled: false
      binary_path: C:\Program Files (x86)\nxlog\nxlog.exe
      configuration_path: C:\Program Files\Graylog\sidecar\generated\nxlog.conf
    - name: winlogbeat
      enabled: true
      binary_path: C:\Program Files\Graylog\sidecar\winlogbeat.exe
      configuration_path: C:\Program Files\Graylog\sidecar\generated\winlogbeat.yml
    - name: filebeat
      enabled: true
      binary_path: C:\Program Files\Graylog\sidecar\filebeat.exe
      configuration_path: C:\Program Files\Graylog\sidecar\generated\filebeat.yml
    - name: auditbeat
      enabled: false
      binary_path: C:\Program Files\Graylog\sidecar\auditbeat.exe
      configuration_path: C:\Program Files\Graylog\sidecar\generated\auditbeat.yml

Inputs get streams attached to them, and the streams define which indices the data is stored in. None of that is relevant for the machine having TLS handshake issues.

EDIT: Just thinking… Usually certificates are based on an FQDN; if you are using IPs, that may trip up the certificate…

# The URL to the Graylog server API.
# Default: "http://127.0.0.1:9000/api/"
server_url: "https://<graylogserverIP>:9000/api"

# The API token to use to authenticate against the Graylog server API.
# Default: none
server_api_token: "<serverAPItoken>"

# The node ID of the sidecar. This can be a path to a file or an ID string.
# If set to a file and the file doesn't exist, the sidecar will generate an
# unique ID and writes it to the configured path.
#
# Example file path: "file:C:\\Program Files\\Graylog\\sidecar\\node-id"
# Example ID string: "6033137e-d56b-47fc-9762-cd699c11a5a9"
#
# ATTENTION: Every sidecar instance needs a unique ID!
#
# Default: "file:C:\\Program Files\\Graylog\\sidecar\\node-id"
node_id: "file:C:\\Program Files\\Graylog\\sidecar\\node-id"

# The node name of the sidecar. If this is empty, the sidecar will use the
# hostname of the host it is running on.
# Default: ""
node_name: "<name_of_first_domain_contrlr>"

# The update interval in secods. This configures how often the sidecar will
# contact the Graylog server for keep-alive and configuration update requests.
# Default: 10
update_interval: 10

# This configures if the sidecar should skip the verification of TLS connections.
# Default: false
tls_skip_verify: false

# This enables/disables the transmission of detailed sidecar information like
# collector statues, metrics and log file lists. It can be disabled to reduce
# load on the Graylog server if needed. (disables some features in the server UI)
# Default: true
send_status: true

# A list of directories to scan for log files. The sidecar will scan each
# directory for log files and submits them to the server on each update.
#
# Example:
#     list_log_files:
#       - "/var/log/nginx"
#       - "/opt/app/logs"
#
# Default: empty list
#list_log_files: []

# Directory where the sidecar stores internal data.
#cache_path: "C:\\Program Files\\Graylog\\sidecar\\cache"

# Directory where the sidecar stores logs for collectors and the sidecar itself.
#log_path: "C:\\Program Files\\Graylog\\sidecar\\logs"

# The maximum size of the log file before it gets rotated.
#log_rotate_max_file_size: "10MiB"

# The maximum number of old log files to retain.
#log_rotate_keep_files: 10

# Directory where the sidecar generates configurations for collectors.
#collector_configuration_directory: "C:\\Program Files\\Graylog\\sidecar\\generated"

# A list of binaries which are allowed to be executed by the Sidecar. An empty list disables the whitelist feature.
# Wildcards can be used, for a full pattern description see https://golang.org/pkg/path/filepath/#Match
# Example:
#     collector_binaries_whitelist:
#       - "C:\\Program Files\\Graylog\\sidecar\\winlogbeat.exe"
#       - "C:\\Program Files\\Filebeat\\filebeat.exe"
#
# Example disable whitelisting:
#     collector_binaries_whitelist: []
#
# Default:
# collector_binaries_whitelist:
#  - "C:\\Program Files\\Graylog\\sidecar\\filebeat.exe"
#  - "C:\\Program Files\\Graylog\\sidecar\\winlogbeat.exe"
#  - "C:\\Program Files\\Filebeat\\filebeat.exe"
#  - "C:\\Program Files\\Packetbeat\\packetbeat.exe"
#  - "C:\\Program Files\\Metricbeat\\metricbeat.exe"
#  - "C:\\Program Files\\Heartbeat\\heartbeat.exe"
#  - "C:\\Program Files\\Auditbeat\\auditbeat.exe"
#  - "C:\\Program Files (x86)\\nxlog\\nxlog.exe"

Where do I find those please?

Edit the GELF Input you are using - all the CERTs settings should be in there.

Also - you are noting "https://<graylogserverIP>:9000/api" … I mentioned it in the end of my last post - most certificates are set up with FQDN rather than IP… something to check.

This post has a lot of the configurations you would be using though his issue is the private key… comparing your settings to his may help.

GELF Input Settings:
(global is not checked)
Node: / <graylog_server_FQDN>
Title: Windows Event Logs
Bind address: 0.0.0.0
Port: 12201
Receive Buffer Size: 1048576
No of worker threads: 4
TLS cert file: blank
TLS private key file: blank
Enable TLS: unchecked
TLS key password: blank
TLS client authentication: disabled
TLS Client Auth Trusted Certs: blank
TCP keepalive: unchecked
Null frame delimiter? checked
Maximum message size: 2097152
Override source: blank
Decompressed size limit: 8388608
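
With an input configured like this (plain TCP, TLS unchecked, null frame delimiter checked), one way to confirm the input itself accepts messages is to send a single hand-built GELF message from another machine. The following is a minimal sketch, not an official client: the field values are made up for illustration, and the port default simply mirrors the 12201 shown above.

```python
import json
import socket


def gelf_frame(host, short_message, level=6):
    """Build one null-terminated GELF 1.1 frame for a GELF TCP input."""
    payload = {
        "version": "1.1",
        "host": host,
        "short_message": short_message,
        "level": level,
    }
    # The trailing null byte matches the "Null frame delimiter" setting.
    return json.dumps(payload).encode("utf-8") + b"\x00"


def send_test_message(server, port=12201, timeout=5.0):
    """Send one test message over plain TCP (no TLS) to the GELF input."""
    frame = gelf_frame(socket.gethostname(), "GELF TCP test message")
    with socket.create_connection((server, port), timeout=timeout) as conn:
        conn.sendall(frame)
```

If `send_test_message("<graylog_server_FQDN>")` completes without an exception and the message shows up in a search, the input is working and the problem is more likely on the shipper side.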

I can confirm that the common name field in the cert is the FQDN, which is different than the sidecar.yml file, which uses the IP.

Unless the IP is in the certificate’s SAN list, I think that’s a problem…
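
For what it is worth, that check can be scripted. The sketch below works on the dictionary that Python's `ssl.SSLSocket.getpeercert()` returns (which is empty if the certificate was not validated) and tests whether a given hostname or IP appears in the SAN list. It only does exact matches and does not handle wildcard names; the host names and addresses in the usage note are hypothetical.

```python
import ipaddress


def san_covers(cert, target):
    """Return True if `target` (hostname or IP) is listed in the cert's SANs.

    `cert` is a dict in the format returned by ssl.SSLSocket.getpeercert().
    Only exact matches are checked; wildcard names are not handled.
    """
    try:
        wanted_ip = ipaddress.ip_address(target)
    except ValueError:
        wanted_ip = None  # target is a hostname, not an IP address

    for kind, value in cert.get("subjectAltName", ()):
        if wanted_ip is not None and kind == "IP Address":
            try:
                if ipaddress.ip_address(value.strip()) == wanted_ip:
                    return True
            except ValueError:
                continue  # skip SAN entries that do not parse as an IP
        elif wanted_ip is None and kind == "DNS":
            if value.lower() == target.lower():
                return True
    return False
```

For example, with `cert = {"subjectAltName": (("DNS", "graylog.example.com"),)}`, calling `san_covers(cert, "10.0.0.5")` returns False, which is the situation that would break a sidecar.yml pointing at the IP.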

I also think that if you are using https in your sidecar.yml that you need to make sure the input is set up for TLS.

I am NOT certain about certs though… :smiley:

@fffhurst

Something seems off about the following statements.

If all the Windows servers are pointing to the GELF TCP input, and that GELF input is configured as shown below, then that configuration tells me that only this one device, graylog_server_FQDN, is bound to that GELF input.

If you created the GELF input for TCP/TLS, then it should look like this for all the Windows devices within the LAN.

I am not sure how the other Windows devices are sending messages.

Also, the certificates should be on the remote device, and the log shipper should be configured like so. This is an example of an NXLog config file.

<Extension gelf>
    Module      xm_gelf
</Extension>
<Input zone-01>
    Module      im_msvistalog
    Query <QueryList>\
    <Query Id="0">\
    <Select Path="Application">*</Select>\
    <Select Path="System">*</Select>\
    <Select Path="Security">*</Select>\
    </Query>\
    </QueryList>  
</Input>

<Output out>
    Module      om_ssl 
    Host        graylog.domain.com
    Port        51412
    OutputType  GELF_TCP 
    CertFile    %CERTDIR%/graylog3-certificate.pem
    CertKeyFile %CERTDIR%/graylog3-key.pem
    CAFile      %CERTDIR%/cert3.pem
    KeyPass     secret 
    AllowUntrusted  true   
    Exec $Hostname = hostname_fqdn();
    Exec $FullMessage = $raw_event;
    #Exec        to_syslog_snare();
</Output>

<Route >
    Path        zone-01 => out
</Route>

This goes in the Graylog Sidecar section (with NXLog).

The IP is in the certificate’s SAN list.

The interesting thing is that the 2nd domain controller has the exact same sidecar.yml file and was sending logs just fine up until Feb. Now I cannot even use the input to search for the log messages that were received in Feb and before, i.e., no log messages show up when searching within the input, so I am inclined to think that this one problem has something to do with the ability to search.

Thanks I will check all of those things.

There is a new problem: The graylog web admin page went down again. We rebooted, and the web admin page is still down. I am obfuscating my logs then I will post a representative sample from before and after the reboot shortly.

I have already increased the Elasticsearch heap size. Can you tell me where to increase the Graylog heap size in a CentOS 7 Graylog install? I have searched the documentation and tried the places mentioned in the video in the Graylog documentation, to no avail. Apparently the Graylog heap size and the Elasticsearch heap size are configured in different places. Thanks.

Default file locations are here

Depending on the OS, look under the Graylog file locations and find the Java settings section.
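
As far as I know, on an RPM-based install such as CentOS 7 the Graylog JVM options sit in /etc/sysconfig/graylog-server, in a GRAYLOG_SERVER_JAVA_OPTS line, and the heap is controlled by the -Xms and -Xmx flags there. Treat the path and variable name as assumptions and check your own install. A small sketch to pull the current heap flags out of such an options string:

```python
import re


def heap_flags(java_opts):
    """Extract the -Xms / -Xmx values from a JVM options string."""
    # Each match is a (flag, size) pair such as ("Xmx", "2g").
    return dict(re.findall(r"-(Xms|Xmx)(\d+[kKmMgG]?)", java_opts))
```

For example, `heap_flags("-Xms1g -Xmx2g -server")` gives `{"Xms": "1g", "Xmx": "2g"}`; the option string here is made up for illustration.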

Please post new questions separately rather than continuing this thread. Each new question gets an opportunity to be viewed and answered by the entire community that way. :slight_smile:

Thank you, I will do so.