Winlogbeat logs cause Graylog to stop processing logs

This issue affects only one of the roughly 50 machines running the exact same Winlogbeat configuration via graylog-sidecar.

When I launch the configuration on the target host, my journal gets clogged with messages. It is not the volume: the server has easily processed an influx of 100,000 messages from certain machines. But when the logs from this particular machine come in, Graylog is suddenly unable to process them and the journal is stuck at ~2000 messages in and 0 out. I then have to stop the sidecar and wait a few minutes for the server to unclog.

The issue is obvious from server.log:

[488]: index [graylog_109], id [2a8f9a52-6f38-11ee-82e2-506b8dddb757], message [OpenSearchException[OpenSearch exception [type=illegal_argument_exception, reason=Limit of total fields [1000] has been exceeded]]]
[489]: index [graylog_109], id [2a8f9a55-6f38-11ee-82e2-506b8dddb757], message [OpenSearchException[OpenSearch exception [type=illegal_argument_exception, reason=Limit of total fields [1000] has been exceeded]]]
[490]: index [graylog_109], id [2a8f9a56-6f38-11ee-82e2-506b8dddb757], message [OpenSearchException[OpenSearch exception [type=illegal_argument_exception, reason=Limit of total fields [1000] has been exceeded]]]
[491]: index [graylog_109], id [2a8f9a59-6f38-11ee-82e2-506b8dddb757], message [OpenSearchException[OpenSearch exception [type=illegal_argument_exception, reason=Limit of total fields [1000] has been exceeded]]]
[492]: index [graylog_109], id [2a8f9a57-6f38-11ee-82e2-506b8dddb757], message [OpenSearchException[OpenSearch exception [type=illegal_argument_exception, reason=Limit of total fields [1000] has been exceeded]]]
[493]: index [graylog_109], id [2a8f9a58-6f38-11ee-82e2-506b8dddb757], message [OpenSearchException[OpenSearch exception [type=illegal_argument_exception, reason=Limit of total fields [1000] has been exceeded]]]
[494]: index [graylog_109], id [2a8f9a5a-6f38-11ee-82e2-506b8dddb757], message [OpenSearchException[OpenSearch exception [type=illegal_argument_exception, reason=Limit of total fields [1000] has been exceeded]]]
[495]: index [graylog_109], id [2a8f9a5b-6f38-11ee-82e2-506b8dddb757], message [OpenSearchException[OpenSearch exception [type=illegal_argument_exception, reason=Limit of total fields [1000] has been exceeded]]]
[496]: index [graylog_109], id [2a8f9a5c-6f38-11ee-82e2-506b8dddb757], message [OpenSearchException[OpenSearch exception [type=illegal_argument_exception, reason=Limit of total fields [1000] has been exceeded]]]
[497]: index [graylog_109], id [2a8f9a5d-6f38-11ee-82e2-506b8dddb757], message [OpenSearchException[OpenSearch exception [type=illegal_argument_exception, reason=Limit of total fields [1000] has been exceeded]]]
[498]: index [graylog_109], id [2a8f9a5e-6f38-11ee-82e2-506b8dddb757], message [OpenSearchException[OpenSearch exception [type=illegal_argument_exception, reason=Limit of total fields [1000] has been exceeded]]]
[499]: index [graylog_109], id [2a8f9a60-6f38-11ee-82e2-506b8dddb757], message [OpenSearchException[OpenSearch exception [type=illegal_argument_exception, reason=Limit of total fields [1000] has been exceeded]]]

For whatever reason, the messages get split into an absurd number of fields, which OpenSearch cannot handle.
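
The limit in the error is the index setting `index.mapping.total_fields.limit` (default 1000). One way to see which fields fill up the index is to fetch the mapping (for example with `GET graylog_109/_mapping` against OpenSearch) and count the entries. Below is a minimal sketch in Python, not a definitive tool: the sample mapping and its field names are made up for illustration, and the count only approximates how OpenSearch tallies fields against the limit.

```python
import json
from collections import Counter


def count_fields(properties):
    """Count mapping entries recursively: objects, leaf fields, and multi-fields."""
    total = 0
    for spec in properties.values():
        total += 1  # the field (or object) itself
        total += len(spec.get("fields", {}))  # multi-fields defined on the field
        total += count_fields(spec.get("properties", {}))  # members of an object
    return total


def fields_by_prefix(properties, sep="_"):
    """Group top-level field names by their first name segment."""
    return Counter(name.split(sep, 1)[0] for name in properties)


# Hand-made mapping for illustration only; NOT taken from the affected server.
sample = json.loads("""
{
  "graylog_109": {
    "mappings": {
      "properties": {
        "message": {"type": "text"},
        "source": {"type": "keyword"},
        "winlogbeat_event_id": {"type": "long"},
        "winlogbeat_event_data_param1": {"type": "keyword"},
        "winlogbeat_event_data_param2": {"type": "keyword"}
      }
    }
  }
}
""")

props = sample["graylog_109"]["mappings"]["properties"]
print(count_fields(props))                      # total entries in the mapping
print(fields_by_prefix(props).most_common(3))   # which prefix holds the bulk
```

Run against the real mapping, a prefix that accounts for hundreds of entries points at the part of the events that is being split into too many fields.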

Fine, but here is where an older version of the server comes in. I should mention that I recently upgraded from 4.2.13 to 5.0.8, and the two servers run on two different machines. 99% of logs go to the new one, as I am in the process of migrating.

I tried to reproduce the same situation on the older server: same machine, same Winlogbeat config, same Beats config, same port.
To my surprise, the logs came in on the older server just fine, without any hiccups. Of course, I kept the server configuration identical on both servers.
So the only logical conclusion is that it is OpenSearch’s fault, since the older server is running Elasticsearch. And yes, I also kept the OpenSearch config pretty much identical to the Elasticsearch one.

I’ve also looked at what Winlogbeat is sending from the machine, that is, the raw file that the sidecar keeps on the machine before sending it to Graylog, but there is nothing out of the ordinary there.
I hope you can help me.
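
Field explosion is hard to spot by eye in a raw event file, so a rough check is to count the distinct field names across all events. The sketch below is in Python and assumes the events are available as newline-delimited JSON, which may not match the file the sidecar actually keeps; the two sample events and their field names are made up for illustration.

```python
import json


def flatten_keys(obj, prefix=""):
    """Yield a dotted key path for every leaf value in a nested dict."""
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict) and value:
            yield from flatten_keys(value, path)
        else:
            yield path


def distinct_fields(lines):
    """Collect the distinct leaf field paths across JSON event lines."""
    seen = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip anything that is not a JSON document
        if isinstance(event, dict):
            seen.update(flatten_keys(event))
    return seen


# Made-up events for illustration; real input would come from a file, e.g.
# with open(path, encoding="utf-8") as fh: distinct_fields(fh)
sample_lines = [
    '{"winlog": {"event_id": 4624, "event_data": {"TargetUserName": "alice"}}}',
    '{"winlog": {"event_id": 4625, "event_data": {"param1": "x", "param2": "y"}}}',
]

fields = distinct_fields(sample_lines)
print(len(fields))
print(sorted(fields))
```

If the distinct count from one machine is far higher than from a comparable machine with the same configuration, the extra names show which events to filter.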

Relevant configs:

Winlogbeat:

# Needed for Graylog
fields_under_root: true
fields.collector_node_id: ${sidecar.nodeName}
fields.gl2_source_collector: ${sidecar.nodeId}
logging.level: debug
logging.selectors: [eventlog]

output.logstash:
   hosts: ["x.x.x.x:5044"]
path:
  data: ${sidecar.spoolDir!"C:\\Program Files\\Graylog\\sidecar\\cache\\winlogbeat"}\data
  logs: ${sidecar.spoolDir!"C:\\Program Files\\Graylog\\sidecar"}\logs
tags:
 - windows
winlogbeat:
  event_logs:
   - name: Application
     ignore_older: 1h
   - name: System
     ignore_older: 1h
   - name: Security
     ignore_older: 1h
   - name: DFS Replication
     ignore_older: 1h
   - name: Directory Service
     ignore_older: 1h
   - name: DNS Server
     ignore_older: 1h
   - name: File Replication Service
     ignore_older: 1h

Beats:

bind_address: 0.0.0.0
charset_name: <empty>
no_beats_prefix: false
number_worker_threads: 8
override_source: <empty>
port: 5044
recv_buffer_size: 1048576
tcp_keepalive: false
tls_cert_file: <empty>
tls_client_auth: disabled
tls_client_auth_cert_file: <empty>
tls_enable: false
tls_key_file: <empty>
tls_key_password:********

Elasticsearch:

path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch
network.host: 127.0.0.1

OpenSearch:

cluster.name: graylog
node.name: ${HOSTNAME}
path.data: /var/lib/opensearch
path.logs: /var/log/opensearch
network.host: 0.0.0.0
discovery.type: single-node
action.auto_create_index: false
plugins.security.disabled: true
plugins.security.ssl.transport.pemcert_filepath: node.pem
plugins.security.ssl.transport.pemkey_filepath: node-key.pem
plugins.security.ssl.transport.pemtrustedcas_filepath: root-ca.pem
plugins.security.ssl.transport.enforce_hostname_verification: false
plugins.security.ssl.http.enabled: true
plugins.security.ssl.http.pemcert_filepath: node.pem
plugins.security.ssl.http.pemkey_filepath: node-key.pem
plugins.security.ssl.http.pemtrustedcas_filepath: root-ca.pem
plugins.security.allow_unsafe_democertificates: true
plugins.security.allow_default_init_securityindex: true
plugins.security.audit.type: internal_opensearch
plugins.security.enable_snapshot_restore_privilege: true
plugins.security.check_snapshot_restore_write_privileges: true
plugins.security.restapi.roles_enabled: ["all_access", "security_rest_api_access"]
plugins.security.system_indices.enabled: true
plugins.security.system_indices.indices: [".plugins-ml-model", ".plugins-ml-task", ".opendistro-alerting-config", ".opendistro-alerting-alert*", ".opendistro-anomaly-results*", ".opendistro-anomaly-detector*", ".opendistro-anomaly-checkpoints", ".opendistro-anomaly-detection-state", ".opendistro-reports-*", ".opensearch-notifications-*", ".opensearch-notebooks", ".opensearch-observability", ".ql-datasources", ".opendistro-asynchronous-search-response*", ".replication-metadata-store", ".opensearch-knn-models"]
node.max_local_storage_nodes: 3

Hey @cesq

This error can be solved by splitting the logs up into another index set, reducing the number of logs/messages sent, or fine-tuning your Winlogbeat file (i.e., send only what you need to Graylog). OpenSearch is yelling that you have too many fields. This limit is there so you don’t blow up your indexer.

Here is a good demo/idea for tuning your Winlogbeat file.

For example, drop a message or filter by event IDs. Just an idea.

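winlogbeat:
  event_logs:
    - name: Security
      processors:
        # Drop Security events whose TargetUserName contains this account name
        - drop_event:
            when:
              contains:
                event_data.TargetUserName: sa-network-adm
    - name: System
      # Collect only these event IDs from the System log
      event_id: 5827, 5828, 5829, 5830, 4625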

EDIT: I forgot to mention that when your index rotates, any field not seen since then drops off. Just an FYI.

Hope that helps.

Thanks for your reply. The server in question is a file server, so it does generate a fair amount of logs; however, I already have another file server that generates just as many messages and works fine.

One thing that might be important: the Beats input had an enforced UTF-8 character set.
I’ve already disabled it in the hope that this was the issue.
Then, after the index rotated, I added the machine again and it worked.

