Reduce the amount of duplicated metadata in logs collected by Filebeat

We are using Filebeat to collect logs, and I can't help but notice that a lot of unnecessary information is collected with each log from Filebeat. Moreover, much of that information is duplicated within a single log entry.

The following meta fields provide the same information:

  • beat type (“filebeat”) - beats_type, filebeat_@metadata_beat, filebeat_agent_type
  • version - filebeat_@metadata_version, filebeat_version_agent_version
  • timestamp - filebeat_@timestamp, timestamp
  • hostname - filebeat_agent_hostname, filebeat_host_name

It looks like we could reduce our storage usage by 25-40% just by dropping these duplicated meta fields.

I’ve learned that I should be able to drop fields directly in Filebeat via

processors:
  - drop_fields:
      fields: ["filebeat_log_offset", "filebeat_input_type", "filebeat_ecs_version", "filebeat_agent_version", "filebeat_agent_type", "filebeat_agent_name"]

but that doesn’t seem to work.
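One plausible reason the snippet above has no effect: inside Filebeat, fields carry their own dotted (ECS) names such as log.offset and agent.version, while the filebeat_ prefix and the underscores appear to be added later, when Graylog's Beats input flattens the event. That mapping is an assumption inferred from the field names visible in Graylog, not something confirmed in this thread. A minimal sketch of the same processor using Filebeat's own field names might look like this:

```yaml
# Sketch only: the dotted names below are assumed to be the Filebeat-side
# names of Graylog's filebeat_* fields (log.offset -> filebeat_log_offset, ...).
processors:
  - drop_fields:
      fields: ["log.offset", "input.type", "ecs.version", "agent.version", "agent.type", "agent.name"]
      # Don't report an error when an event lacks one of these fields.
      ignore_missing: true
```

According to the Filebeat documentation, @timestamp and type cannot be removed with drop_fields, so filebeat_@timestamp would most likely still need to be handled on the Graylog side.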

I assume some of the fields are sent by Filebeat and others are added by Graylog based on what Filebeat sends, and that this is how the duplication arises.

I’ve solved it with a pipeline and the following rule:

rule "Remove redundant meta fields"
when
  has_field("filebeat_log_offset")
then
  remove_field("filebeat_log_offset");
  remove_field("filebeat_input_type");
  remove_field("filebeat_ecs_version");
  remove_field("filebeat_agent_version");
  remove_field("filebeat_agent_type");
  remove_field("filebeat_agent_name");
  remove_field("filebeat_agent_hostname");
  remove_field("filebeat_@metadata_beat");
  remove_field("filebeat_@metadata_version");
  remove_field("filebeat_@timestamp");
  remove_field("filebeat_@metadata_type");
end

But I would still love to know whether it’s possible to configure Filebeat not to send this data at all.


I use a pipeline rule to delete the fields I don’t want. Here’s mine for example:

rule "Remove Redundant Fields-Beats"
when 
    has_field("beats_type") and contains(to_string($message.beats_type),"filebeat")
then
    remove_field("filebeat_@metadata_beat",$message);
    remove_field("filebeat_@metadata_type",$message);
    remove_field("filebeat_agent_ephemeral_id",$message);
    remove_field("filebeat_agent_id",$message);
    remove_field("filebeat_host_architecture",$message);
    remove_field("filebeat_host_containerized",$message);
    remove_field("filebeat_host_hostname",$message);
    remove_field("filebeat_host_id",$message);
    remove_field("filebeat_host_ip",$message);
    remove_field("filebeat_host_mac",$message);
    remove_field("filebeat_host_name",$message);
    remove_field("filebeat_host_os_codename",$message);
    remove_field("filebeat_host_os_family",$message);
    remove_field("filebeat_host_os_kernel",$message);
    remove_field("filebeat_host_os_name",$message);
    remove_field("filebeat_host_os_platform",$message);
    remove_field("filebeat_host_os_type",$message);
    remove_field("filebeat_host_os_version",$message);
    remove_field("filebeat_input_type",$message);
    remove_field("filebeat_log_offset",$message);
end
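If you want to trim the same fields before they leave the host, a Filebeat-side counterpart to this rule could be sketched as below. It assumes Graylog builds names like filebeat_host_os_name from Filebeat's dotted host.os.name (an inference from the field names, not confirmed here). host.name is deliberately left out on the assumption that Graylog may use it to fill in the message source; the pipeline rule can still remove filebeat_host_name afterwards.

```yaml
# Sketch only: Filebeat-side counterpart to the "Remove Redundant Fields-Beats" rule.
# Field names are assumed from the Graylog names (filebeat_host_os_name -> host.os.name, ...).
processors:
  # If your filebeat.yml enables add_host_metadata, removing that processor
  # should stop most host.* fields from being added in the first place.
  # - add_host_metadata: ~
  - drop_fields:
      fields:
        - agent.ephemeral_id
        - agent.id
        - host.architecture
        - host.containerized
        - host.hostname
        - host.id
        - host.ip
        - host.mac
        - host.os.codename
        - host.os.family
        - host.os.kernel
        - host.os.name
        - host.os.platform
        - host.os.type
        - host.os.version
        - input.type
        - log.offset
      # Skip silently when a field is not present in a given event.
      ignore_missing: true
```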

hahaha, same time!! Glad you solved it!!!