Graylog ingesting Crowdstrike FDR Logs (refined repost)

I spent days searching for a solution to the above. Graylog’s AWS plugin doesn’t work in this case unless you have your own bucket that FDR is dumping into, and Filebeat can’t read the input (likely because the data is stored gzip-compressed). So for those who want an actual solution that doesn’t involve “Just spend thousands per month on Splunk!”, here it is:

  1. Use Logstash with the s3 plugin. Example conf.d/fdr.conf:
input {
  s3 {
    access_key_id => "AKblahblahblahblah"
    secret_access_key => "ThisIsNotTheSecretAccessKeyYouAreLookingFor"
    bucket => "CrowdstrikeWillSellYouThis"
    region => "us-some-region"
    additional_settings => {
      force_path_style => false
      follow_redirects => false
    }
  }
}

filter {
  json {
    source => "message"
  }
}

output {
  gelf {
    host => "GraylogIPorHostname"
    port => PortNumber
    sender => "FDR"
  }
}

Also: default FDR settings (no filters) will generate at least 5 GB/day on their own, flooding Graylog with new data every 5 minutes.

In my other post, I indicated GELF output wasn’t required, but after trying TCP, UDP, and Syslog outputs, apparently it is. Also, due to the way gelf interacts with Graylog, a separate “full_message” field is generated that replicates “message” and roughly doubles the size of each document (these messages are LONG). I have found no way of suppressing or deleting the full_message field, neither in Logstash via a filter nor in Graylog via pipelines, but I was at least able to use a regex replacement Extractor to replace a full match (.+) with a single word (blah) to trim the size. Not ideal, given that the system has to evaluate that fairly expensive pattern on every message, but it should at least reduce the storage used.


In the pipeline, couldn’t you do this:

set_field("full_message", "blah");

No need to regex…

(untested)
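For reference, wrapped in a complete pipeline rule (still untested; the rule name is arbitrary), that would be something like:

rule "shrink FDR full_message"
when
  has_field("full_message")
then
  set_field("full_message", "blah");
end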

That’s a good idea. I was in the process of smashing my head against a wall when I found success with the following config:

input {
  s3 {
    access_key_id => "AKblahblahblahblah"
    secret_access_key => "ThisIsNotTheSecretAccessKeyYouAreLookingFor"
    bucket => "CrowdstrikeWillSellYouThis"
    region => "us-some-region"
    additional_settings => {
      force_path_style => false
      follow_redirects => false
    }
  }
}

filter {
  json {
    source => "message"
  }
  # keep FeatureVector below Elasticsearch's indexed-term size limit (see change 3 below)
  truncate {
    fields => "FeatureVector"
    length_bytes => 50
  }
  # the gelf output drops any field named exactly "id", so rename it (see change 2 below)
  mutate {
    rename => {"id" => "ID"}
  }
}

output {
  gelf {
    host => "GraylogIPorHostname"
    port => PortNumber
    sender => "FDR"
    # static value replaces the duplicated copy of "message" (see change 1 below)
    full_message => "full"
  }
}

Changes:

  1. output: setting “full_message” to any static string essentially wipes out the duplicated value before the message hits Graylog.
  2. filters: While the json filter alone originally worked, I discovered an “undocumented feature” of the gelf output: a JSON field named exactly “id” will not be passed through, regardless of its value. The value of “id” carries meaning about the source, so I needed it processed. The mutate filter resolves that by capitalizing the field name, and Graylog then picked up the renamed field along with all of the other fields.
  3. When tailing Graylog’s log, I noticed some log vomit:

message [ElasticsearchException[Elasticsearch exception [type=illegal_argument_exception, reason=Document contains at least one immense term in field="FeatureVector" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped.

Which eventually led me to a simple Truncate filter. This field’s data is extraneous, so a basic length trim resolved that issue.
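Since the field’s data is extraneous anyway, another option (untested on my end) would be to drop the field outright in the filter block instead of truncating it:

filter {
  # alternative to the truncate filter above: remove FeatureVector entirely
  mutate {
    remove_field => ["FeatureVector"]
  }
}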


Thanks for posting your solution!!