Replace sensitive information

Hello!

I am trying to create a pipeline rule that replaces sensitive information. I'm running into issues with using '[^&]*'.

For example, I am trying to filter out the following password: '#p54_L35'. It works well if I replace '[^&]*' with '#p54_L35'. So what am I doing wrong here? Is there an alternative way of filtering out sensitive information?

See the code below:

rule "Hide sensitive information"
when
  has_field("message")
then
  let message = to_string($message.message);
  let filteredMessage = replace(message, "password: '[^&]*'", "password: '[redacted]'", -1);
  let filteredMessage2 = replace(filteredMessage, "token: '[^&]*'", "token: '[redacted]'", -1);
  let filteredMessage3 = replace(filteredMessage2, "token=[^&]*", "token=[redacted]", -1);
  set_field("message", filteredMessage3);
end
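
Note that replace() does a literal substring replacement, so '[^&]*' is not interpreted as a regular expression there. If your Graylog version provides the regex-based regex_replace() function, a rough sketch of the same idea could look like this (check the exact parameter names against the pipeline functions documentation for your version):

rule "Hide sensitive information (regex sketch)"
when
  has_field("message")
then
  // regex_replace() treats its pattern argument as a regular expression,
  // unlike replace(), which matches the search string literally
  let message = to_string($message.message);
  let filtered = regex_replace(pattern: "password: '[^&]*'", value: message, replacement: "password: '[redacted]'", replace_all: true);
  let filtered2 = regex_replace(pattern: "token=[^&]*", value: filtered, replacement: "token=[redacted]", replace_all: true);
  set_field("message", filtered2);
end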

Bump!

I still haven't been able to figure this out. Any kind soul out there?

Could you post one original message, with a faked password?

Of course!

Below is a message with fake tokens.

2023-05-08T13:29:20.315Z - e[32minfoe[39m: POST /api/company/basic?token=5a0465asdas987987fds987822ed2ff1e6a0505fba705e879b2a&customerId=15429 method=POST, originalUrl=/api/company/basic?token=5a0465asdas987987fds987822ed2ff1e6a0505fba705e879b2a&customerId=15429, ip=::ffff:10.421.1.123, correlationId=51232349ea-1f234285-446f-923435f-edc5b994eded, token=5a0465asdas987987fds987822ed2ff1e6a0505fba705e879b2a, customerId=9999, q=[{"a":"orgNumber","c":"eq","v":10001090218}], country=SE, f=[name, cfar, orgNumber, prospectingId], finished=true, time=5, statusCode=200

This recent blog post on this topic may be of interest:

These are the steps I recommend:

  1. make your log machine readable in a rule in a pipeline. Parse it into the correct fields.
  2. redact all the fields you want to hide in rules
  3. delete the message by replacing it with some bogus content. The message field is mandatory in Graylog - you cannot delete it.

in detail:

  1. parsing:
    Create the following Grok Patterns:
    name: ApacheLogs
%{YEAR:year}-%{MONTHNUM:month}-%{MONTHDAY:day}T%{TIME:time}Z - e\[32minfoe\[%{INT:response_time}m: %{DATA_ALL_BUT_SPACE:http_request_method} %{DATA_ALL_BUT_SPACE:path} method=%{DATA_ALL_BUT_SPACE:http_request_method} originalUrl=%{DATA_ALL_BUT_SPACE:http_originalUrl} ip=%{DATA_ALL_BUT_SPACE:source_ip} correlationId=%{DATA_ALL_BUT_SPACE:correlationId}

and this one named "DATA_ALL_BUT_SPACE":

[^ ]+

The first one will parse the first part of your log; the rest you can build on your own, for example:
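
A possible continuation for the token and customerId parts of the sample log (the pattern name ApacheLogsExtended and the field names are only an illustration):

name: ApacheLogsExtended
%{ApacheLogs} token=%{DATA_ALL_BUT_SPACE:token} customerId=%{INT:customer_id}

With DATA_ALL_BUT_SPACE the captured correlationId and token values still include the trailing ","; the comma-aware pattern mentioned further down avoids that.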

  2. create a rule in a pipeline attached to your logs:
rule "parsing: apache Logs"
when
  true // or better condition if you have
then
  set_fields(
    grok(
      pattern:"^%{ApacheLogs}",
      value:to_string($message.message),
      only_named_captures:true
    )
  );
end

This rule will put the values you need into separate fields. The pattern for DATA_ALL_BUT_SPACE is shown above; you can create a similar one so that the "," is not captured as part of the values.
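
A sketch of such a pattern, here called DATA_ALL_BUT_SPACE_OR_COMMA (the name is just an example):

name: DATA_ALL_BUT_SPACE_OR_COMMA
[^ ,]+

Using it for values such as correlationId or token keeps the trailing "," out of the captured fields.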

  3. create rules redacting the fields whose full values you don't want to keep:
rule "redacting"
when
  has_field("secret_values")
then
  set_field(
    field:"secret_values", 
    value:
        abbreviate(
            value:sha256(
                to_string($message.secret_values)
            ),
            width:to_long("10")
        )
    );
end

This will replace the field secret_values with the first 10 characters of the SHA-256 hash of its value.
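
Assuming the parsing step above put the token into a field called token (the field name depends on your Grok pattern), the same redaction applied to it could look like this:

rule "redacting token"
when
  has_field("token")
then
  set_field(
    field:"token", 
    value:
        abbreviate(
            value:sha256(
                to_string($message.token)
            ),
            width:to_long("10")
        )
    );
end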

And a rule redacting the full message as well:

rule "redacting"
when
  has_field("message")
then
  set_field("message", "bogus-content");
end
