Replace string in field with pipeline rule

Hi,
I have a conundrum: how do I write a pipeline rule to omit a password from a log?
Log line looks something like this:
Mar 19 12:28:36 localhost haproxy[56341]: 77.243.30.178:12273 [19/Mar/2019:12:28:36.317] api.example.com http-api/BACKEND-1 0/0/1/12/13 200 183 - - ---- 709/707/2/1/0 0/0 {||api.example.com} {189299} "GET /api/command?user=someUser&password=myPassWord&cmd=SOMETHING HTTP/1.1" - -

And I want to change it to look something like this:
Mar 19 12:28:36 localhost haproxy[56341]: 77.243.30.178:12273 [19/Mar/2019:12:28:36.317] api.example.com http-api/BACKEND-1 0/0/1/12/13 200 183 - - ---- 709/707/2/1/0 0/0 {||api.example.com} {189299} "GET /api/command?user=someUser&password=*****&cmd=SOMETHING HTTP/1.1" - -

So omitting the password is the goal.
My progress so far: I've created a regex that matches the password and incorporated it into a rule.

rule "omit passwords"
when
    regex("(?<=password=)[^&]+").matches == true
then

end

I've connected it to the All messages stream, but not all messages contain a password, so many will not match, and with every "then" block I've tried, those messages seem to go unprocessed.

Prior to the pipeline, I've configured an HAProxy extractor, and that works fine. So the fields that need editing are "http_request" and "message".
Thanks in advance.

There is currently no way to replace text in a string via a regular expression (or otherwise), but you could file a feature request on the Graylog GitHub page.

On a side note, passing credentials via query string is... not so good. If it's an app built in-house, try using headers instead: credentials in query strings are more likely to leak inadvertently than I'd personally be comfortable with.

If you are on Graylog 3.0, you can use the regex_replace function.
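
For the HAProxy line in the original question, a rule built on regex_replace might look roughly like the sketch below. It assumes the argument order regex_replace(pattern, value, replacement), reuses the "http_request" field name from the extractor mentioned in the question, and uses made-up rule names. The "when" clauses only match messages that actually contain "password=", so all other messages pass through the stage unchanged.

```
// Sketch only: assumes regex_replace(pattern, value, replacement).
// The character class [^& ] stops at the next parameter or at the space before "HTTP/1.1".
rule "mask password in message"
when
    contains(to_string($message.message), "password=")
then
    set_field("message", regex_replace("(?<=password=)[^& ]+", to_string($message.message), "*****"));
end

// Same replacement for the extracted field, guarded so the field is never created by accident.
rule "mask password in http_request"
when
    has_field("http_request") &&
    contains(to_string($message.http_request), "password=")
then
    set_field("http_request", regex_replace("(?<=password=)[^& ]+", to_string($message.http_request), "*****"));
end
```

Both rules can sit in the same pipeline stage. If extractors and pipelines both touch these fields, the message processor order in the Graylog configuration decides which runs first.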


Huh, I checked the documentation for 3.0 (functions list) and didn't see regex_replace in there - but cool that it exists! :smiley:

Additionally, I believe that it's not possible to edit the original "message" field? I'm guessing this would need to be done at the log source end, with some kind of Filebeat processor config (assuming you're using Filebeat, of course).

I believe that it's not possible to edit the original "message" field?

You are able to edit ANY field of a message, and even remove and add as many as you like. The only thing you need to keep in mind is that Graylog needs the timestamp, source and message fields to store a message...

Thanks Jan,

I'm a Graylog newbie, but I was under the impression that "message" can't be changed. I think I based this on the fact that "cut" isn't an option on the message field when creating an extractor.

Apologies up front to the OP for going off-topic, but Jan, could you clarify my understanding?

Let's say I have a message

ColumnA,ColumnB,ColumnC
2019-04-10,Frank,Password123

(I believe this is similar to what the OP is describing.)

So I have my extractor, and after it runs Graylog has 4+ columns:

Timestamp,Message,UserName,Password

But I want to remove the password for security reasons. That's easy, don't extract it. But it's still in the "message" field. So you are saying that I can edit the original message field to be

2019-04-10,Frank

?

And if that is the case, in order to save storage as I've already extracted all the info I need into separate columns, could I completely remove the "message" field, or completely blank it out?

If the message that is sent in to Graylog (let's say to a raw TCP socket) is in fact 2019-04-10,Frank,Password123 then when Graylog has ingested it, internally the message is now:

{ "timestamp": "2019-04-10 08:39:00.000Z", "message": "2019-04-10,Frank,Password123", "source": "the-host-that-logged-it"}

(Plus some additional fields such as the gl2_* set that helps Graylog do smart things like being able to show all messages from an input)

Ideally what you would do is set up a Pipeline, not an extractor, something along the lines of this:

rule "parse message and discard password"
when
  true
then
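  // Split the CSV line: res[0] is the date, res[1] the user name, res[2] the password.
  let res = split(",", to_string($message.message));
  set_field("username", res[1]);
  // Rebuild "message" from the first two columns only, which drops the password.
  set_field("message", concat(to_string(res[0]), ","));
  set_field("message", concat(to_string($message.message), to_string(res[1])));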
end

The reason for the repeated concat is that while a regex_replace function exists (which could do it in one line), I haven't found its usage documentation on the Graylog site.
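
For comparison, a one-line variant with regex_replace might look like the sketch below. It assumes the argument order regex_replace(pattern, value, replacement) and that the password is always the last comma-separated column, as in the sample line; the rule name is made up.

```
// Sketch only: assumes regex_replace(pattern, value, replacement).
rule "drop trailing password column"
when
  true
then
  // ",[^,]*$" matches the last comma and everything after it, so
  // "2019-04-10,Frank,Password123" becomes "2019-04-10,Frank".
  set_field("message", regex_replace(",[^,]*$", to_string($message.message), ""));
end
```

This variant does not set a "username" field; if that field is still wanted, the split-based rule is the simpler choice.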

But, in essence, you can slice and dice the message, then put it back in a different form if you wish. I use this all the time on nginx logs - we send them in JSON format, a pipeline parses them and extracts the fields, then re-assembles a message field to look like a standard common log format so that some external legacy tools we use don't get too upset :slight_smile:


Many thanks Ben :smile:
