Running key=value tokenizer extractor from Pipelines?

Hello Graylog community,
I have a simple question today: can pipelines do what the “key=value” tokenizer extractor does?

I know pipelines have the key_value() function, but although it is more configurable than its key=value tokenizer extractor counterpart, there seems to be no way to handle spaces in quoted fields properly.

What am I missing? (Sorry for such a broad question.)


As an exercise, I switched a test copy of my input from the key=value tokenizer extractor to the key_value() pipeline function. Details can be found at RAW Input with “Length-prefixed framing” - #6 by nisow95612

These are the results of that exercise (identifying information masked):

  • The 1st line is the result of that exercise: dstcountry=“Czech Republic” was not parsed correctly.
  • The 2nd line is what the Syslog TCP input does by default: it worked OK because there was no embedded “=”.
  • The 3rd line was parsed by the key=value tokenizer extractor: it worked perfectly. Unfortunately, this works only on logs forwarded by FMGR, as explained at RAW Input with “Length-prefixed framing”.
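
For anyone wondering why a quoted value with a space breaks, here is a simplified model in Python (not Graylog's code): splitting on every space cuts dstcountry="Czech Republic" in half, while a quote-aware tokenizer keeps it whole. Python's shlex module stands in for the quote handling; the sample line and the helper names naive_kv and quote_aware_kv are made up for illustration.

```python
import shlex

# Made-up sample in the same key=value shape as the FortiGate logs above.
line = 'devname="fw01" srcport=443 dstcountry="Czech Republic" action="accept"'


def naive_kv(text):
    """Split on single spaces, then on the first '='; quoting is ignored."""
    pairs = {}
    for token in text.split(" "):
        if "=" in token:
            key, value = token.split("=", 1)
            pairs[key] = value.strip('"')
    return pairs


def quote_aware_kv(text):
    """Tokenize with shlex (POSIX rules), so a quoted value stays one token."""
    pairs = {}
    for token in shlex.split(text):
        if "=" in token:
            key, value = token.split("=", 1)
            pairs[key] = value
    return pairs


print(naive_kv(line)["dstcountry"])        # Czech
print(quote_aware_kv(line)["dstcountry"])  # Czech Republic
```

shlex follows shell rules, so backslashes and single quotes are treated specially too; that is fine for this illustration, but whether real FortiGate data contains them is an open assumption.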

I believe the Graylog servers are fully up to date, i.e. on 4.1.

Hello,
Out of curiosity, how did you set up your Message Processors Configuration?

Funny, it’s the same as yours, except that the GeoIP Resolver is disabled.

This means I can’t do this “bogus field killing” in a pipeline and then let the extractors do the k=v tokenizing, right?

Not sure, I’m still looking into it.


Hi @nisow95612

This is a known issue with the key_value function. You can work around it with a bit of regex. I use this pipeline rule and it works great for me, including FortiGate multi-word values in quotes.

rule "Forti-regex-KV"
when
  has_field("message") and contains(to_string($message.message), "devname")
then
  // Keep everything from "date=" onward; anything before it is dropped
  let regex = regex("(date=.*)", to_string($message.message));

  // Replace values with quotes to format "key":"value",
  let replace1 = regex_replace("([a-z0-9\\_\\-]+)=(?:\")([^\"]+)(?:\")", to_string(regex["0"]), "\"$1\":\"$2\",");

  // Replace values without quotes (numbers, times, N/A) to format "key":"value",
  let replace2 = regex_replace("([a-z0-9\\_\\-]+)=([0-9.\\-:]+|N/A)(?: |$)", replace1, "\"$1\":\"$2\",");

  // Wrap the result in {} to get JSON syntax (currently disabled)
  //let replace3 = regex_replace("(.*),", replace2, "{$1}");

  // Split the rewritten string: pairs separated by ",", key and value separated by ":"
  set_fields(
    fields:
      key_value(
        value: to_string(replace2),
        trim_value_chars: "\"",
        trim_key_chars: "\"",
        delimiters: ",",
        kv_delimiters: ":"
      )
  );
end
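
To see what the rule does step by step, here is a rough Python re-creation of it (a sketch, not Graylog code). Python's re.sub writes the backreference as \1 where Graylog's regex_replace uses $1, and the final loop only approximates key_value(); in particular, splitting each pair at its first ":" is an assumption. The function name forti_regex_kv and the sample messages are hypothetical.

```python
import re


def forti_regex_kv(message: str) -> dict:
    # Step 1: keep everything from "date=" onward (mirrors regex("(date=.*)", ...)).
    match = re.search(r"(date=.*)", message)
    if match is None:
        return {}
    text = match.group(1)

    # Step 2: key="quoted value" -> "key":"quoted value",
    text = re.sub(r'([a-z0-9_\-]+)=(?:")([^"]+)(?:")', r'"\1":"\2",', text)

    # Step 3: unquoted numeric/time/N/A values -> "key":"value",
    text = re.sub(r"([a-z0-9_\-]+)=([0-9.\-:]+|N/A)(?: |$)", r'"\1":"\2",', text)

    # Step 4: rough stand-in for key_value(delimiters=",", kv_delimiters=":",
    # trim_key_chars/trim_value_chars='"'). ASSUMPTION: each pair is split at its
    # first ":" and surrounding whitespace is dropped; Graylog may differ in edge cases.
    fields = {}
    for chunk in text.split(","):
        chunk = chunk.strip()
        if ":" not in chunk:
            continue
        key, value = chunk.split(":", 1)
        fields[key.strip('"')] = value.strip('"')
    return fields


sample = ('<189>date=2021-09-09 time=13:17:05 devname="fw01" '
          'dstcountry="Czech Republic" cpu=12')

print(forti_regex_kv(sample)["dstcountry"])                 # Czech Republic
print(forti_regex_kv('date=2021-09-09 msg="a, b"')["msg"])  # a
```

The last print shows the weak spot: a quoted value that itself contains a comma is cut at that comma, because after the rewrite the comma is treated as a pair delimiter.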

Hi @shoothub,
thanks for the tip. That may be a viable workaround, even if imperfect. I assume you don’t use IPv6 yet? I also think attackers can put as many colons and commas into VPN hostnames/usernames as they want.

Is the first regex replacement correct? I think \"$2\" would refer to (?:\").
Oh wait… there is “(?”. Does that somehow prevent () from being a capture group?

Wouldn’t it be easier to start with FortiGate’s “set format csv” option? It looks like this:
type="event",subtype="vpn",level="notice",vd="Proxxxxxxxxxxx",logdesc="IPsec connection status changed",msg="IPsec connection status change",action="tunnel-up"

Hi @nisow95612
you are correct: (?: starts a non-capturing group.
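
A quick way to check this is Python's re module, which treats (?:...) the same way as the Java regex engine behind regex_replace: it groups without capturing, so $1/$2 (group(1)/group(2) in Python) count only the plain (...) groups. The sample string is made up.

```python
import re

# Same shape as the first regex_replace pattern in the rule above.
pattern = re.compile(r'([a-z0-9_\-]+)=(?:")([^"]+)(?:")')

m = pattern.search('dstcountry="Czech Republic"')
print(pattern.groups)  # 2: only the plain (...) groups capture
print(m.group(1))      # dstcountry
print(m.group(2))      # Czech Republic
```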

That may be an interesting option; I haven’t tried it yet. Please share your experience if you try it.

Hi @shoothub,

How does your parser handle the field “msg” in “perf-stats” messages?

I don’t see any special handling of embedded commas or colons in your regexes, and msg contains both.

Message example (CSV, but please imagine spaces instead of commas):
date=2021-09-09,time=13:17:xx,devname="Pexxxxxxx_Master",devid="FGxxxxxxxxxxxxxxx",eventtime=16311862xxxxxxxxxxx,tz="+0200",logid="0100040704",type="event",subtype="system",level="notice",vd="Proxxxxxxxx",logdesc="System performance statistics",action="perf-stats",cpu=12,mem=12,totalsession=123,disk=123,bandwidth="123/123",setuprate=123,disklograte=123,fazlograte=123,freediskstorage=123,sysuptime=123,waninfo="xxxxx",msg="Performance statistics: average CPU: 12, memory: 12, concurrent sessions: 123, setup-rate: 123"

From that, msg should be parsed as:
Performance statistics: average CPU: 12, memory: 12, concurrent sessions: 123, setup-rate: 123


I have tried to parse the CSV format with let k_v = key_value(value:msgtext, delimiters:",", kv_delimiters:"=", trim_value_chars:"\"");.

As always, it mostly works, but msg gets parsed as just “Performance statistics: average CPU: 7”.
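
That truncation is what a plain split on every comma produces. Here is a simplified Python model of the effect (an assumption about the mechanism, not Graylog's implementation; naive_csv_kv and the shortened sample line are hypothetical):

```python
# Simplified model: split the CSV-format line on every "," and then each pair
# on its first "=".
line = ('action="perf-stats",cpu=12,'
        'msg="Performance statistics: average CPU: 12, memory: 12, '
        'concurrent sessions: 123, setup-rate: 123"')


def naive_csv_kv(text):
    fields = {}
    for chunk in text.split(","):
        if "=" not in chunk:
            continue  # fragments such as ' memory: 12' are silently dropped
        key, value = chunk.split("=", 1)
        fields[key] = value.strip('"')
    return fields


print(naive_csv_kv(line)["msg"])  # Performance statistics: average CPU: 12
```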

If you use the CSV format, try this little fix. It replaces each “,” delimiter that comes right before a key (word=) with “|”, which does not collide with the commas inside the performance statistics text, and then uses “|” as the delimiter in the key_value function.

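// Rule name and condition are illustrative; the original snippet was only the body of a rule
rule "Forti-CSV-KV"
when
  has_field("msgtext")
then
  // Replace the delimiter "," with "|", but only where it precedes the next key (word=)
  let replace1 = regex_replace(",(\\w+=)", to_string($message.msgtext), "|$1");
  // Use "|" as the delimiter in key_value
  set_fields(
    fields:
      key_value(
        value: replace1,
        trim_value_chars: "\"",
        delimiters: "|",
        kv_delimiters: "="
      )
  );
end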
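
The same idea in Python, as a sketch for checking the regex outside Graylog (pipe_fix_kv and the sample line are hypothetical; Python writes the backreference as \1 instead of $1):

```python
import re

line = ('action="perf-stats",cpu=12,'
        'msg="Performance statistics: average CPU: 12, memory: 12, '
        'concurrent sessions: 123, setup-rate: 123"')


def pipe_fix_kv(text):
    # Only a "," directly followed by word characters and "=" (the start of the
    # next key) becomes "|"; the commas inside msg are followed by a space.
    replaced = re.sub(r",(\w+=)", r"|\1", text)
    fields = {}
    for chunk in replaced.split("|"):
        if "=" not in chunk:
            continue
        key, value = chunk.split("=", 1)
        fields[key] = value.strip('"')
    return fields


print(pipe_fix_kv(line)["msg"])
```

Two caveats: \w does not match “-”, so a comma before a key containing a dash would not be replaced, and a quoted value that itself contains “,word=” would still be split. Python's \w also matches Unicode letters by default while Java's \w is ASCII-only unless a flag is set; for FortiGate's ASCII keys that makes no difference.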