Problems parsing pipe delimited log file with pipeline rules

Hello,

I am trying to build a pipeline rule to parse some custom log files that are pipe delimited. I'm trying to do it with grok patterns, but the pipes in the pattern seem to be having the "OR" regular expression effect.

I have log files that look like this:
2022-01-09 10:17:45|INFO|Processor1|Log data line 15

and am trying to extract them like this:

then
  let fields = grok(
    pattern: "%{DATESTAMP:log_timestamp}|%{WORD:log_level}|%{WORD:processor}|%{GREEDYDATA:log_message}",
    value: to_string($message.message),
    only_named_captures: true
  );
  set_fields(fields);
end

But when I do that, the fields do not extract correctly.
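The "OR" effect is easy to reproduce outside of grok, since grok patterns compile down to regular expressions. A minimal Python sketch (a stand-in for the grok engine, not Graylog itself) shows the difference between an unescaped and an escaped pipe:

```python
import re

line = "2022-01-09 10:17:45|INFO|Processor1|Log data line 15"

# Unescaped, "|" means alternation: "INFO|WARN" matches either word,
# not the literal pipe character.
assert re.fullmatch(r"INFO|WARN", "INFO") is not None

# Escaped, "\|" matches the literal delimiter in the log line.
assert re.search(r"\|INFO\|", line) is not None
```

So each delimiter pipe has to be escaped before it is taken literally.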

So I scaled it back and tried an extraction for just the first two fields:

then
  let fields = grok(
    pattern: "%{DATESTAMP:log_timestamp}|%{WORD:log_level}",
    value: to_string($message.message),
    only_named_captures: true
  );
  set_fields(fields);
  set_field("grok_pattern", "Pattern 1");
end

and then I get the year, 2022, in the log_level field and no timestamp.
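That result is consistent with the pipe being read as alternation: the regex engine is free to satisfy either branch. A simplified Python stand-in (the date pattern here only loosely mimics DATESTAMP and is an illustration, not Graylog's actual compiled pattern) reproduces the symptom:

```python
import re

line = "2022-01-09 10:17:45|INFO|Processor1|Log data line 15"

# Simplified stand-in for "%{DATESTAMP:...}|%{WORD:...}": with the pipe read
# as alternation, the engine may satisfy EITHER branch. The timestamp branch
# (mimicking a slash-date format) fails at the start of the line, so the
# WORD branch wins on the first token it can find: "2022".
pattern = re.compile(
    r"(?P<log_timestamp>\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2})"
    r"|(?P<log_level>\w+)"
)
m = pattern.search(line)
print(m.group("log_level"))      # -> 2022
print(m.group("log_timestamp"))  # -> None
```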

The solution, to me, would be to escape the pipes in the pattern with a backslash. Doing that works in the grok pattern tester I am using, but when I try to escape the pipes with a backslash while building the pipeline rule, I get errors and can't save the rule.

What am I missing? I feel like it’s something simple but can’t seem to find the solution.

Thank you!

Figured it out. Needed to use double backslashes (\\|) to escape the pipes inside the rule's pattern string. Was still having some problems matching the rest of the fields in the log, but I ended up creating a custom grok pattern, NOTAPIPE, defined as [^|]+, and using that for the middle fields; then everything started showing up correctly.
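For anyone following along, here is the working approach sketched in plain Python regex terms (NOTAPIPE becomes [^|]+, and the delimiters are escaped literal pipes):

```python
import re

line = "2022-01-09 10:17:45|INFO|Processor1|Log data line 15"

# Equivalent of the fixed pattern: NOTAPIPE-style "[^|]+" for each field
# and "\|" for the literal delimiters.
pattern = re.compile(
    r"(?P<log_timestamp>[^|]+)\|"
    r"(?P<log_level>[^|]+)\|"
    r"(?P<processor>[^|]+)\|"
    r"(?P<log_message>.*)"
)
fields = pattern.match(line).groupdict()
print(fields["log_level"])   # -> INFO
print(fields["processor"])   # -> Processor1
```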

Hope this helps someone else!

If you put the entire GROK expression into a custom GROK pattern and reference that, you don't need to double up on the escapes, and it keeps the pipeline rule snippet easier to read.

Hi @tmacgbay,

Not sure I understand. But making the pipeline rule easier to read would be nice.

Are you saying to include the pipes in the custom Grok pattern like this:

|[^\|]+|

If I do that, won't that then include the pipes in my capture? Or can I actually use parentheses in the Grok pattern to define the capture itself, like in regex? Something like this:

|([^\|]+)|

Just started learning about Grok when we installed Graylog so not sure what all is possible.

Thanks for your reply!

Under the System Menu->Grok Patterns you can create your own GROK patterns and refer to them in your pipeline rule as one GROK name rather than the long list of individual GROK patterns that parse the message. This makes the rule concise, and you don't need to double up on escapes. For instance, you might have the following in your System->Grok Patterns:

MYEXAMPLE = %{DATESTAMP:log_timestamp}|%{WORD:log_level}|%{WORD:processor}|%{GREEDYDATA:log_message}

Then in your rule you only need to use:

grok(
  pattern: "%{MYEXAMPLE}",
  value: to_string($message.message),
  only_named_captures: true
);
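As a rough illustration of why this helps: a named pattern is effectively a macro that gets expanded into the underlying regex once, while the rule only carries the name. The toy expander below (the pattern table and helper function are hypothetical, not Graylog's implementation) mimics that idea:

```python
import re

# Hypothetical pattern table, like System->Grok Patterns entries.
PATTERNS = {
    "NOTAPIPE": r"[^|]+",
}

def expand(pattern: str) -> str:
    # Replace each %{NAME:field} reference with a named capture group
    # built from the pattern table.
    def repl(m):
        name, field = m.group(1), m.group(2)
        return f"(?P<{field}>{PATTERNS[name]})"
    return re.sub(r"%\{(\w+):(\w+)\}", repl, pattern)

# The single named definition carries the escapes; the rule would only
# reference its name.
myexample = (r"%{NOTAPIPE:log_timestamp}\|%{NOTAPIPE:log_level}\|"
             r"%{NOTAPIPE:processor}\|%{NOTAPIPE:log_message}")
regex = re.compile(expand(myexample))
m = regex.match("2022-01-09 10:17:45|INFO|Processor1|Log data line 15")
print(m.group("processor"))  # -> Processor1
```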

Also of note: if you want to test out GROK parsing, here is the site I use (not perfect, but it works well)


omg. You should see the grok patterns I wrote for parsing radius logs. This would be sweet. I will have to give it a try.

I think I still have to escape the pipes in the grok pattern. At least that's how it has worked in the grok pattern testers I have used, but one escape character is better than two.

Thanks for the link! I do believe I have used that one before.

Again, thanks for the tips! I do appreciate it.

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.