Unwanted Grok Fields

Hi there. I’m fairly new to the Graylog environment and log manipulation in general. I have been working to create extractors using Grok patterns for my router’s syslog. When I use a built-in Grok pattern such as %{SYSLOGTIMESTAMP} that is composed of other Grok patterns (i.e. %{MONTH} +%{MONTHDAY} %{TIME}), it seems to extract those nested patterns as separate fields that are not needed. What I mean is that if I use %{SYSLOGTIMESTAMP:Timestamp} in my extractor, it produces 6 other fields (MONTH, MONTHDAY, etc.) that display the same information as Timestamp. I have attempted to attach a screenshot to better portray what I am explaining.
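
If I read the stock grok-patterns file correctly, the extra fields line up with how the timestamp pattern is defined in terms of nested sub-patterns (simplified here), which is presumably where the duplicates come from:

SYSLOGTIMESTAMP %{MONTH} +%{MONTHDAY} %{TIME}
TIME %{HOUR}:%{MINUTE}(?::%{SECOND})

Each of MONTH, MONTHDAY, TIME, HOUR, MINUTE and SECOND is a named capture of its own, so a plain match emits all six alongside Timestamp.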

I have tried making my own custom Grok pattern to solve this issue, but I still end up with the same result. I am running Graylog version 5.1. Does someone know how to make it so that my extractor only produces the %{SYSLOGTIMESTAMP:Timestamp} field and not the other duplicate outputs?

  1. Try to avoid extractors! Better to go for pipelines.
  2. Here is an example of how to run a grok against a message:
rule "The Name of your rule"
when
  to_string($message.someField) == "someValue" 
then
  set_fields(
    grok(
      pattern:"^%{NAME_OF_YOUR_GROK_PATTERN}",
      value:to_string($message.message),
      only_named_captures:true
    )
  );
end

The parameter only_named_captures:true makes sure that only the captures you explicitly named end up as fields; all the unnamed sub-patterns are discarded. See the sketch below for your exact case.
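
Applied to your case, a minimal sketch could look like this (the rule name and the has_field condition are placeholders; adapt them to however you select your router messages):

rule "parse router syslog timestamp"
when
  has_field("message")
then
  // only Timestamp is kept; without only_named_captures you would also
  // get MONTH, MONTHDAY, TIME, HOUR, MINUTE and SECOND as fields
  set_fields(
    grok(
      pattern: "^%{SYSLOGTIMESTAMP:Timestamp}",
      value: to_string($message.message),
      only_named_captures: true
    )
  );
end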


Thank you for your reply.
You’re right. I probably should have looked into pipelines a little more. The guide I was following for the setup did not cover them, so they were outside my field of view at first. I was able to use the %{SYSLOGTIMESTAMP} pattern from my original post along with the only_named_captures parameter to produce one field instead of 7. I’ll have to dive deeper into pipelines, because it seems there is a lot of granular control to be gained from all of these functions.

Yes, do it! Imagine logs like those from a Cisco ASA: plain syslog prose with unique message IDs at the beginning. With extractors, every extractor would run against every log, which wastes a lot of CPU, because Grok evaluations are expensive and most of them would not fire. In pipelines you can first parse the ID out of the log, and then build rules that run only against a certain ID. Only one grok will fire per log, and it will be the right one. A sketch of this two-stage approach follows below.
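
A minimal sketch of that idea, assuming ASA-style prefixes like "%ASA-6-302013:" (the field names asa_severity and asa_message_id are invented for the example, and the CISCOFW302013_302014_302015_302016 pattern assumes you have imported the Logstash firewall pattern set):

// stage 0: cheaply pull the message ID out of every ASA log
rule "extract asa message id"
when
  contains(to_string($message.message), "%ASA-")
then
  set_fields(
    grok(
      pattern: "%ASA-%{INT:asa_severity}-%{INT:asa_message_id}:",
      value: to_string($message.message),
      only_named_captures: true
    )
  );
end

// stage 1: the expensive grok runs only for the one matching ID
rule "parse asa 302013 built connection"
when
  to_string($message.asa_message_id) == "302013"
then
  set_fields(
    grok(
      pattern: "%{CISCOFW302013_302014_302015_302016}",
      value: to_string($message.message),
      only_named_captures: true
    )
  );
end

Put the first rule in an earlier pipeline stage than the second, so the asa_message_id field already exists when the second rule’s condition is evaluated.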

Pipelines are awesome and you really should consider using them. They are also great for routing messages into streams.

That isn’t quite true: with extractors you can also set conditions (message contains a string, or matches a regex) so that an extractor is only applied when needed.

Regarding your issue, did you check “Named captures only”? That is exactly the purpose of this setting: to extract only the named grok patterns (in your case, Timestamp).

Ah, so that’s what I was missing the first time I created extractors! I knew there had to be something to fix the issue of creating multiple captures of the same data. Thank you for the response @frantz. I have already dived deeper into pipelines, since it seems there is some fun granular control there. But it is good to know for the future that this option exists.
