Pipeline rule to split log messages into key-value fields not working

Check whether your Message Filter Chain comes before your Pipeline Processor under Message Processor Configuration; if not, that ordering often causes the pipeline to produce no results. You can also use the debug() function to watch the logs and see what is going on. I added it below, with a note showing where to look for the debug messages. Here are some possible results:

  • If you see the full message showing up in the debug logs but the fields are not split out, then something in key_value() is failing. I have seen key_value() fail silently, with no logs or anything, just because there was a key it didn’t like (one with spaces)… the portion subj==unconfined key="mdatp" looks odd and may be causing such a failure… something to consider.

  • If you are NOT seeing the debug messages in the log, the when statement might be the issue… change it to just true so the rule catches everything passing by.

  • If you have true in your when clause and the rule is still not seeing the full messages, then the pipeline/rule is probably not connected to the stream the messages are coming in on.

rule "auditd_keys_to_fields"
when
    starts_with(to_string($message.message), "type", true)
then
     // use $ tail -f /var/log/graylog-server/server.log to watch for the results of the debug message below
    debug(concat("++++++++++ Full Message: ", to_string($message.message))); 

    // extract all key-value from "message" and prefix it with auditd_
    set_fields(
        fields: key_value(
                    value: to_string($message.message),
                    trim_value_chars: "\""
                ),
        prefix: "auditd_"
        );
end

@tmacgbay :+1:

I did a copy and paste in my lab with @foss4ever’s logs. Learned something new, thanks!

Check to see if if your Message Filter Chain is before your Pipeline Processor under Message Processor Configuration

Looks ok to me:

I have tried changing the “when” condition to just true on my pipeline rule as you suggested, and also replacing the rule code with what you posted (including the debug instruction), but I still get a message throughput of zero on the rule and no output in /var/log/graylog-server/server.log.

Double-checking the pipeline, we see that it is connected to the audit log stream:

…which has plenty of log data coming in, as you can see here:

So this is just getting really, really weird :upside_down_face:

For the sake of testing more thoroughly, I have also tried setting up a new pipeline and rule for a different stream “Debian server logs” with the same result of no log throughput whatsoever. So it seems this issue is not related to the audit log stream as such…

Set your Message Filter Chain to come before your Pipeline Processor … in Message Processor Configuration, the Message Filter Chain should have a lower number than the Pipeline Processor.


Sorry, I misread your initial explanation.

But now, magically, everything seems to finally work :sweat_smile:

Big thanks go out to yourself as well as @gsmith

At this point I can’t help but wonder why the default Message Processors Configuration would be set in a way that seems to prevent pipelines from working in general :thinking:

Maybe that is one for the Graylog guys to answer.
Let’s see if they stop by this thread before it is archived


I am not sure why it is set up that way…

@dscryber do you have a contact with someone in engineering who could explain why the Message Filter Chain defaults to being after the Pipeline Processor (when all the cool kids like it to be first), and maybe why you would ever want it that way?


Thanks, @tmacgbay . Yes, I’ll bring this to engineering’s attention.


Sounds great.
If that configuration could be changed it might save another Graylog newbie like myself many hours of frustration :slight_smile:


The default order is an unfortunate oversight. It is ordered alphabetically (by the full class name).
For a long time we didn’t dare to change the default, because it would break the processing
of existing setups.
The next release will have the order fixed (but only for new installations).
See Define a new improved default message processing order by mpfz0r · Pull Request #13081 · Graylog2/graylog2-server · GitHub
for details.

Also of note: the stream rules are becoming a separate message processor:


Thank you so, so much! I am really looking forward to that change.


Err, sorry to bother you, but is there a chance that the implicit processing within the syslog input could be separated out in the same way? I mean, the syslog input sometimes detects and parses key=value pairs on its own, creating unwanted fields that are hard or impossible to get rid of.
I work around this by using a raw input (which does not work for length-prefix framed messages), but being able to (re)move that step would allow a much more generic solution, e.g. sanitizing the input with extractors.

Thoughts on the key_value() issues from the other post:

  • You could replace all spaces with underscores for anything between quotes. I’m pretty sure you could build a regex_replace() for that, though it might be challenging to get right in the pipeline. key_value() would then work more smoothly.

  • If you know the fields you want, build a GROK statement to pull out just those fields.
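As a sketch of that first bullet, a pipeline rule could pre-process the raw message with regex_replace() before handing it to key_value(). This is a hypothetical, untested rule: the rule name, pattern, and auditd_ prefix are assumptions carried over from the earlier example, and each regex match only converts one space, so values containing several spaces would need repeated passes or a more elaborate pattern.

```
rule "sanitize_quoted_values"
when
    starts_with(to_string($message.message), "type", true)
then
    // ASSUMPTION: hypothetical pattern. Replace a space inside a
    // double-quoted value with an underscore so key_value() can
    // tokenize it. Each match converts only ONE space.
    let sanitized = regex_replace(
        pattern: "\"([^\" ]*) ([^\"]*)\"",
        value: to_string($message.message),
        replacement: "\"$1_$2\"",
        replace_all: true
    );
    // now key_value() sees space-free quoted values
    set_fields(
        fields: key_value(
            value: sanitized,
            trim_value_chars: "\""
        ),
        prefix: "auditd_"
    );
end
```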

With key_value() running from extractors or pipelines, yes, but the syslog input itself has a feature that parses key=value messages from Fortigate firewalls, i.e. by the time extractors or pipelines get a chance to run, the damage is already done. I have found no way to reliably clear the auto-created fields.

I had it in my head but did not type it… :smiley: You could run it through a raw input, and after that GROK the fields you want out of it… this assumes you know all the fields you want…

As I wrote in RAW Input with “Length-prefixed framing”, I can’t. The raw input only works for messages with newline or null-character framing. My messages use length-prefixed framing, and if I send them to a raw input they all join together into one big mess.

Yuk. Perhaps you can use your pipeline rule "TEST FGT-Direct re-parser" that creates a new message, then GROK it rather than KV it…

That’s what I do, but it is annoying, wastes energy, and still lets hacks through.
Imagine somebody tries to log in as Hello" message=none". The KV parser built into the syslog input will happily destroy the original message field, and extractors or pipelines can’t fix that.

PS: You forgot about Running key=value tokenizer extractor from Pipelines (now possible in v4.3). Graylog v4.3 pipelines have a key_value() that could parse the whole thing correctly. But that only works if the message survives the built-in, non-optional parsing in the syslog input.

I didn’t forget! :smiley: You could use that… OR, to better account for whacky user input strings…

If you eliminate key_value() completely, take the messy, non-optionally-parsed syslog $message.message, and run a GROK against it, then even where your hacky friends put in Hello" message=none" you can have the GROK stop its field capture at the next known field (rather than at the next quote), which reasonably accounts for what someone might put in there. Then it is the relatively inefficient creation of a new message combined with a drop_message() on the current one… but Graylog seems to be pretty quick about all that…
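A sketch of what that could look like as a rule. Everything here is hypothetical (rule name, GROK pattern, field names); the point is only that the user capture is terminated by the literal `" group=` that follows it in the Fortigate message, not by the first quote, so an embedded quote can’t end the capture early:

```
rule "fgt_grok_reparse"
when
    has_field("message")
then
    // ASSUMPTION: hypothetical pattern; the non-greedy DATA capture
    // stops at the next known key (" group=) instead of at a quote
    let parsed = grok(
        pattern: "user=\"%{DATA:fgt_user}\" group=\"%{DATA:fgt_group}\"",
        value: to_string($message.message),
        only_named_captures: true
    );
    set_fields(parsed);
    // rebuild a clean message and drop the mangled original
    create_message(to_string($message.message));
    drop_message();
end
```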


Wrong.

The non-optional parsing inside the syslog input will overwrite $message.message when it sees that input. By the time extractors and pipelines get a chance to run, $message.message will be equal to "none\". Try it.


Here is an input that triggers this fortigate key_value parsing inside of syslog input:

<185>date=2022-09-01 time=12:34:56 devname="Fortigate" timestamp=1234567890 user="Hello\" message=none " group="N/A"

I sent this via netcat to a syslog TCP input and confirmed that it really does overwrite $message.message.
PS: This example uses newline framing for simplicity. Real messages are longer and length-prefixed.
