Pipeline rule to split log messages into key-value fields not working

1. Describe your incident:

I am unable to split audit log messages into separate key-value fields and prefix those fields with "auditd_".

2. Describe your environment:

  • OS Information:
    Debian 10 LTS (4.19.0-21-amd64 #1 SMP Debian 4.19.249-2 (2022-06-30) x86_64 GNU/Linux)
  • Package Version:
    Graylog 4.2.7+879e651

3. What steps have you already taken to try and solve the problem?

See my previously created topic from this post onwards: Syslog message not being parsed when using Graylog Sidecar with filebeat input - #24 by foss4ever

If you're wondering why I'm creating two topics around the same issue: the first one did not result in any solution, so I am giving it one more shot before taking more drastic steps such as moving to a different log management solution than Graylog.

4. How can the community help?

Tell me what I’m doing wrong :slight_smile:

I have tried getting this pipeline rule working by modifying the rule code in a bunch of different ways, but have had no luck whatsoever in getting any log throughput on the rule.
I have honestly run out of ideas at this point, so I'm really hoping that some Graylog guru out there could give me a push in the right direction…

FYI, my current, non-working rule code looks like this:

rule "auditd_keys_to_fields"
when
    starts_with(to_string($message.message), "type", true)
then
    // extract all key-value pairs from "message" and prefix them with auditd_
    set_fields(
        fields: key_value(
            value: to_string($message.message),
            trim_value_chars: "\""
        ),
        prefix: "auditd_"
    );
end

Here’s an example of an audit log message that I would like to split into key-value fields:

type=SYSCALL msg=audit(1661161056.175:207102145): arch=c000003e syscall=42 success=no exit=-115 a0=23 a1=563c354e2198 a2=10 a3=0 items=0 ppid=616 pid=617 auid=4294967295 uid=108 gid=113 euid=108 suid=108 fsuid=108 egid=113 sgid=113 fsgid=113 tty=(none) ses=4294967295 comm="haproxy" exe="/usr/sbin/haproxy" subj==unconfined key="mdatp"
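For reference, here is a minimal Python sketch (plain Python, not pipeline code; the function name is made up for illustration) of the transformation the rule is meant to perform: split each token on the first "=", strip surrounding quotes, and prefix every resulting field name with "auditd_".

```python
# Hypothetical sketch of the intended key_value() + set_fields() behavior.
def auditd_keys_to_fields(message: str, prefix: str = "auditd_") -> dict:
    fields = {}
    for token in message.split():
        if "=" not in token:
            continue  # skip tokens that are not key=value pairs
        key, _, value = token.partition("=")  # split on the FIRST "="
        fields[prefix + key] = value.strip('"')  # trim surrounding quotes
    return fields

line = 'type=SYSCALL arch=c000003e syscall=42 success=no comm="haproxy"'
print(auditd_keys_to_fields(line))
# {'auditd_type': 'SYSCALL', 'auditd_arch': 'c000003e',
#  'auditd_syscall': '42', 'auditd_success': 'no', 'auditd_comm': 'haproxy'}
```

This is only a mental model of what the rule should produce; the real work is done by Graylog's key_value() function.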

Check whether your Message Filter Chain is before your Pipeline Processor under Message Processor Configuration; if not, this often causes no results in the pipeline. Also, you can watch the logs to see what is going on using the debug() function. I added it below, with a note that shows where to look for the debug message. Here are some possible results:

  • You see the full message debug showing up in the logs but the fields are not split out with key_value(). This means there is some error in the key_value() call. I have seen key_value() fail silently, without logs or anything, just because there was a key it didn't like (it had spaces)… the portion subj==unconfined key="mdatp" looks odd and may cause a failure… something to consider.

  • If you are NOT seeing the debug messages in the log, the when statement might be having issues. Change it to just true to catch everything passing by the rule.

  • If you have true in your when and it is still not seeing the full messages, then it is possible the pipeline/rule is not connected to the stream that has the messages coming in.
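On the first bullet, the oddity of the subj==unconfined token is easy to see if you run it through a split-on-first-equals parser by hand (plain Python, just to illustrate the shape of the problem, not how Graylog implements key_value()):

```python
# "subj==unconfined" has a double "=", so splitting at the first "="
# leaves a stray "=" glued onto the value.
token = "subj==unconfined"
key, _, value = token.partition("=")
print(key, "->", repr(value))  # subj -> '=unconfined'
```

A stricter key=value parser may instead treat that token as malformed and bail out silently, which matches the "fails without logs" behavior described above.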

rule "auditd_keys_to_fields"
when
    starts_with(to_string($message.message), "type", true)
then
     // use $ tail -f /var/log/graylog-server/server.log to watch for the results of the below debug message
    debug(concat("++++++++++ Full Message: ", to_string($message.message))); 

    // extract all key-value pairs from "message" and prefix them with auditd_
    set_fields(
        fields: key_value(
                    value: to_string($message.message),
                    trim_value_chars: "\""
                ),
        prefix: "auditd_"
        );
end

@tmacgbay :+1:

I did a copy & paste in my lab with @foss4ever's logs. Learned something new, thanks!

Check to see if your Message Filter Chain is before your Pipeline Processor under Message Processor Configuration

Looks ok to me:

I have tried changing the "when" condition to just true on my pipeline rule as you suggested, as well as replacing the rule code with what you posted (containing the debug instruction), but I still get a message throughput of zero on the rule and no output in /var/log/graylog-server/server.log.

Double-checking the pipeline, we see that it is connected to the audit log stream:

…which has plenty of log data coming in, as you can see here:

So this is just getting really, really weird :upside_down_face:

For the sake of testing more thoroughly, I have also tried setting up a new pipeline and rule for a different stream “Debian server logs” with the same result of no log throughput whatsoever. So it seems this issue is not related to the audit log stream as such…

Set your Message Filter Chain to come before your pipeline processor … Message Filter Chain should have a lower number than the Pipeline Processor.


Sorry, I misread your initial explanation.

But now, magically, everything seems to finally work :sweat_smile:

Big thanks go out to yourself as well as @gsmith

At this point I can’t help but wonder why the default Message Processors Configuration would be set in a way that seems to prevent pipelines from working in general :thinking:

Maybe that is one for the Graylog guys to answer.
Let’s see if they stop by this thread before it is archived


I am not sure why it is set up that way…

@dscryber do you have a contact with someone in engineering who could give the explanation of why the Message Filter Chain defaults to being after the Processor Pipeline (when all the cool kids like it to be first) and maybe why you would want it that way?


Thanks, @tmacgbay . Yes, I’ll bring this to engineering’s attention.


Sounds great.
If that configuration could be changed it might save another Graylog newbie like myself many hours of frustration :slight_smile:


The default order is an unfortunate oversight: it is ordered alphabetically (by the full class name).
For a long time we didn't dare to change the default, because it would break the processing
of existing setups.
The next release will have the order fixed (but only for new installations).
See Define a new improved default message processing order by mpfz0r · Pull Request #13081 · Graylog2/graylog2-server · GitHub
for details.

Also of note: the stream rules are becoming a separate message processor:


Thank you so, so much! I am really looking forward to that change.


Err, sorry to bother you, but is there a chance that the implicit processing within the syslog input will be separated out in the same way? I mean, the syslog input sometimes detects and parses key=value pairs on its own, creating unwanted fields that are hard or impossible to get rid of.
I work around this by using a Raw input (which does not work for length-prefix framed messages), but being able to (re)move that step would allow a much more generic solution, e.g. sanitizing the input with extractors.

Thoughts on key_value() issues on the other post.

  • You could set a replace of all spaces to underscores for anything between quotes. Pretty sure you could build a regex_replace() for that, though it might be challenging in the pipeline. key_value() would work more smoothly then.

  • If you know the fields you want, build a GROK statement to pull just those fields.
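The first idea can be sketched in plain Python (an assumption about the approach, not Graylog's regex_replace() itself; the function name here is hypothetical): rewrite spaces inside quoted values to underscores before the key=value pass, so quoted values are not split apart.

```python
import re

# Replace spaces with underscores inside double-quoted substrings,
# leaving everything outside the quotes untouched.
def underscore_quoted_spaces(message: str) -> str:
    return re.sub(
        r'"([^"]*)"',
        lambda m: '"' + m.group(1).replace(" ", "_") + '"',
        message,
    )

print(underscore_quoted_spaces('exe="/usr/sbin/some daemon" key="mdatp"'))
# exe="/usr/sbin/some_daemon" key="mdatp"
```

In a pipeline rule the same idea would have to be expressed with regex_replace(), which (as noted above) is likely the fiddly part.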

With key_value() running from extractors or pipelines, yes, but the syslog input itself has a feature that parses key=value messages from Fortigate firewalls, i.e. by the time extractors or pipelines get a chance to run, the damage is already done. I have found no way to reliably clear the auto-created fields.

I had it in my head but did not type it… :smiley: You could run it through a raw input; after that you would GROK the fields you want out of it… this assumes that you know all the fields you want…

As I wrote in RAW Input with "Length-prefixed framing", I can't. Raw input works only for messages with newline or null-character framing. My messages have length-prefixed framing, and if I send them to a raw input they all join together into one big mess.

Yuk. Perhaps you can use your pipeline rule "TEST FGT-Direct re-parser" that creates a new message, then GROK it rather than KV it…

That's what I do, but it is annoying, wastes energy, and still allows hacks to go through.
Imagine somebody tries to log in as Hello" message=none". The KV parser built into the syslog input will happily destroy the original message field. Extractors or pipelines can't fix that.

PS: You forgot about Running key=value tokenizer extractor from Pipelines (now possible in v4.3). Graylog v4.3 pipelines have a key_value() that can parse the whole thing correctly. But that works only if the message survives the built-in, non-optional parsing in the syslog input.

I didn't forget! :smiley: You could use that… OR, to better account for wacky user input strings…

If you eliminate key_value() completely, take the messy non-optional syslog $message.message, and run a GROK against it where your hacky friends put in Hello" message=none", but have the GROK stop its field match at the next known field (rather than at the next quote), you can reasonably account for what someone might put in there. Then it is the relatively inefficient creation of a new message combined with a drop_message() on the current one… but Graylog seems to be pretty quick about all that…
