Need some help with regex in pipeline rule

You’re pretty close, but a few things need to be corrected for the rule to work:

1. escaping characters

Graylog pipeline rules use Java-style regex. Check out the Backslashes, escapes, and quoting section of the Java Pattern documentation. The short of it is that some characters must be escaped with a \. The backslash itself is one of those characters, so any \ you would write in a normal regex has to be written as \\ in the rule. For example, \s becomes \\s.
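Because the pattern ends up in Java’s regex engine, you can check the doubling with a few lines of plain Java, which uses the same backslash convention in string literals. This is a minimal sketch, not Graylog code: the class name and the sample ulogd-style line are made up, and it assumes the pattern is applied with something like Matcher.find().

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch only: shows what the regex engine receives from a literal like "IN=*([^\\s]*)".
class EscapingDemo {
    // In the source this is written with \\s; the string itself holds \s.
    static final String PATTERN_LITERAL = "IN=*([^\\s]*)";

    // Returns the first capture group, or null when the pattern does not match.
    // Assumption: find() (search anywhere in the line) rather than matches().
    static String extractInterface(String line) {
        Matcher m = Pattern.compile(PATTERN_LITERAL).matcher(line);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // The pattern text the engine actually sees: IN=*([^\s]*)
        System.out.println(PATTERN_LITERAL);
        // Hypothetical ulogd-style line, invented for illustration: prints eth0
        System.out.println(extractInterface("IN=eth0 OUT= MAC=00:11:22"));
    }
}
```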

2. regex pipeline function syntax

The syntax for the regex function is

    regex(pattern: string, value: string, [group_names: array[string]])

Your usage only specifies a pattern, but no value to match against.

The correct usage would be something like this (I’m using line breaks and explicitly naming each parameter for readability):

    let incominginterface = regex(
        pattern: "IN=*([^\\s]*)",
        value: to_string($message.message)
        );

The to_string() call is required so that the regex function operates on the field as a string. I’m assuming the field name is message. In the example above, $message is an object, and each message field is accessed with a dot (.). For example, the source field is $message.source, the message field is $message.message, and so on.

3. Retrieving regex result matches and capture groups

To access the matches returned by regex, we must specify which key (capture group) we want from the result. For example, to return the first capture group, we would use incominginterface["0"].

Capture group names can optionally be specified in the regex function:

    let incominginterface = regex(
        pattern: "IN=*([^\\s]*)",
        value: to_string($message.message),
        group_names: ["network_interface"]
        );

The result would then become incominginterface["network_interface"].
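To make the two ways of reading the result concrete, here is a hypothetical Java helper that imitates the result shape described above: capture groups keyed "0", "1", ... by default, or by the names passed in group_names. It only illustrates that described behavior; it is not Graylog’s implementation, and the helper name and sample line are invented.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical imitation of the regex result map; not Graylog code.
class RegexResultSketch {
    static Map<String, String> regexLike(String pattern, String value, List<String> groupNames) {
        Map<String, String> result = new HashMap<>();
        Matcher m = Pattern.compile(pattern).matcher(value);
        if (!m.find()) {
            return result; // no match: empty map in this sketch
        }
        // Java numbers capture groups from 1; the keys here start at "0",
        // so the first capture group (not the whole match) is "0".
        for (int i = 1; i <= m.groupCount(); i++) {
            String key = (groupNames != null && i - 1 < groupNames.size())
                    ? groupNames.get(i - 1)
                    : String.valueOf(i - 1);
            result.put(key, m.group(i));
        }
        return result;
    }

    public static void main(String[] args) {
        String line = "IN=eth0 OUT= MAC=00:11:22"; // made-up sample line
        // Positional key: prints eth0
        System.out.println(regexLike("IN=*([^\\s]*)", line, null).get("0"));
        // Named key: prints eth0
        System.out.println(regexLike("IN=*([^\\s]*)", line, List.of("network_interface"))
                .get("network_interface"));
    }
}
```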

(OPTIONAL) specifying an explicit field type

One last thing I recommend is using a to_ function (such as to_string()) to ensure the field is saved with the correct type. This matters less for strings than for integers, but it’s still good practice.

The set_field call would look something like this:

    set_field("incominginterface", to_string(incominginterface["network_interface"]));

Summary and full rule

Putting all of this together gives us:

    rule "ulogd_extract-basics"
    when
        has_field("application_name")
        && to_string($message.application_name) == "ulogd"
    then
        let incominginterface = regex(
            pattern: "IN=*([^\\s]*)",
            value: to_string($message.message),
            group_names: ["network_interface"]
            );

        set_field("incominginterface", to_string(incominginterface["network_interface"]));
    end
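If you want to try the pattern against sample lines before deploying the rule, the rule’s logic can be roughly mirrored in Java. This is a sketch under assumptions: the message is modeled as a plain Map, the class name and sample line are made up, and it only sets the field when the regex matches, which may not be exactly how Graylog behaves when there is no match.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Rough Java analogue of the "ulogd_extract-basics" rule; not how Graylog runs rules.
class UlogdRuleSketch {
    private static final Pattern IN_PATTERN = Pattern.compile("IN=*([^\\s]*)");

    static void apply(Map<String, Object> message) {
        // when: has_field("application_name") && to_string(...) == "ulogd"
        Object app = message.get("application_name");
        if (app == null || !"ulogd".equals(String.valueOf(app))) {
            return;
        }
        // then: match against the message field as a string
        Matcher m = IN_PATTERN.matcher(String.valueOf(message.get("message")));
        if (m.find()) {
            // set_field("incominginterface", to_string(...)); only on a match in this sketch
            message.put("incominginterface", m.group(1));
        }
    }

    public static void main(String[] args) {
        Map<String, Object> msg = new HashMap<>();
        msg.put("application_name", "ulogd");
        msg.put("message", "IN=eth0 OUT= MAC=00:11:22"); // made-up sample
        apply(msg);
        System.out.println(msg.get("incominginterface")); // prints eth0
    }
}
```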

We’re escaping the right characters, specifying both a pattern and a value for regex, using a named capture group, and saving the field as a string.

Hope that helps!
