Grok Parse in Pipeline (bug or invalid escapes?)

Hey all,

Just found this. Trying to parse a few URIPATHs (field is URI_path) that look like this:

section=bottom&templateUrl=https://www.domain1.ca/parser.asp?app=tickets
section=doctype&templateUrl=https://www.domain1.ca/parser.asp?app=tickets
section=doctype&templateUrl=https://www.domain2.ca/en/Parser.asp
section=bottom&templateUrl=https://www.domain2.ca/en/Parser.asp

All I care about is domain1/domain2 and the app being used:

section=%{WORD:parse_section}&templateUrl=https://www.%{WORD:parse_organization}.ca/(en/)?(p|P)arser.asp(\?app=%{WORD:parse_application})?

This works in the grok debugger… but fails in the pipeline rule. I’m sure there is a character throwing it off, but I keep finding new and creative ways to break out of my grok/regex strings. Can anyone spot where I’m messing it up?

Full rule

rule "Simplify IIS Parse URI" 
when (
    has_field("log_type") AND 
    contains(to_string($message.log_type),"IIS",true)
    ) AND (
    has_field("URI_path") AND 
    contains(to_string($message.URI_path),"/parser/parser.ashx",true)
    )
then
    
let unparsed = to_string($message.URI_path);
let parsed = grok(pattern:"section=%{WORD:parse_section}&templateUrl=https://www.%{WORD:parse_organization}.ca/(en/)?(p|P)arser.asp(\?app=%{WORD:parse_application})?",value: unparsed,only_named_captures: true);
set_fields(parsed);
end

UPDATE: This exact pattern works in the GROK extractor for any test message - but still won’t save in a Pipeline.

UPDATE 2: The escaped ? at the end of the URI_path was the culprit. I had to double-escape it in the pipeline rule. Let’s see if it still processes. I’ll post back.

UPDATE 3: Editor allows me to save the rule, but it won’t extract.

UPDATE 4: This is the GL processing error:

For rule 'Simplify IIS Parse URI': In call to function 'grok' at 13:13 an exception was thrown: Unknown inline modifier near index 77 section=(\b\w+\b)&templateUrl=https://www.(\b\w+\b).ca/(en/)?(p|P)arser.asp(?app=(\b\w+\b))?
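For anyone reading the error: the expanded regex it prints ends in (?app=…, so the backslash in front of the ? never reached the regex engine. An opening parenthesis followed directly by ? starts a special group construct, and the engine then tries to read "a" as a modifier. Here is a minimal sketch of that in Python. Note the assumptions: Graylog runs on Java's regex engine and Python's re module is only a stand-in (it rejects the same construct, with a different message), and compiles() is a hypothetical helper written for this example.

```python
import re

# Expanded pattern copied from the error message: the backslash before "?app" is gone.
broken = r"section=(\b\w+\b)&templateUrl=https://www.(\b\w+\b).ca/(en/)?(p|P)arser.asp(?app=(\b\w+\b))?"

# Same pattern with the literal question mark escaped again.
fixed = r"section=(\b\w+\b)&templateUrl=https://www.(\b\w+\b).ca/(en/)?(p|P)arser.asp(\?app=(\b\w+\b))?"


def compiles(pattern):
    """Hypothetical helper: report whether the regex engine accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


# Sample value taken from the question.
sample = "section=bottom&templateUrl=https://www.domain1.ca/parser.asp?app=tickets"

print(compiles(broken))  # the "(?app=" group is rejected
print(compiles(fixed))   # the escaped version is accepted
match = re.match(fixed, sample)
print(match.group(1), match.group(2), match.group(6))
```

So the pattern itself is fine. The job is to get a single backslash through the rule's string literal and into the regex.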

My regex sucks, so I can’t figure out where it’s wrong.

Personally, I would save the working GROK pattern as a new GROK pattern in your system (System > GROK-Patterns) and use that single name in the processing pipeline. That spares you the double escaping you need in the editor.

This exact pattern works in the GROK extractor for any test message - but still won’t save in a Pipeline.

That’s a common occurrence. In a pipeline rule the pattern is a quoted string, so every backslash that escapes a special character has to be doubled. Instead of

\(en\)

You’ll have to put

\\(en\\)

in the Graylog grok filter.
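To make the two layers of escaping concrete, here is a rough sketch in Python, applied to the pattern from the question. It rests on assumptions: unescape_literal() and expand_word() are hypothetical stand-ins for what the rule parser and grok do (collapsing \\ to \ and expanding %{WORD:name} to \b\w+\b, as the error message suggests), and Python's re replaces Java's regex engine. The literal dots are double-escaped here as well, which the original pattern did not do.

```python
import re

# The pattern as it might be typed between the double quotes in the rule,
# with "?" and "." double-escaped (a raw string keeps the backslashes as typed).
typed_in_rule = (
    r"section=%{WORD:parse_section}&templateUrl=https://www\\."
    r"%{WORD:parse_organization}\\.ca/(en/)?(p|P)arser\\.asp"
    r"(\\?app=%{WORD:parse_application})?"
)


def unescape_literal(text):
    """Hypothetical stand-in for the rule parser: collapse each doubled backslash into one."""
    return text.replace("\\\\", "\\")


def expand_word(text):
    """Hypothetical stand-in for grok: turn %{WORD:name} into a named regex group."""
    return re.sub(
        r"%\{WORD:(\w+)\}",
        lambda m: "(?P<" + m.group(1) + r">\b\w+\b)",
        text,
    )


# Layer 1: string literal -> grok pattern. Layer 2: grok pattern -> regex.
seen_by_grok = unescape_literal(typed_in_rule)
regex = expand_word(seen_by_grok)

# Sample values taken from the question.
with_app = "section=bottom&templateUrl=https://www.domain1.ca/parser.asp?app=tickets"
without_app = "section=doctype&templateUrl=https://www.domain2.ca/en/Parser.asp"

print(seen_by_grok)
print(re.match(regex, with_app).groupdict())
print(re.match(regex, without_app).groupdict())
```

If the rule editor really does treat backslashes this way, typing \\? in the rule should hand grok the same \? that already worked in the debugger and the extractor.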