How to escape the metacharacter (|) in a regex pattern


(Charles Deng) #1

I have defined a very simple test pipeline:

stage 0 match either
	rule "input message does come with 7 fields";
	rule "input message does not come with 7 fields";

rule "input message does come with 7 fields"
when
	// The message has 7 fields
	regex("^([^\\|]*\\|){6}[^\\|]*$",to_string($message.full_message)).matches
then
	// target fields initialized
	set_field("x_error_found",false);
	set_field("x_errors","");

	// Get input field values
	set_field("x_long_input",split("\\|",to_string($message.full_message),0));
	set_field("x_keyword_input",split("\\|",to_string($message.full_message),1));
	set_field("x_text",split("\\|",to_string($message.full_message),2));
	set_field("x_multi_fields",split("\\|",to_string($message.full_message),3));
	set_field("x_date_input",split("\\|",to_string($message.full_message),4));
	set_field("x_boolean_input",split("\\|",to_string($message.full_message),5));
	set_field("x_ip_input",split("\\|",to_string($message.full_message),6));
end

rule "input message does not come with 7 fields"
when
	not has_field("x_error_found")
then
	set_field("x_error_found",true);
	set_field("x_errors","input message does not come with 7 fields;");
end

But the stored message ends up looking like this:

It seems the regex does not match the message. Is there an error in my pattern, where the metacharacter (|) is escaped as (\\|)?

(Jochen) #2

The following regular expression works for me:
^([^|]*\\|){6}([^|]*)$

Also note that group 0 contains the complete match, i.e. the content of the “full_message” field.
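To illustrate the point about group 0, here is a minimal plain-Java sketch (not a pipeline rule). The class name GroupZeroDemo, the helper group, and the short sample input are hypothetical and only serve the illustration; the assumption is that the pipeline regex function follows Java's java.util.regex semantics.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupZeroDemo {
    // The pattern from this post, written as a Java string literal.
    static final Pattern SEVEN_FIELDS = Pattern.compile("^([^|]*\\|){6}([^|]*)$");

    // Returns the requested capture group, or null if the input does not match.
    static String group(String input, int index) {
        Matcher m = SEVEN_FIELDS.matcher(input);
        return m.matches() ? m.group(index) : null;
    }

    public static void main(String[] args) {
        String sample = "a|b|c|d|e|f|g"; // hypothetical 7-field input
        System.out.println(group(sample, 0)); // a|b|c|d|e|f|g (the complete match)
        System.out.println(group(sample, 1)); // f| (last repetition of the repeated group)
        System.out.println(group(sample, 2)); // g (the seventh field)
    }
}
```

Because the first group is repeated six times, it only retains its last repetition, so this pattern checks the field count but does not by itself expose every field as a separate group.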

You can also use websites such as https://www.freeformatter.com/java-regex-tester.html to test your regular expressions.


(Charles Deng) #3

Yes. I put the raw message into the full_message field, along with other static fields added by the collector agent.
I modified the pattern accordingly, but the resulting message is the same:

image

The raw input message is:

13941357829|mylogs.com|just a test|graylog.mylogs.com|1522130986|true|2001:0DB8:AC10:DE01::

Also, why does the metacharacter get a double escape character in some places and no escape character at all in others?


(Charles Deng) #4
^([^\|]*\|){6}([^\|]*)$

matches the raw input message on the testing page you posted:

13941357829|mylogs.com|just a test|graylog.mylogs.com|1522130986|true|2001:0DB8:AC10:DE01::

but Graylog reports an error for it:

image


(Jochen) #5

Since the strings in pipeline rules are pretty much just Java strings, you need to escape the backslash character (\) by doubling it (i.e. \\).
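As a rough illustration in plain Java (a sketch, not Graylog code): the string literal "\\|" holds the two characters \ and |, which the regex engine then reads as a literal pipe. A single-backslash "\|" would be rejected by the Java compiler as an illegal escape, which is presumably analogous to the error Graylog reported. The class name EscapeDemo and the helper hasSevenFields are hypothetical.

```java
import java.util.regex.Pattern;

class EscapeDemo {
    // Java source "\\|" is the two-character string \| ; as a regex it means a literal pipe.
    static final String ESCAPED_PIPE = "\\|";

    // True when the input consists of exactly seven pipe-separated fields.
    static boolean hasSevenFields(String input) {
        return Pattern.matches("^([^|]*\\|){6}([^|]*)$", input);
    }

    public static void main(String[] args) {
        String raw = "13941357829|mylogs.com|just a test|graylog.mylogs.com|1522130986|true|2001:0DB8:AC10:DE01::";
        System.out.println(ESCAPED_PIPE.length());   // 2
        System.out.println(hasSevenFields(raw));     // true
        System.out.println(hasSevenFields("a|b|c")); // false
    }
}
```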


(Charles Deng) #6

So why, in the pattern you offered, are the first and last metacharacters (|) not escaped?


(Jochen) #7

Most metacharacters, including |, lose their special meaning inside a character set ([...]) and don’t need to be escaped there.
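A small plain-Java sketch of this, assuming Java regex semantics: inside a character set, the pipe behaves the same with or without a backslash, so both spellings accept and reject the same inputs. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

class CharClassDemo {
    // Pipe left unescaped inside the character set.
    static boolean matchesPlain(String input) {
        return Pattern.matches("[^|]*", input);
    }

    // Pipe escaped inside the character set; redundant but harmless.
    static boolean matchesEscaped(String input) {
        return Pattern.matches("[^\\|]*", input);
    }

    public static void main(String[] args) {
        System.out.println(matchesPlain("mylogs.com"));   // true
        System.out.println(matchesEscaped("mylogs.com")); // true
        System.out.println(matchesPlain("a|b"));          // false
        System.out.println(matchesEscaped("a|b"));        // false
    }
}
```

Outside a character set the pipe means alternation, which is why the separator between fields still has to be written as \\| in the string.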


(Charles Deng) #8

Thanks.

Eventually, I found that when the “when” condition is:

regex("^([^|]*\\|){6}([^|]*)$",to_string($message.full_message)).matches

the match fails. But when the “when” condition is:

regex("^([^|]*\\|){6}([^|]*)$",to_string($message.full_message)).matches == true

the match succeeds, although I don’t know why the second rule also matches, or why split does not work as I expected:


image

It seems regex(…).matches cannot be treated as a boolean value directly.
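One possible explanation for the split results, offered as an assumption rather than a confirmed diagnosis: if the third argument of the pipeline split function is a limit on the number of pieces, as it is for Java's String.split, rather than an index, then each call in the first rule returns an array instead of a single field. The plain-Java sketch below shows the difference; the class and method names are hypothetical.

```java
class SplitDemo {
    // Splits on a literal pipe; limit caps the number of pieces (0 means no cap).
    static String[] splitFields(String input, int limit) {
        return input.split("\\|", limit);
    }

    // Picks one field by index; limit -1 keeps trailing empty fields.
    static String field(String input, int index) {
        return input.split("\\|", -1)[index];
    }

    public static void main(String[] args) {
        String raw = "13941357829|mylogs.com|just a test|graylog.mylogs.com|1522130986|true|2001:0DB8:AC10:DE01::";
        System.out.println(splitFields(raw, 0).length); // 7 (all fields)
        System.out.println(splitFields(raw, 1).length); // 1 (the whole message, unsplit)
        System.out.println(splitFields(raw, 2)[1]);     // everything after the first pipe
        System.out.println(field(raw, 0));              // 13941357829
        System.out.println(field(raw, 6));              // 2001:0DB8:AC10:DE01::
    }
}
```

If that assumption holds, the array returned by a single split call would need to be indexed to obtain individual fields.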

