Issue with parsing regex expression in pipelines

Hello,

I am using Graylog version 5.2.2, and I am encountering an issue with parsing JSON from a message string within pipeline rules.

The log messages have the following format: cnmaestro cnmaestro[4076]: {Json}
I’ve been trying to extract the JSON part using the following regex expression: ^cnmaestro cnmaestro\[\d+\]:\s+(\{.*})

This regex works as expected in extractors and Java regex testers, successfully isolating the JSON string. Since I understand that extractors may be deprecated in the future, I’m attempting to transition to using pipelines.

However, after adjusting the regex to account for double backslashes needed in the pipeline rule syntax: ^cnmaestro cnmaestro\\[\\d+\\]:\\s+(\\{.*\\})

The simulation of the rule does not yield the expected result. Instead, it shows a blank message, containing only the timestamp and message fields.
Here’s the pipeline rule I’ve written:

rule “extract json”
when
true
then
let regex_pattern = “^cnmaestro cnmaestro\\[\\d+\\]:\\s+(\\{.*\\})”;
let result = regex(pattern: regex_pattern, value: to_string($message.message));
set_field(“JSONString”, to_string(result[“1”]));
end
When I run the simulation, it doesn’t parse the JSON into its own field as expected. I’m not receiving any errors, but the JSONString field is not populated.
Could someone please assist me with adjusting this rule to work correctly, or point out what I might be doing wrong?

Thank you in advance!

It may be correct, the rule simulator currently has some issues with text that contains characters that would be in JSON etc. So you could give it a shot on live data, or if you already have some messages you could use a pipeline decorator to test.

Thank you for the suggestion.

I’ve attempted to verify if the pipeline is processing messages by adding a diagnostic set_field operation within the same rule: set_field(“Pipeline”, “true”);

After checking the incoming live data, I can confirm that the messages are being tagged with the new field “Pipeline”: “true”, indicating that the pipeline is indeed being applied. However, the field “JSONString”, which should contain the extracted JSON, is not present in the messages.

Is there a possibility that the regex function or the way I’m attempting to set “JSONString” could be failing silently? Or is there any additional logging or debugging technique within Graylog that could help me pinpoint where the rule might be going wrong?

Here is the complete rule for reference :
(there should only be on asterisk in the regex pattern, I’ve had trouble with displaying it in the community post text body)

rule “extract json”
when
true
then
set_field(“Pipeline”, “true”);
let regex_pattern = “^cnmaestro cnmaestro\\[\\d+\\]:\\s+(\\{.**\\})”;
let result = regex( pattern:“^cnmaestro cnmaestro\\[\\d+\\]:\\s+(\\{.*\\})”,
value: to_string($message.message));

set_field("JSONString", to_string(result["1"]));

end

Maybe I’m missing it, but I only see one capture group, which would need result[0] not 1. You could always set a field with to_string(result) to see what all the capture groups return are getting returned if you have more than one.

I’ve adjusted the indexing for the capture group to show all result groups, which has resulted in the JSONString field being populated with {}. This suggests that the regex is indeed matching the curly braces but not capturing the content within them.

It’s interesting to note that the regex pattern I am using works as expected in the extractors, where it captures the entire JSON string with two capture groups. Here’s the pattern that works in the extractors: ^cnmaestro cnmaestro\[\d+\]:\s+(\{.*\})
When applied in a pipeline, however, the field is filled with empty brackets, suggesting that the capture group is not working as intended or that the JSON content is not being matched:

rule “extract json”
when
has_field(“message”)
then
let regex_pattern = “^cnmaestro cnmaestro\\[\\d+\\]:\\s+(\\{.*\\})”;
let result = regex(pattern: regex_pattern, value: to_string($message.message));
set_field(“JSONString”, to_string(result));
end
Here is what the message field that I am trying to parse looks like from the stream:
cnmaestro cnmaestro[4076]: {“cid”:“cnmaestro”,“client”:“4A:85:C6:09:54:65”,“dupCnt”:1,“eId”:“WIFI_CLIENT_CONNECTED”,“eTime”:1704204565255,“hmsg”:“Z1XaTeP”,“level”:“info”,“mac”:“58:C1:7A:93:E3:36”,“mcid”:“”,“msg”:“4A-85-C6-09-54-65 → VOPPUBLIC”,“msgCode”:2400,“sev”:0}

Could there be a difference in how extractors and pipeline rules handle regex capture groups? Is there a modification to the regex pattern that I should consider to ensure it captures the JSON content within the curly braces in the pipeline rule? Additionally, how can I debug the content of the capture groups to understand what’s being returned by the regex function?

I would greatly appreciate any further guidance on this issue.

Your regex pattern is essentially correct. The problem are the enclosing quote characters. Looks like you pasted this from an editor which replaced ASCII quotes with the UNICODE left and right double quote characters:

"   U+0022 QUOTATION MARK
“   U+201C LEFT DOUBLE QUOTATION MARK
”   U+201D RIGHT DOUBLE QUOTATION MARK

Those are not recognized as quotes in the pipeline rule.

Thank you for highlighting the issue with the quote characters. I have double-checked the regex pattern in the rule to ensure it’s using standard ASCII double quotes and not UNICODE characters. Attached is a screenshot displaying the code and results. After running the simulation with the corrected quotes, the JSONString field is still only populated with {}, which suggests that the regex is not capturing the JSON content as intended.

Could this be an issue with how Graylog’s simulation handles the regex capture groups, or is there possibly something else I’m missing in my pattern? The pattern seems to work correctly in other environments (such as extractors and online regex testers), but not in the pipeline simulation.

Is there a way to debug the actual content of the regex capture groups in Graylog’s pipeline to see what’s being matched and captured?

You need to escape the quotes in the JSON message string that you provide the rule simulator.

For debugging you can look at the GL log messages, e.g. under System/Nodes → More actions → Recent system log messages

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.