Hi
This query returns the expected logs on the search page:
NOT _exists_:extra_data && request_url:/\/([A-z0-9]+\/)*[A-z0-9]+([-.~_@][A-z0-9]+)*\.(css|ico|jpg|js|json|map|png|svg|xml|woff2)(\?[A-z0-9]{2,20}|\?v=[0-9])?/
But the same regex doesn’t work in a pipeline rule. What am I missing?
rule "[Type] Normal request"
when
NOT has_field("extra_data") && (
regex("\\/(\\?v[0-9]|index\\.html)?", to_string($message.request_url)).matches == true ||
regex("\\/([A-z0-9]+\\/)*[A-z0-9]+([-.~_@][A-z0-9]+)*\\.(css|ico|jpg|js|json|map|png|svg|xml|woff2)(\\?[A-z0-9]{2,20}|\\?v=[0-9])?", to_string($message.request_url)).matches == true
)
then
set_field("normalRequest", "true");
end
Information:
Graylog: 5.0.3+a82acb2, codename Noir
JVM: PID 506, Eclipse Adoptium 17.0.6 on Linux 5.15.0-1029-gcp
What I have tried:
In the same stream and the same time range:
The query below returns a log whose request_url field is /image/33f46ea824b22bd4a9cc261cbd5112db.png:
!_exists_:normalRequest && request_url:/.*png/
But the query below returns a log whose request_url field is /image/33aa794bbef0221ba00b6e99e11fe7ca.png:
_exists_:normalRequest && request_url:/.*png/
It seems the pipeline rule is applied inconsistently. What should I do?
Hmmm - after playing with it a bit, it all looks right… though I haven’t run it through extensive testing like @gsmith often does (in his Batman persona) haha!
It could be that the messages are ending up in more than one index, for example if something like “Remove from default Index” is not checked.
You can also use the debug() function to see what is going on in there:
...
then
set_field("normalRequest", "true");
//
// use $ tail -f /var/log/graylog-server/server.log to watch for the results of the below debug message
//
debug(concat("============ request_url: ",to_string($message.request_url)));
debug(concat("============ normal_request: ",to_string($message.normalRequest)));
...
It’s not clear to me what you’re trying to achieve. Correct me if I’m wrong: I think you are trying to get the URL from the message and create a new field? Also, the regex from Global search needs to be modified when used in a pipeline.
By chance, did you run a debug in the pipeline like @tmacgbay suggested?
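On the point that a search-page regex may behave differently in a pipeline, one thing worth checking is anchoring. Lucene regexp queries on the search page have to match the whole field value. Whether regex() in a pipeline rule does the same is an assumption I have not confirmed here. The Python sketch below only illustrates the two behaviors, using the short first pattern from the rule; "/api/login" and "/image/x.png" are made-up sample values.

```python
import re

# First pattern from the rule, with "\\/" written as a plain "/".
SHORT_PATTERN = re.compile(r"/(\?v[0-9]|index\.html)?")


def matches_whole_value(url: str) -> bool:
    """Anchored check: the pattern must cover the entire string."""
    return SHORT_PATTERN.fullmatch(url) is not None


def matches_anywhere(url: str) -> bool:
    """Unanchored check: the pattern may match any substring."""
    return SHORT_PATTERN.search(url) is not None


# "/api/login" and "/image/x.png" are hypothetical values for illustration.
for url in ["/", "/index.html", "/api/login", "/image/x.png"]:
    print(f"{url!r}: whole={matches_whole_value(url)} anywhere={matches_anywhere(url)}")
```

If the pipeline function turns out to match anywhere in the string, that short pattern would accept every URL containing a "/", which would make the rule broader than the search query rather than explain fields going missing.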
“I think you try to get the URL from message and create a new field?”
Yes, I want to add a field named “normal_request” to the message if the URL in the message is one I know (e.g. /favicon.ico).
“the regex from Global search need to be modified when in a Pipeline”
Yes, the regex in a pipeline needs to be modified, but only to escape the backslashes.
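As a quick sanity check of that escaping step, here is a small Python sketch. It assumes the rule's string literal turns each doubled backslash back into a single one before the regex is compiled; the helper name to_rule_literal is made up for illustration.

```python
# Regex as typed on the search page (between the two "/" delimiters).
search_regex = (
    r"\/([A-z0-9]+\/)*[A-z0-9]+([-.~_@][A-z0-9]+)*"
    r"\.(css|ico|jpg|js|json|map|png|svg|xml|woff2)"
    r"(\?[A-z0-9]{2,20}|\?v=[0-9])?"
)

# String literal as typed inside regex("...") in the pipeline rule.
rule_literal = (
    r"\\/([A-z0-9]+\\/)*[A-z0-9]+([-.~_@][A-z0-9]+)*"
    r"\\.(css|ico|jpg|js|json|map|png|svg|xml|woff2)"
    r"(\\?[A-z0-9]{2,20}|\\?v=[0-9])?"
)


def to_rule_literal(regex: str) -> str:
    """Double every backslash (hypothetical helper, not a Graylog function)."""
    return regex.replace("\\", "\\\\")


# True when the rule's literal is exactly the search regex with backslashes doubled.
print(to_rule_literal(search_regex) == rule_literal)
```

So the long pattern in the rule looks like a faithful copy of the search-page regex as far as escaping goes.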
2023-02-21T02:09:26.931Z INFO [Function] PIPELINE DEBUG: ============ request_url: /favicon.ico
2023-02-21T02:09:26.931Z INFO [Function] PIPELINE DEBUG: ============ normal_request: true
2023-02-21T02:09:34.674Z INFO [Function] PIPELINE DEBUG: ============ request_url: /image/36a198905759428997bad81b0ef0b039.png
2023-02-21T02:09:34.674Z INFO [Function] PIPELINE DEBUG: ============ normal_request: true
2023-02-21T02:09:34.678Z INFO [Function] PIPELINE DEBUG: ============ request_url: /image/9827460e08474c17a792bef13db51dfc.png
2023-02-21T02:09:34.679Z INFO [Function] PIPELINE DEBUG: ============ normal_request: true
2023-02-21T02:09:34.681Z INFO [Function] PIPELINE DEBUG: ============ request_url: /image/dc34cbedacd147b7bc661f0199bf30af.jpg
2023-02-21T02:09:34.681Z INFO [Function] PIPELINE DEBUG: ============ normal_request: true
2023-02-21T02:09:34.683Z INFO [Function] PIPELINE DEBUG: ============ request_url: /image/8a147ad2d3bf40d3bb62e3629b1eb3f9.png
2023-02-21T02:09:34.683Z INFO [Function] PIPELINE DEBUG: ============ normal_request: true
2023-02-21T02:09:34.683Z INFO [Function] PIPELINE DEBUG: ============ request_url: /image/47617dd245e6416b95b2a72632fe7071.jpg
2023-02-21T02:09:34.683Z INFO [Function] PIPELINE DEBUG: ============ normal_request: true
2023-02-21T02:09:34.685Z INFO [Function] PIPELINE DEBUG: ============ request_url: /image/ee466b52e40a481e9a219ef39c9c84fe.png
2023-02-21T02:09:34.685Z INFO [Function] PIPELINE DEBUG: ============ normal_request: true
2023-02-21T02:09:34.973Z INFO [Function] PIPELINE DEBUG: ============ request_url: /image/47617dd245e6416b95b2a72632fe7071.jpg
2023-02-21T02:09:34.973Z INFO [Function] PIPELINE DEBUG: ============ normal_request: true
2023-02-21T02:13:55.041Z INFO [Function] PIPELINE DEBUG: ============ request_url: /favicon.ico
2023-02-21T02:13:55.042Z INFO [Function] PIPELINE DEBUG: ============ normal_request: true
I can’t figure out the problem from the debug log. However, the query below in Global search still returns logs whose “request_url” is “/favicon.ico”, even though those logs should have the normalRequest field:
NOT _exists_:extra_data && request_url:/\/([A-z0-9]+\/)*[A-z0-9]+([-.~_@][A-z0-9]+)*\.(css|ico|jpg|js|json|map|png|svg|xml|woff2)(\?[A-z0-9]{2,20}|\?v=[0-9])?/ AND NOT _exists_:normalRequest
1. Extract the URL from message/full_message
2. Check if it’s a known URL
3. If it’s a known URL, create the field normal_request = true
4. If it’s unknown, drop it?
I don’t know as much as @tmacgbay about the debug logs, but from my little bit of testing, it looks like the pipeline is doing the right thing.
If those steps are correct, what I would do is extract the URL and create the new field, then check it against a lookup table. If it is NOT a known URL, drop the message or re-route it.
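The steps above can be sketched outside Graylog like this. It is only an illustration in Python, assuming request_url has already been extracted into its own field; KNOWN_URLS stands in for the lookup table and process is a made-up name.

```python
from typing import Optional

# Hypothetical stand-in for a lookup table of known URLs.
KNOWN_URLS = {"/favicon.ico", "/index.html"}


def process(message: dict) -> Optional[dict]:
    """Tag known URLs with normal_request, return None to signal a drop."""
    url = message.get("request_url")  # step 1: URL assumed already extracted
    if url in KNOWN_URLS:  # step 2: check against the table
        return {**message, "normal_request": "true"}  # step 3: add the field
    return None  # step 4: unknown URL, drop (or re-route) the message


print(process({"request_url": "/favicon.ico"}))
print(process({"request_url": "/something/else"}))
```

A lookup table only covers exact URLs, so hashed asset names like the /image/... ones would still need the regex.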
There is something going on that we aren’t asking about.
When you search for a message that is correct (it is a normal request and has normal_request: true) and click on it to open the details, does the “Routed into streams” section match the one for a message that is incorrect (it is a normal request but does NOT have the normal_request field)?
So when you search for request_url:/.*png/ you get results that have normalRequest:true and you also get results that don’t have the normalRequest field.
Are there duplicates?
Can you post a screenshot of a search that shows both kinds of results? (Obfuscated as needed, of course!)
Have you verified (via regex101.com or similar) that the request_url fields that were missed by the rule still fit within your regex?
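If regex101.com isn’t handy, a few lines of Python give a rough equivalent. This is just a sketch: Python’s regex engine is not the one Graylog uses, so treat it as a plausibility check. The URLs are the request_url values quoted in this thread.

```python
import re

# Pattern from the search query, with the escaped slashes written as plain "/".
PATTERN = re.compile(
    r"/([A-z0-9]+/)*[A-z0-9]+([-.~_@][A-z0-9]+)*"
    r"\.(css|ico|jpg|js|json|map|png|svg|xml|woff2)"
    r"(\?[A-z0-9]{2,20}|\?v=[0-9])?"
)

# request_url values quoted in this thread.
URLS = [
    "/favicon.ico",
    "/image/33f46ea824b22bd4a9cc261cbd5112db.png",
    "/image/33aa794bbef0221ba00b6e99e11fe7ca.png",
    "/assets/img/app-download-bg@3x.cefe823.png",
]


def whole_value_matches(url: str) -> bool:
    """True when the pattern covers the entire URL."""
    return PATTERN.fullmatch(url) is not None


for url in URLS:
    print(url, whole_value_matches(url))
```

Every URL in the list matches, including the one that was reported without the field, so the pattern itself does not look like the reason some messages were missed.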
Somehow messages are either leaking around the rule or not even making it to the pipeline. How about this: try creating a rule in a following pipeline stage like the one below. This rule should only show up in the debug logs if a request_url ending in png was missed by your rule:
rule "PipelineLeak"
when
NOT has_field("extra_data") &&
NOT has_field("normalRequest") &&
ends_with(to_string($message.request_url), "png")
then
debug(concat("============Leaked request_url: ",to_string($message.request_url)));
end
Yes,
The request_url field in the pictures that I posted on Feb 23 contains /assets/img/app-download-bg@3x.cefe823.png, and it matches the regex: \/([A-z0-9]+\/)*[A-z0-9]+([-.~_@][A-z0-9]+)*\.(css|ico|jpg|js|json|map|png|svg|xml|woff2)(\?[A-z0-9]{2,20}|\?v=[0-9])?
As the pictures show, for the same content some messages have the normalRequest field and some don’t.