Regex in search bar is working but doesn't work in Pipeline rule

Hi
This query returns logs on the search page:
NOT _exists_:extra_data && request_url:/\/([A-z0-9]+\/)*[A-z0-9]+([-.~_@][A-z0-9]+)*\.(css|ico|jpg|js|json|map|png|svg|xml|woff2)(\?[A-z0-9]{2,20}|\?v=[0-9])?/

but the same regex doesn't work in a pipeline rule. What am I missing?
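For reference, when moving a regex from the search bar into a pipeline rule, the intended change is only the quoting: the `/.../` delimiters are dropped and, because the rule source is a double-quoted string, every backslash is doubled. A quick sketch (Python string literals stand in for the pipeline DSL's string handling here):

```python
# The search-bar pattern, as typed between the /.../ delimiters.
search_pattern = r"\/([A-z0-9]+\/)*[A-z0-9]+([-.~_@][A-z0-9]+)*\.(css|ico|jpg|js|json|map|png|svg|xml|woff2)(\?[A-z0-9]{2,20}|\?v=[0-9])?"

# The same pattern as it must be typed inside a double-quoted pipeline-rule
# string: every backslash doubled. After the parser consumes the escapes,
# both are the identical regex.
pipeline_literal = "\\/([A-z0-9]+\\/)*[A-z0-9]+([-.~_@][A-z0-9]+)*\\.(css|ico|jpg|js|json|map|png|svg|xml|woff2)(\\?[A-z0-9]{2,20}|\\?v=[0-9])?"

assert search_pattern == pipeline_literal
```

So the two patterns in the question are in fact the same regex; the difference must lie elsewhere.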

rule "[Type] Normal request"
when
  NOT has_field("extra_data") && (
    regex("\\/(\\?v[0-9]|index\\.html)?", to_string($message.request_url)).matches == true ||
    regex("\\/([A-z0-9]+\\/)*[A-z0-9]+([-.~_@][A-z0-9]+)*\\.(css|ico|jpg|js|json|map|png|svg|xml|woff2)(\\?[A-z0-9]{2,20}|\\?v=[0-9])?", to_string($message.request_url)).matches == true
  )
then
  set_field("normalRequest", "true");
end

Information:
Graylog: 5.0.3+a82acb2, codename Noir
JVM: PID 506, Eclipse Adoptium 17.0.6 on Linux 5.15.0-1029-gcp

What I have tried:
In the same stream and time range,
the query below returned a log whose request_url field value is /image/33f46ea824b22bd4a9cc261cbd5112db.png:
!_exists_:normalRequest && request_url:/.*png/

but the query below also returned a log whose request_url field value is /image/33aa794bbef0221ba00b6e99e11fe7ca.png:
_exists_:normalRequest && request_url:/.*png/

It seems the pipeline rule is behaving inconsistently. What should I do?

Hmmm - after playing with it a bit, it all looks right… though I haven’t run it through extensive testing like @gsmith often does (in his Batman persona) haha!

It could be that your messages are ending up in more than one index, for example if “Remove matches from ‘Default Stream’” is not checked.

You can also use the debug() function to see what is going on in there:


...
then
 set_field("normalRequest", "true");
 //
 // use $ tail -f /var/log/graylog-server/server.log to watch for the results of the below debug message
 //
 debug(concat("============ request_url: ",to_string($message.request_url)));
 debug(concat("============ normal_request: ",to_string($message.normalRequest)));
...

Hi
Thanks for your help :grin:
Access logs are sent to a single input, and the stream settings are:

  • gl_source_input must match input AccessLog (GELF TCP: 62736d6fc5625c081ff39b99)
  • The “Remove matches from ‘Default Stream’” option is enabled

So I don’t think duplicate logs are appearing in different indices.

Hey @Kevin.Lin

It’s not clear to me what you’re trying to achieve. Correct me if I’m wrong: I think you’re trying to get the URL from the message and create a new field? And the regex from the global search needs to be modified when used in a pipeline.

By chance, did you run a debug in the pipeline like @tmacgbay suggested?

“I think you’re trying to get the URL from the message and create a new field?”
Yes, I want to add a field named “normalRequest” to the message if the URL in the message is one I recognize (e.g. /favicon.ico).

“the regex from the global search needs to be modified when used in a pipeline”
Yes, the regex in the pipeline needs to be modified, but only to escape the backslashes.

@tmacgbay @gsmith
The debug logs are:

2023-02-21T02:09:26.931Z INFO  [Function] PIPELINE DEBUG: ============ request_url: /favicon.ico
2023-02-21T02:09:26.931Z INFO  [Function] PIPELINE DEBUG: ============ normal_request: true
2023-02-21T02:09:34.674Z INFO  [Function] PIPELINE DEBUG: ============ request_url: /image/36a198905759428997bad81b0ef0b039.png
2023-02-21T02:09:34.674Z INFO  [Function] PIPELINE DEBUG: ============ normal_request: true
2023-02-21T02:09:34.678Z INFO  [Function] PIPELINE DEBUG: ============ request_url: /image/9827460e08474c17a792bef13db51dfc.png
2023-02-21T02:09:34.679Z INFO  [Function] PIPELINE DEBUG: ============ normal_request: true
2023-02-21T02:09:34.681Z INFO  [Function] PIPELINE DEBUG: ============ request_url: /image/dc34cbedacd147b7bc661f0199bf30af.jpg
2023-02-21T02:09:34.681Z INFO  [Function] PIPELINE DEBUG: ============ normal_request: true
2023-02-21T02:09:34.683Z INFO  [Function] PIPELINE DEBUG: ============ request_url: /image/8a147ad2d3bf40d3bb62e3629b1eb3f9.png
2023-02-21T02:09:34.683Z INFO  [Function] PIPELINE DEBUG: ============ normal_request: true
2023-02-21T02:09:34.683Z INFO  [Function] PIPELINE DEBUG: ============ request_url: /image/47617dd245e6416b95b2a72632fe7071.jpg
2023-02-21T02:09:34.683Z INFO  [Function] PIPELINE DEBUG: ============ normal_request: true
2023-02-21T02:09:34.685Z INFO  [Function] PIPELINE DEBUG: ============ request_url: /image/ee466b52e40a481e9a219ef39c9c84fe.png
2023-02-21T02:09:34.685Z INFO  [Function] PIPELINE DEBUG: ============ normal_request: true
2023-02-21T02:09:34.973Z INFO  [Function] PIPELINE DEBUG: ============ request_url: /image/47617dd245e6416b95b2a72632fe7071.jpg
2023-02-21T02:09:34.973Z INFO  [Function] PIPELINE DEBUG: ============ normal_request: true
2023-02-21T02:13:55.041Z INFO  [Function] PIPELINE DEBUG: ============ request_url: /favicon.ico
2023-02-21T02:13:55.042Z INFO  [Function] PIPELINE DEBUG: ============ normal_request: true

I can’t figure out the problem from the debug log, but the query below in the global search still returns a log whose request_url is “/favicon.ico”, even though that log should have the normalRequest field:
NOT _exists_:extra_data && request_url:/\/([A-z0-9]+\/)*[A-z0-9]+([-.~_@][A-z0-9]+)*\.(css|ico|jpg|js|json|map|png|svg|xml|woff2)(\?[A-z0-9]{2,20}|\?v=[0-9])?/ AND NOT _exists_:normalRequest

Hey,

Let me see if I have this correct.

Steps:

1. Extract the URL from message/full_message
2. Check if it is a known URL
3. If it is a known URL, create field normalRequest = true
4. If unknown, drop it?
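The steps above can be sketched roughly like this (Python standing in for the pipeline rule; note `[A-Za-z]` here instead of the thread's `[A-z]`, since the `[A-z]` range also matches `[ \ ] ^ _` and a backtick, which is usually unintended):

```python
import re

# Regex adapted from the pipeline rule earlier in the thread.
NORMAL_URL = re.compile(
    r"\/([A-Za-z0-9]+\/)*[A-Za-z0-9]+([-.~_@][A-Za-z0-9]+)*"
    r"\.(css|ico|jpg|js|json|map|png|svg|xml|woff2)"
    r"(\?[A-Za-z0-9]{2,20}|\?v=[0-9])?"
)

def process(message: dict) -> dict:
    """Step 1: read the URL; step 2: check it; step 3: tag known URLs."""
    url = message.get("request_url", "")
    if NORMAL_URL.fullmatch(url):          # step 2: is it a known/normal URL?
        message["normalRequest"] = "true"  # step 3: tag it
    return message                         # unknown URLs pass through untouched

print(process({"request_url": "/favicon.ico"}))
```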

I don’t know as much as @tmacgbay about the debug logs, but from my bit of testing, it looks like the pipeline is doing the right thing.

If those steps are correct, what I would do is extract the URL and create the new field, then check it against a lookup table. If NOT true, then drop the message or re-route it.

Steps 1–3 are correct, but there is no “drop it” step
(I do nothing if the URL is unknown),

so I can find logs with unknown URLs using the search syntax:
NOT _exists_:normalRequest


Stumper.

There is something going on that we aren’t asking about.

When you search for a message that is correct (has normalRequest: true and is a normal request) and click on it to open the details, does the “Routed into streams” section match the one for a message that is incorrect (is a normal request but does NOT have the normalRequest field)?



Yes, both routed into same stream.

So when you search for request_url:/.*png/ you get results that have normalRequest:true, and you also get results that don’t have the normalRequest field at all.

Are there duplicates?

Can you show a screen shot of getting results that show both results? (Obfuscated as needed of course!)


Good question.

Same time range; the left side “excludes” normalRequest and the right side “includes” normalRequest.
There are no duplicates.

What are your stream rule(s) for capturing the logs from the input? I am wondering if they are narrow and missing some items…

I have an input only for web logs, and a stream only for that input.
GraylogNginxAccessLog.drawio

Have you verified (via regex101.com or similar) that the request_url fields that were missed by the rule still fit within your regex?
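As a quick local stand-in for regex101, you can check a missed value against the pattern yourself (Python's `re` handles this particular pattern the same way Java's engine does; the sample URL is one reported earlier in the thread):

```python
import re

# The pipeline rule's pattern, single-escaped form.
pattern = (r"\/([A-z0-9]+\/)*[A-z0-9]+([-.~_@][A-z0-9]+)*"
           r"\.(css|ico|jpg|js|json|map|png|svg|xml|woff2)"
           r"(\?[A-z0-9]{2,20}|\?v=[0-9])?")

# A request_url value that was missed by the rule.
url = "/image/33f46ea824b22bd4a9cc261cbd5112db.png"

# fullmatch mirrors how the search bar anchors a /regex/ to the whole field.
print(bool(re.fullmatch(pattern, url)))  # → True
```

If this prints True but the field was still missed, the regex itself is not the problem.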

Somehow messages are either leaking around the rule or not even making it to the pipeline. How about this: try creating a rule in a following pipeline stage like the one below. This rule should only show up in the debug logs if a request_url ending in png was missed by your rule:

rule "PipelineLeak"
when
    NOT has_field("extra_data")    &&
    NOT has_field("normalRequest") &&
    ends_with(to_string($message.request_url), "png")
then
    debug(concat("============Leaked request_url: ",to_string($message.request_url)));
end

Yes.
The request_url field in the pictures I posted on Feb 23 is /assets/img/app-download-bg@3x.cefe823.png, and it matches the regex:
/\/([A-z0-9]+\/)*[A-z0-9]+([-.~_@][A-z0-9]+)*\.(css|ico|jpg|js|json|map|png|svg|xml|woff2)(\?[A-z0-9]{2,20}|\?v=[0-9])?/
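For the record, that sample URL does satisfy the pattern; a minimal check (Python's `re` treats this pattern the same way Java's engine does):

```python
import re

# The thread's pattern, single-escaped form.
pattern = (r"\/([A-z0-9]+\/)*[A-z0-9]+([-.~_@][A-z0-9]+)*"
           r"\.(css|ico|jpg|js|json|map|png|svg|xml|woff2)"
           r"(\?[A-z0-9]{2,20}|\?v=[0-9])?")

# The "@3x" and ".cefe823" parts are consumed by the ([-.~_@][A-z0-9]+)* group.
assert re.fullmatch(pattern, "/assets/img/app-download-bg@3x.cefe823.png")
```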

As the picture shows, for the same content, some messages have the normalRequest field but some don’t.

I will try the rule you provided.

Since you have a Graylog cluster, the input is global… just throwing things out there to see if they stick…

@tmacgbay @gsmith
Thanks for your help.

During debugging, I saw that “unprocessed messages” kept increasing, so:

  • I updated all packages (apt update && apt upgrade), which upgraded Graylog from 5.0.3 to 5.0.4
  • To reduce the messages in the process buffer, I added 2 CPU cores and 4 GB RAM to every GCE instance and raised the JVM heap from 1 GB to 2 GB (both min and max)
  • I turned off the OTX threat intelligence plugin, and the problem was gone.

Now everything is back to working.


I lost track of the details in this long thread… could you briefly summarize the root cause and the fix, so others may benefit?

@Kevin.Lin


Glad we could help