Regex Matching in Pipeline or Extractor

Hi there,

I am running Graylog 4.3.3 and I am running into issues with adding a field to a log. I have tried both extraction and a pipeline using 2 different regex patterns.

When testing input extraction, I can test both regex patterns and get the "Matches!" response, yet the field does not get created. I tried moving this workflow to a pipeline rule instead and I get the same result: I can verify I get matches in the pipeline rule, yet no field gets created there either.

First attempt:

rule "ESXi Verbosity Matching Hostd"

when
  regex("\\bHostd\\b",to_string($message.message)).matches==true

then
  let verbosity = regex("^.*Hostd: \\s*(\\w+)",to_string($message.message));

set_field("log_verbosity", verbosity);

end

Second attempt:

rule "ESXi Verbosity Matching Hostd"

when
  regex("\\bHostd\\b",to_string($message.message)).matches==true

then
  let verbosity = regex("Hostd:\\W*(\\w+)",to_string($message.message));

set_field("log_verbosity", verbosity);

end

Here is an example log:

herpa.derpa.corpo.lab Hostd: verbose hostd[2103629] [Originator@6876 sub=Libs opID=laukhmm4-1205666-auto-puar-h5:70342567-86-01-58-fd8c] VigorTransport_ClientSendRequest: opID=laukhmm4-1205666-auto-puar-h5:70342567-86-01-58-fd8c seq=536204665: Sending GuestStats.SetNotificationTime request.
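As a quick sanity check outside Graylog, both patterns can be run against the sample line. This is only a sketch in Python's `re` module, used here as a stand-in for the Java regex engine that Graylog uses; for patterns this simple the two should behave the same, but that is an assumption. The variable names are made up for illustration.

```python
import re

# Start of the sample log line from this thread (truncated for brevity).
MESSAGE = "herpa.derpa.corpo.lab Hostd: verbose hostd[2103629] [Originator@6876 sub=Libs"

# The two patterns from the rules above, written as Python raw strings
# (single backslashes; the pipeline rule language needs them doubled).
FIRST = r"^.*Hostd: \s*(\w+)"
SECOND = r"Hostd:\W*(\w+)"

first_match = re.search(FIRST, MESSAGE)
second_match = re.search(SECOND, MESSAGE)

# group(1) is the text captured by the single pair of parentheses.
first_capture = first_match.group(1)
second_capture = second_match.group(1)
print(first_capture, second_capture)
```

Both patterns capture the same word here, which suggests the patterns themselves are fine and the problem lies in how the result is read back.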

Hey @bluescreenofwin

I uploaded your log file to my lab and did a quick regex check/configuration.
Here are my findings.
Regex extractor config.

Results.

That being shown, I think your regex is not configured correctly.

Looks like your regex is hunting for the second item (or in the case of regex starting at ‘0’, the 1st item) so tack a ["1"] on the end…

rule "ESXi Verbosity Matching Hostd"

when
    regex("\\bHostd\\b",to_string($message.message)).matches==true
then

    let verbosity = regex("^.*Hostd: \\s*(\\w+)",to_string($message.message))["1"];
    set_field("log_verbosity", verbosity);

end
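For reference, this is how group numbering works in a typical regex engine: index 0 is the whole match and index 1 is the first pair of parentheses. The sketch below uses Python's `re` as a stand-in; whether the `["1"]` key in a Graylog pipeline rule follows the same numbering is not something this example can confirm.

```python
import re

# Start of the sample log line from this thread (truncated for brevity).
MESSAGE = "herpa.derpa.corpo.lab Hostd: verbose hostd[2103629] [Originator@6876 sub=Libs"

match = re.search(r"^.*Hostd: \s*(\w+)", MESSAGE)

whole_match = match.group(0)       # everything the pattern consumed
first_group = match.group(1)       # only the parenthesised part
group_count = len(match.groups())  # number of capture groups in the pattern

print(repr(whole_match))
print(repr(first_group))
print(group_count)
```

Note that the pattern contains exactly one capture group, so any tool that numbers capture groups from zero would have nothing stored under index 1.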

hey @tmacgbay

I said, "Self, give it a try", so I've been working on this. For some reason I can't even get this to work.

Checked that System/Configurations was correct.
Checked using different "Stages".

What I have also done is use this pipeline.

rule "batman"
when
    regex("\\bHostd\\b",to_string($message.message)).matches==true
then

    let verbosity = regex("^.*Hostd: \\s*(\\w+)",to_string($message.message))["1"];
    set_field("log_verbosity", verbosity);
    debug(verbosity);
end

Results:

2023-01-09 21:27:48,997 INFO : org.graylog.plugins.pipelineprocessor.ast.functions.Function - PIPELINE DEBUG: {}
2023-01-09 21:27:49,153 INFO : org.graylog.plugins.pipelineprocessor.ast.functions.Function - PIPELINE DEBUG: {}
2023-01-09 21:27:50,920 INFO : org.graylog.plugins.pipelineprocessor.ast.functions.Function - PIPELINE DEBUG: {}
2023-01-09 21:27:50,920 INFO : org.graylog.plugins.pipelineprocessor.ast.functions.Function - PIPELINE DEBUG: {}
2023-01-09 21:27:50,920 INFO : org.graylog.plugins.pipelineprocessor.ast.functions.Function - PIPELINE DEBUG: {}
2023-01-09 21:32:11,366 INFO : org.graylog.plugins.pipelineprocessor.ast.functions.Function - PIPELINE DEBUG: Passed value is NULL.
2023-01-09 21:32:11,367 INFO : org.graylog.plugins.pipelineprocessor.ast.functions.Function - PIPELINE DEBUG: Passed value is NULL.
2023-01-09 21:32:11,374 INFO : org.graylog.plugins.pipelineprocessor.ast.functions.Function - PIPELINE DEBUG: Passed value is NULL.
2023-01-09 21:32:11,378 INFO : org.graylog.plugins.pipelineprocessor.ast.functions.Function - PIPELINE DEBUG: Passed value is NULL.
2023-01-09 21:32:11,376 INFO : org.graylog.plugins.pipelineprocessor.ast.functions.Function - PIPELINE DEBUG: Passed value is NULL.
2023-01-09 21:32:11,374 INFO : org.graylog.plugins.pipelineprocessor.ast.functions.Function - PIPELINE DEBUG: Passed value is NULL.
2023-01-09 21:32:11,382 INFO : org.graylog.plugins.pipelineprocessor.ast.functions.Function - PIPELINE DEBUG: Passed value is NULL.
2023-01-09 21:32:11,382 INFO : org.graylog.plugins.pipelineprocessor.ast.functions.Function - PIPELINE DEBUG: Passed value is NULL.

I used this pipeline earlier:

rule "ESXi Verbosity Matching Hostd"
when
  has_field("message") 
then
  let verbosity = regex("\\[(.*?)]\\)",to_string($message.message));
  set_field("httpd", verbosity["1"]); // I also used "0" for testing purposes.
  debug(verbosity);
end

Results:

2023-01-09 20:48:19,929 INFO : org.graylog.plugins.pipelineprocessor.ast.functions.Function - PIPELINE DEBUG: {}
2023-01-09 20:48:19,929 INFO : org.graylog.plugins.pipelineprocessor.ast.functions.Function - PIPELINE DEBUG: {}
2023-01-09 20:48:19,929 INFO : org.graylog.plugins.pipelineprocessor.ast.functions.Function - PIPELINE DEBUG: {}
2023-01-09 20:48:28,034 INFO : org.graylog.plugins.pipelineprocessor.ast.functions.Function - PIPELINE DEBUG: {}
2023-01-09 20:48:28,034 INFO : org.graylog.plugins.pipelineprocessor.ast.functions.Function - PIPELINE DEBUG: {}
2023-01-09 20:48:29,681 INFO : org.graylog.plugins.pipelineprocessor.ast.functions.Function - PIPELINE DEBUG: {}
2023-01-09 20:48:29,681 INFO : org.graylog.plugins.pipelineprocessor.ast.functions.Function - PIPELINE DEBUG: {}
2023-01-09 20:48:29,682 INFO : org.graylog.plugins.pipelineprocessor.ast.functions.Function - PIPELINE DEBUG: {}
2023-01-09 20:48:29,683 INFO : org.graylog.plugins.pipelineprocessor.ast.functions.Function - PIPELINE DEBUG: {}
2023-01-09 20:48:30,041 INFO : org.graylog.plugins.pipelineprocessor.ast.functions.Function - PIPELINE DEBUG: {}
2023-01-09 20:48:30,041 INFO : org.graylog.plugins.pipelineprocessor.ast.functions.Function - PIPELINE DEBUG: {}
2023-01-09 20:48:30,041 INFO : org.graylog.plugins.pipelineprocessor.ast.functions.Function - PIPELINE DEBUG: {}
2023-01-09 20:48:30,065 INFO : org.graylog.plugins.pipelineprocessor.ast.functions.Function - PIPELINE DEBUG: {}
2023-01-09 20:48:30,503 INFO : org.graylog.plugins.pipelineprocessor.ast.functions.Function - PIPELINE DEBUG: {}
2023-01-09 20:48:31,127 INFO : org.graylog.plugins.pipelineprocessor.ast.functions.Function - PIPELINE DEBUG: {}
2023-01-09 20:48:32,837 INFO : org.graylog.plugins.pipelineprocessor.ast.functions.Function - PIPELINE DEBUG: {}
2023-01-09 20:48:32,838 INFO : org.graylog.plugins.pipelineprocessor.ast.functions.Function - PIPELINE DEBUG: {}
2023-01-09 20:48:32,838 INFO : org.graylog.plugins.pipelineprocessor.ast.functions.Function - PIPELINE DEBUG: {}
2023-01-09 20:48:39,774 INFO : org.graylog.plugins.pipelineprocessor.ast.functions.Function - PIPELINE DEBUG: {}
2023-01-09 20:48:39,774 INFO : org.graylog.plugins.pipelineprocessor.ast.functions.Function - PIPELINE DEBUG: {}
2023-01-09 20:48:39,776 INFO : org.graylog.plugins.pipelineprocessor.ast.functions.Function - PIPELINE DEBUG: {}
2023-01-09 20:48:39,776 INFO : org.graylog.plugins.pipelineprocessor.ast.functions.Function - PIPELINE DEBUG: {}
2023-01-09 20:48:43,036 INFO : org.graylog.plugins.pipelineprocessor.ast.functions.Function - PIPELINE DEBUG: {}
2023-01-09 20:48:43,036 INFO : org.graylog.plugins.pipelineprocessor.ast.functions.Function - PIPELINE DEBUG: {}
2023-01-09 20:48:43,039 INFO : org.graylog.plugins.pipelineprocessor.ast.functions.Function - PIPELINE DEBUG: {}
2023-01-09 20:48:43,039 INFO : org.graylog.plugins.pipelineprocessor.ast.functions.Function - PIPELINE DEBUG: {}
2023-01-09 20:48:43,040 INFO : org.graylog.plugins.pipelineprocessor.ast.functions.Function - PIPELINE DEBUG: {}

Now I'm over here trying to get it to work :laughing:

Hmmm… There is one “Match” and one “Group” that shows in regex101. I had thought since the Group was after the match that it would naturally pick it up.

You can swap the grouping around and make the first part a positive look-behind, (?<=...), which in reality is what you want: the thing that follows Hostd:. Then grab the first thing with ["0"] (maybe you don't even need that). This way there is ONLY a match to work with. A regex like this:

(?<=Hostd:\\s)\\S+

Which would make the rule:

rule "ESXi Verbosity Matching Hostd"

when
    regex("\\bHostd\\b",to_string($message.message)).matches==true
then

    let verbosity = regex("(?<=Hostd:\\s)\\S+",to_string($message.message))["0"];
    set_field("log_verbosity", verbosity);

end
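One thing worth noting about the look-behind version: it produces a whole match but no capture groups at all. A small sketch in Python's `re` (a stand-in for Graylog's Java regex, so treat the parallel as an assumption):

```python
import re

# Start of the sample log line from this thread (truncated for brevity).
MESSAGE = "herpa.derpa.corpo.lab Hostd: verbose hostd[2103629] [Originator@6876 sub=Libs"

# Look-behind: assert that "Hostd:" plus one whitespace character precedes
# the match, without making that prefix part of the match.
match = re.search(r"(?<=Hostd:\s)\S+", MESSAGE)

whole_match = match.group(0)  # the text after the look-behind
captures = match.groups()     # empty: the pattern has no capturing parentheses

print(repr(whole_match))
print(captures)
```

If the consumer of the result only exposes capture groups, a pattern like this leaves it nothing to return, which would be consistent with the empty `{}` debug output seen elsewhere in this thread.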

Hey,

I gave it a try; it's a "no go". I'm wondering if it's the log file. Don't mind the ERROR, I've been trying different input types to see if it makes a difference.

Results:

2023-01-10 16:43:43,886 INFO : org.graylog.plugins.pipelineprocessor.ast.functions.Function - PIPELINE DEBUG: Passed value is NULL.
2023-01-10 16:43:43,887 INFO : org.graylog.plugins.pipelineprocessor.ast.functions.Function - PIPELINE DEBUG: Passed value is NULL.
2023-01-10 16:43:43,887 INFO : org.graylog.plugins.pipelineprocessor.ast.functions.Function - PIPELINE DEBUG: Passed value is NULL.
2023-01-10 16:43:43,887 INFO : org.graylog.plugins.pipelineprocessor.ast.functions.Function - PIPELINE DEBUG: Passed value is NULL.
2023-01-10 16:43:43,888 ERROR: org.graylog2.shared.buffers.processors.DecodingProcessor - Unable to decode raw message RawMessage{id=3c9383d0-9138-11ed-a56e-0242ac110004, messageQueueId=8623604, codec=gelf, payloadSize=153, timestamp=2023-01-10T22:43:43.885Z, remoteAddress=/10.10.10.10:55136} on input <62db5ffc81f9e61ab57272bc>.
2023-01-10 16:43:43,888 INFO : org.graylog.plugins.pipelineprocessor.ast.functions.Function - PIPELINE DEBUG: Passed value is NULL.
2023-01-10 16:43:43,888 INFO : org.graylog.plugins.pipelineprocessor.ast.functions.Function - PIPELINE DEBUG: Passed value is NULL.
2023-01-10 16:43:43,889 INFO : org.graylog.plugins.pipelineprocessor.ast.functions.Function - PIPELINE DEBUG: Passed value is NULL.
2023-01-10 16:43:43,890 ERROR: org.graylog2.shared.buffers.processors.DecodingProcessor - Error processing message RawMessage{id=3c9383d0-9138-11ed-a56e-0242ac110004, messageQueueId=8623604, codec=gelf, payloadSize=153, timestamp=2023-01-10T22:43:43.885Z, remoteAddress=/10.10.10.10:55136}
java.lang.IllegalArgumentException: GELF message <3c9383d0-9138-11ed-a56e-0242ac110004> (received from <10.10.10.10:55136>) has empty mandatory "short_message" field.
        at org.graylog2.inputs.codecs.GelfCodec.validateGELFMessage(GelfCodec.java:263) ~[graylog.jar:?]
        at org.graylog2.inputs.codecs.GelfCodec.decode(GelfCodec.java:141) ~[graylog.jar:?]
        at org.graylog2.shared.buffers.processors.DecodingProcessor.processMessage(DecodingProcessor.java:156) ~[graylog.jar:?]
        at org.graylog2.shared.buffers.processors.DecodingProcessor.onEvent(DecodingProcessor.java:94) [graylog.jar:?]
        at org.graylog2.shared.buffers.processors.ProcessBufferProcessor.onEvent(ProcessBufferProcessor.java:95) [graylog.jar:?]
        at org.graylog2.shared.buffers.processors.ProcessBufferProcessor.onEvent(ProcessBufferProcessor.java:49) [graylog.jar:?]
        at com.lmax.disruptor.WorkProcessor.run(WorkProcessor.java:143) [graylog.jar:?]
        at com.codahale.metrics.InstrumentedThreadFactory$InstrumentedRunnable.run(InstrumentedThreadFactory.java:66) [graylog.jar:?]
        at java.lang.Thread.run(Unknown Source) [?:?]
2023-01-10 16:43:43,891 INFO : org.graylog.plugins.pipelineprocessor.ast.functions.Function - PIPELINE DEBUG: Passed value is NULL.

Not sure what's up. I do know my regex extractor above works.

What is $message.message coming in as?

The test message is:
herpa.derpa.corpo.lab Hostd: verbose hostd[2103629] [Originator@6876 sub=Libs opID=laukhmm4-1205666-auto-puar-h5:70342567-86-01-58-fd8c] VigorTransport_ClientSendRequest: opID=laukhmm4-1205666-auto-puar-h5:70342567-86-01-58-fd8c seq=536204665: Sending GuestStats.SetNotificationTime request.

Here it is in regex101.

To see the message coming in…

debug(concat("++++message :",$message.message));
debug(concat("++++verbosity:",verbosity));

Maybe take the ["0"] out to see results

I uploaded the example logs from this post.

message
2023-01-09 20:36:09 herpa.derpa.corpo.lab Hostd: verbose hostd[2103629] [Originator@6876 sub=Libs opID=laukhmm4-1205666-auto-puar-h5:70342567-86-01-58-fd8c] VigorTransport_ClientSendRequest: opID=laukhmm4-1205666-auto-puar-h5:70342567-86-01-58-fd8c seq=536204665: Sending GuestStats.SetNotificationTime request.

I'll give that a try. I'm kind of confused why I can't get it to work. I'm using Graylog Operations 4.4.x with Elasticsearch 7.10 and MongoDB 4.4.

@tmacgbay

Lab testing results.
Log file results:
debug(concat("++++message :",$message.message));


2023-01-10 19:13:31,930 INFO : org.graylog.plugins.pipelineprocessor.ast.functions.Function - PIPELINE DEBUG: message2023-01-09 20:36:09 herpa.derpa.corpo.lab Hostd: verbose hostd[2103629] [Originator@6876 sub=Libs opID=laukhmm4-1205666-auto-puar-h5:70342567-86-01-58-fd8c] VigorTransport_ClientSendRequest: opID=laukhmm4-1205666-auto-puar-h5:70342567-86-01-58-fd8c seq=536204665: Sending GuestStats.SetNotificationTime request.
2023-01-10 19:13:31,930 INFO : org.graylog.plugins.pipelineprocessor.ast.functions.Function - PIPELINE DEBUG: message2023-01-09 20:36:09 herpa.derpa.corpo.lab Hostd: verbose hostd[2103629] [Originator@6876 sub=Libs opID=laukhmm4-1205666-auto-puar-h5:70342567-86-01-58-fd8c] VigorTransport_ClientSendRequest: opID=laukhmm4-1205666-auto-puar-h5:70342567-86-01-58-fd8c seq=536204665: Sending GuestStats.SetNotificationTime request.

debug(concat("++++verbosity:",verbosity));

2023-01-10 19:15:47,973 INFO : org.graylog.plugins.pipelineprocessor.ast.functions.Function - PIPELINE DEBUG: {}
2023-01-10 19:15:47,973 INFO : org.graylog.plugins.pipelineprocessor.ast.functions.Function - PIPELINE DEBUG: {}
2023-01-10 19:15:47,973 INFO : org.graylog.plugins.pipelineprocessor.ast.functions.Function - PIPELINE DEBUG: {}
2023-01-10 19:15:47,974 INFO : org.graylog.plugins.pipelineprocessor.ast.functions.Function - PIPELINE DEBUG: log_verbosity{}
2023-01-10 19:15:47,973 INFO : org.graylog.plugins.pipelineprocessor.ast.functions.Function - PIPELINE DEBUG: {}
2023-01-10 19:15:47,974 INFO : org.graylog.plugins.pipelineprocessor.ast.functions.Function - PIPELINE DEBUG: log_verbosity{}
2023-01-10 19:15:47,974 INFO : org.graylog.plugins.pipelineprocessor.ast.functions.Function - PIPELINE DEBUG: log_verbosity{}
2023-01-10 19:15:47,974 INFO : org.graylog.plugins.pipelineprocessor.ast.functions.Function - PIPELINE DEBUG: {}
2023-01-10 19:15:47,974 INFO : org.graylog.plugins.pipelineprocessor.ast.functions.Function - PIPELINE DEBUG: log_verbosity{}
2023-01-10 19:15:47,975 INFO : org.graylog.plugins.pipelineprocessor.ast.functions.Function - PIPELINE DEBUG: {}
2023-01-10 19:15:47,975 INFO : org.graylog.plugins.pipelineprocessor.ast.functions.Function - PIPELINE DEBUG: log_verbosity{}
2023-01-10 19:15:47,974 INFO : org.graylog.plugins.pipelineprocessor.ast.functions.Function - PIPELINE DEBUG: {}
2023-01-10 19:15:47,975 INFO : org.graylog.plugins.pipelineprocessor.ast.functions.Function - PIPELINE DEBUG: log_verbosity{}

GUI results:

EDIT: for the tests above I had removed ["0"].

EDIT2: This is with the ["0"].

2023-01-10 19:29:32,005 INFO : org.graylog.plugins.pipelineprocessor.ast.functions.Function - PIPELINE DEBUG: Passed value is NULL.
2023-01-10 19:29:32,005 INFO : org.graylog.plugins.pipelineprocessor.ast.functions.Function - PIPELINE DEBUG: log_verbosity
2023-01-10 19:29:32,006 INFO : org.graylog.plugins.pipelineprocessor.ast.functions.Function - PIPELINE DEBUG: Passed value is NULL.
2023-01-10 19:29:32,006 INFO : org.graylog.plugins.pipelineprocessor.ast.functions.Function - PIPELINE DEBUG: log_verbosity
2023-01-10 19:29:32,007 INFO : org.graylog.plugins.pipelineprocessor.ast.functions.Function - PIPELINE DEBUG: Passed value is NULL.
2023-01-10 19:29:32,007 INFO : org.graylog.plugins.pipelineprocessor.ast.functions.Function - PIPELINE DEBUG: log_verbosity
2023-01-10 19:29:32,007 INFO : org.graylog.plugins.pipelineprocessor.ast.functions.Function - PIPELINE DEBUG: Passed value is NULL.
2023-01-10 19:29:32,007 INFO : org.graylog.plugins.pipelineprocessor.ast.functions.Function - PIPELINE DEBUG: Passed value is NULL.
2023-01-10 19:29:32,008 INFO : org.graylog.plugins.pipelineprocessor.ast.functions.Function - PIPELINE DEBUG: log_verbosity
2023-01-10 19:29:32,008 INFO : org.graylog.plugins.pipelineprocessor.ast.functions.Function - PIPELINE DEBUG: Passed value is NULL.
2023-01-10 19:29:32,008 INFO : org.graylog.plugins.pipelineprocessor.ast.functions.Function - PIPELINE DEBUG: log_verbosity
2023-01-10 19:29:32,008 INFO : org.graylog.plugins.pipelineprocessor.ast.functions.Function - PIPELINE DEBUG: log_verbosity

EDIT2.0:

OK, so something new: there is a difference between an extractor and a pipeline, minus the ["0"], with the same logs/messages. :thinking:

I used this regex for the extractor.

Hostd+\:+(\D+)\s
It can also work with Hostd:(\D+)\s

regex("(Hostd:\\s)(\\S+)",to_string($message.message))["1"];

Tested.

The bones were there. :expressionless:
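To compare what the extractor pattern and the pipeline pattern actually capture on the sample line, here is a rough sketch in Python's `re` (a stand-in for Graylog's Java regex; the variable names are invented for illustration):

```python
import re

# Start of the sample log line from this thread (truncated for brevity).
MESSAGE = "herpa.derpa.corpo.lab Hostd: verbose hostd[2103629] [Originator@6876 sub=Libs"

# Extractor pattern: \D+ also matches the space right after the colon,
# so the captured value starts with that space.
extractor_match = re.search(r"Hostd+\:+(\D+)\s", MESSAGE)
extractor_capture = extractor_match.group(1)
extractor_trimmed = extractor_capture.strip()

# Pipeline pattern: the first group swallows "Hostd: " including the space,
# so the second group holds the bare word.
pipeline_match = re.search(r"(Hostd:\s)(\S+)", MESSAGE)
pipeline_first = pipeline_match.group(1)
pipeline_second = pipeline_match.group(2)

print(repr(extractor_capture), repr(extractor_trimmed))
print(repr(pipeline_first), repr(pipeline_second))
```

The extractor variant picks up a leading space in this engine, so trimming the value (or matching the whitespace outside the group) may be worth considering if the field is used for exact-match searches.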


Still catching up on the thread. Appreciate the follow-up gentlefolk.

regex("(Hostd:\\s)(\\S+)",to_string($message.message))["1"];

I’ll throw this into my pipeline and test. Do you think separating the different elements was the difference?

@tmacgbay Guess what I forgot :smile: during my extensive testing: the field "log_verbosity" had already been created, so it was throwing errors. I used a new field called "hostd".

Awesome and thanks @tmacgbay ( AKA “Batman”)


The (...) is a capturing group which is different than regex matching… A capturing group shows that you specifically want that data returned. I am not exactly clear on how Graylog/java handles regex in this case… you would think what you originally had would have worked… smarter minds than me could explain why. :\
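One reading that fits every result in this thread is that the pipeline `regex()` result exposes only the capture groups, keyed by strings starting at "0", and never the whole match. That is an assumption drawn from the debug output above, not something verified against Graylog's source here. The sketch below models that assumed behaviour in Python; `groups_keyed_from_zero` is a hypothetical helper, not a Graylog function.

```python
import re

# Start of the sample log line from this thread (truncated for brevity).
MESSAGE = "herpa.derpa.corpo.lab Hostd: verbose hostd[2103629] [Originator@6876 sub=Libs"


def groups_keyed_from_zero(pattern: str, text: str) -> dict:
    """Hypothetical model: return only capture groups, keyed "0", "1", ...

    ASSUMPTION: this mimics how the pipeline regex() result appears to
    behave in this thread. The whole match is deliberately left out.
    """
    match = re.search(pattern, text)
    if match is None:
        return {}
    return {str(index): value for index, value in enumerate(match.groups())}


one_group = groups_keyed_from_zero(r"^.*Hostd: \s*(\w+)", MESSAGE)
look_behind = groups_keyed_from_zero(r"(?<=Hostd:\s)\S+", MESSAGE)
two_groups = groups_keyed_from_zero(r"(Hostd:\s)(\S+)", MESSAGE)

print(one_group)    # a single entry under "0", nothing under "1"
print(look_behind)  # empty, because the pattern has no capture groups
print(two_groups)   # the wanted word sits under "1"
```

Under this model the original one-group pattern would need `["0"]` rather than `["1"]`, the look-behind pattern returns nothing under any key, and the two-group pattern works with `["1"]`, which lines up with the NULL and `{}` debug lines and with the version that finally worked.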

Appreciate all the hard work @tmacgbay and @gsmith. Funny enough, the extractor version of the above started working randomly today, but only for some logs, and I was not able to get any of the above to work within the pipelines. I decided just to move the workload to rsyslogd and call it a day. Given that I have another thread open with a similar issue right now… Graylog seems to be kind of buggy when it comes to pipelines/rules and has very questionable regex adoption. :confused:


I hate regex as much as I love Grok. Why not give Grok a try?
Grok patterns needed beforehand:

DATA_ALL_BUT_SPACE 	[^ ]+
DATA_ALL_BUT_CLOSED_CORNERED_BRACKET 	[^]]+
DATA_ALL_BUT_OPENED_CORNERED_BRACKET	[^[]+

Now build a grok pattern saved as myMagic, using those, to parse the string:
%{DATA_ALL_BUT_SPACE:source} Hostd: %{DATA_ALL_BUT_OPENED_CORNERED_BRACKET:host}\[%{DATA_ALL_BUT_CLOSED_CORNERED_BRACKET:some_id}\]

and we have those fields:

{
  "source": "herpa.derpa.corpo.lab",
  "host": "verbose hostd",
  "some_id": "2103629"
}

And now build a pipeline rule parsing it:

rule "parse_myMagic"
when
  //what ever reason applies
then
  set_fields(
    grok(
      pattern:"^%{myMagic}",
      value:to_string($message.message),
      only_named_captures:true
    )
  );
end

and you should be done.
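To see what this Grok approach boils down to, the three helper patterns can be expanded into one regex with named groups. The sketch below does that in Python; `grok_to_regex` is a made-up helper that only handles the `%{NAME:field}` form, and it assumes the literal square brackets in myMagic are written as `\[` and `\]`. It is an approximation of Grok, not Graylog's implementation.

```python
import re

# Start of the sample log line from this thread (truncated for brevity).
MESSAGE = "herpa.derpa.corpo.lab Hostd: verbose hostd[2103629] [Originator@6876 sub=Libs"

# The helper patterns from the post, with brackets escaped inside the
# character classes (equivalent, just more explicit).
GROK_PATTERNS = {
    "DATA_ALL_BUT_SPACE": r"[^ ]+",
    "DATA_ALL_BUT_CLOSED_CORNERED_BRACKET": r"[^\]]+",
    "DATA_ALL_BUT_OPENED_CORNERED_BRACKET": r"[^\[]+",
}

# ASSUMPTION: literal brackets are escaped as \[ and \] in the saved pattern.
MY_MAGIC = (
    r"%{DATA_ALL_BUT_SPACE:source} Hostd: "
    r"%{DATA_ALL_BUT_OPENED_CORNERED_BRACKET:host}"
    r"\[%{DATA_ALL_BUT_CLOSED_CORNERED_BRACKET:some_id}\]"
)


def grok_to_regex(grok: str) -> str:
    """Hypothetical helper: turn %{NAME:field} into a named regex group."""
    def expand(ref: re.Match) -> str:
        name, field = ref.group(1), ref.group(2)
        return f"(?P<{field}>{GROK_PATTERNS[name]})"
    return re.sub(r"%\{(\w+):(\w+)\}", expand, grok)


# The rule anchors the pattern with ^, so do the same here.
anchored = "^" + grok_to_regex(MY_MAGIC)
fields = re.search(anchored, MESSAGE).groupdict()
print(fields)

# A message that starts with a timestamp no longer matches the ^ anchor.
prefixed = re.search(anchored, "2023-01-09 20:36:09 " + MESSAGE)
print(prefixed)
```

The second check matters for the lab upload shown earlier in the thread, where the message field starts with a timestamp: with the `^` anchor in place, that variant would not be parsed in this model.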


Hi @bluescreenofwin
did you try my suggestion? I’d be happy to hear if it’s working for you.

Hi @ihe,

I’ll give it a shot and report back.
