Graylog 4 - pipeline regex causes lost messages?

Hi,

I’m pretty new to this pipeline thing but I’m seeing something that makes no sense to me, and I’d appreciate a little feedback.

Problem description:

TL;DR: A pipeline that enriches exim4 loglines matches rules and starts to process, but accessing a regex match set seems to result in some sort of abort, with lost messages as a result. Matching the rule in the simulator works just fine and shows no errors.

What I’m trying to parse:
2021-01-17 14:52:44 1l19Pz-0006T2-PI <= florian@senderdomain.tld H=84-245-8-134.dsl.random.hostname.tld (clevo) [12.16.153.14] P=smtp S=671

Pipeline rule (edited to show preformatted now):

rule "exim: ingress message"
when
  // Select for incoming SMTP connections
  has_field("message") AND contains(to_string($message.message)," <= ")
then
  // mark message as incoming
  set_field("eximsg_state","ingress");

  // identify sender email
  let sar = regex(".* <= (\\S{1,}@\\S{2,}\\.\\S{2,})", to_string($message.message));
  let sender = to_string(sar["0"]);
  set_field("eximsg_sender", sender);
end

If I throw this line at the Simulator it evaluates fine, the expected fields are added, and eximsg_sender does contain "florian@senderdomain.tld".

If I run this same scenario 'live', the entire message seems to get lost in the pipeline. Searching for the message ID ('1l19Pz-0006T2-PI' in this example) yields no results for this log line, but others in the same transaction do appear.

Commenting out the line 'set_field("eximsg_sender", sender);' makes the message appear in the stream again, but then obviously the eximsg_sender field does not appear.

What I’ve tried

The problem seems to occur when I try to address the match set as an array. If I fill the field without the ["0"] suffix, the field is populated with a JSON-like object: {"0":"florian@senderdomain.tld"}. This leads me to believe there is something wrong with the way I’m accessing the match set.

BTW, I’ve found a similar approach in Issue with regex-array in pipeline-rule so this is why I’ve formulated it this way.

The setup

Is basically a docker stack with

  • Graylog 4.0.1
  • Elastic 7.10.0

Any hints or pointers are much appreciated.

Regards,
F.

I have found wonkiness when you aren’t escaping escapes such as \S. Here is an example where I had to play with the escapes to get it to work properly (with formatting goodness to make it easier to read). Also of note: there are commented-out debug() statements. If you include those, the results show up in your Graylog server log (tail -f /var/log/graylog-server/server.log).

rule "PA-Firewall - ex2 - event_description"
when
    has_field("logtype")                            &&
    to_string($message.logtype) == "GLOBALPROTECT"  &&
    has_field("event_description")      
then
    let message     = to_string($message.event_description);

    let desc_parts   = regex(pattern: "^(?:\\w+\\s+){2}(.*)\\.\\s+(.+)", value: message);
    set_field("event_action", to_string(desc_parts["0"]));
    let desc_lowered    = replace(lowercase(to_string(desc_parts["1"]))," , ", ", ");    // might have extraneous commas
    let desc_cleaned    = regex_replace("\\b\\s+", desc_lowered , "_");                 // replace unwanted spaces
    let keyed_up    = key_value(desc_cleaned,
                                ",",
                                ":",
                                true,
                                true,
                                "take_last",
                                " ",
                                " "
                    );

    //debug("$$$$---Event to be :");
    //debug(to_string(keyed_up));

    set_fields(keyed_up);

    set_field(field: "ra_tag", value: "globalprotect");
    route_to_stream(name: "Remote Access Global");
end

Thanks for that swift response. I’ve escaped the codes, but the escaping gets lost in this forum…

let sar = regex(".* <= (\\S{1,}@\\S{2,}\\.\\S{2,})", to_string($message.message));

In the meantime I’ve also found some errors in the Indexer log:

ElasticsearchException[Elasticsearch exception [type=mapper_parsing_exception, reason=object mapping for [eximsg_sender] tried to parse field [eximsg_sender] as object, but found a concrete value]]

So there’s likely a difference between handling this with a raw input string in the simulation and the syslog line that is coming in.

Any ideas on how I can write handling for different types of variables? Is there a way to do if/then within the ‘when’ part?

Thanks,
F.
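
On the if/then question above: as far as I know, the rule language has no if/else inside a rule, and the usual approach is to write one rule per case and let each `when` condition select its case. A minimal sketch under that assumption, reusing the regex from the original rule and relying on the `matches` property of the match object that `regex()` returns. The rule names and the field `eximsg_parse_error` are hypothetical, and each rule would be saved as its own rule in the same pipeline stage:

```
rule "exim: ingress sender found"
when
  // case 1: the sender pattern matches the message
  has_field("message") AND
  regex(".* <= (\\S{1,}@\\S{2,}\\.\\S{2,})", to_string($message.message)).matches == true
then
  let sar = regex(".* <= (\\S{1,}@\\S{2,}\\.\\S{2,})", to_string($message.message));
  // always store a plain string, never the whole match set
  set_field("eximsg_sender", to_string(sar["0"]));
end

rule "exim: ingress sender missing"
when
  // case 2: an ingress line whose sender could not be parsed
  has_field("message") AND
  contains(to_string($message.message), " <= ") AND
  regex(".* <= (\\S{1,}@\\S{2,}\\.\\S{2,})", to_string($message.message)).matches == false
then
  // hypothetical marker field, handy for finding unparsed lines later
  set_field("eximsg_parse_error", "no sender address found");
end
```

The regex is evaluated twice in the first rule (once in `when`, once in `then`), which keeps the sketch simple at the cost of some extra work per message.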

Replying to self here.

I’ve had a brainwave that the error message:

ElasticsearchException[Elasticsearch exception [type=mapper_parsing_exception, reason=object mapping for [eximsg_sender] tried to parse field [eximsg_sender] as object, but found a concrete value]]

might mean the field was mapped with the wrong type (an object) the first time the rule ran, presumably when I wrote the whole match set into it, whereas it should have been a string.

Confirmed by simply changing the target variable name, and yes, now it just seems to work.
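
A minimal sketch of that workaround, assuming a replacement field name of `eximsg_sender_addr` (a hypothetical name; any name that does not yet exist in the index mapping should behave the same). The rule is otherwise the one from the first post, so the existing object mapping for `eximsg_sender` is simply never written to again:

```
rule "exim: ingress message"
when
  // Select for incoming SMTP connections
  has_field("message") AND contains(to_string($message.message), " <= ")
then
  // mark message as incoming
  set_field("eximsg_state", "ingress");

  // identify sender email
  let sar = regex(".* <= (\\S{1,}@\\S{2,}\\.\\S{2,})", to_string($message.message));
  // hypothetical new field name, so the value is indexed as a string
  set_field("eximsg_sender_addr", to_string(sar["0"]));
end
```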

Now to find a way to actually fix that… Your thoughts are welcome.

F.

You can rotate the index and it will take the new type, as long as a value of the new type is the first thing that gets inserted for that field into the new index. You can create a custom index mapping to make sure.

If you want to correct old indexes - I made a post a while back about it here

Note that my post was doing it in an older version of elastic (6.8 I think) so take that into account.

Thanks, great writeup on correcting the old indexes. In my case just cycling to a new index and ditching the old one was acceptable, as the setup is not in production anyway, but it’s good to learn there are some options if need be. Looks like a hell of a job though :)
