How to get only certain data from grok/regex?

Hi everyone,

I’m trying to extract data from UFW logs using regex/GROK, but I can’t figure out how to get only the data I want. Maybe some of you could help me understand these tools better?

This is the log I’m trying to parse (IPs are anonymized):

Aug  9 11:47:32 servername kernel: [1045486.294558] [UFW BLOCK] IN=ens000 OUT= MAC=00:00:00:00:00:00:00:00:00:00:00:00:00:00 SRC=0.0.0.0 DST=0.0.0.0 LEN=52 TOS=0x00 PREC=0x00 TTL=128 ID=52948 DF PROTO=TCP SPT=23550 DPT=23 WINDOW=64240 RES=0x00 SYN URGP=0

And here is my custom grok pattern:

DPT=(?<data>[\d]+)

The result is:

{
  "UFW_DST_PORT": "DPT=23",
  "data": "23"
}
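
The output above is what a regex engine reports when a pattern contains a named group: the overall match and the named group are two separate things. A minimal Python sketch of the same idea follows. Python's `re` module only stands in for GROK here (it spells named groups `(?P<name>...)` rather than `(?<name>...)`), and the log line is a shortened copy of the sample above.

```python
import re

# Shortened version of the sample UFW line from the question.
line = "[UFW BLOCK] IN=ens000 OUT= SRC=0.0.0.0 DST=0.0.0.0 PROTO=TCP SPT=23550 DPT=23 WINDOW=64240"

# Same pattern as the custom GROK pattern, in Python's named-group syntax.
match = re.search(r"DPT=(?P<data>\d+)", line)

full_match = match.group(0)   # everything the pattern matched, prefix included
named = match.groupdict()     # only the named groups

print(full_match)  # DPT=23
print(named)       # {'data': '23'}
```

Asking for named captures only corresponds to keeping `named` and discarding `full_match`; the value itself still sits under the group's name.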

I made a pipeline to add new fields; here is the rule:

rule "UFW IN/OUT/PORT"
when
  has_field("log_application") && to_string($message.log_application) == "ufw"
then
  let msg = to_string($message.message);

  let ufw_source = grok(pattern:"%{UFW_IP_SRC}", value: msg, only_named_captures:true);
  let ufw_destination = grok(pattern:"%{UFW_IP_DST}", value: msg, only_named_captures:true);
  let ufw_port = grok(pattern:"%{UFW_DST_PORT}", value: msg, only_named_captures:true);
  let ufw_protocol = grok(pattern:"%{UFW_PROTO}", value: msg, only_named_captures:true);
  let ufw_action = grok(pattern:"%{UFW_ACTION}", value: msg, only_named_captures:true);

  set_field("ufw-source", ufw_source);
  set_field("ufw-destination", ufw_destination);
  set_field("ufw-port", ufw_port);
  set_field("ufw-protocol", ufw_protocol);
  set_field("ufw-action", ufw_action);
end

That seems fine until I look at how the fields turn out in the logs, like this:

(screenshot of the resulting message fields)

What I want is to keep only the “TCP” part of the message and get rid of all the “data” and “{” “}” stuff. I don’t understand how to achieve this.

Thanks in advance for your help !

I changed the variable name a bit to distinguish it from the field name, but this should work: define the field name inside the GROK pattern, since you are setting only_named_captures to true.

 let var_ufw_source = grok(pattern:"%{UFW_IP_SRC:ufw_source}", value: msg, only_named_captures:true);
 set_fields(var_ufw_source);

On a side note, for efficiency’s sake: each one of those is a full GROK (regex) search of the message. It may be more efficient to include all of them in one GROK statement and then use set_fields().
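
To illustrate the single-pass idea, here is a rough Python sketch rather than Graylog code: one regex with several named groups replaces five separate searches, and the resulting dictionary plays the role of the map that set_fields() would receive. The regex, the `parse_ufw` helper and the group names are illustrative assumptions, not the stored GROK patterns from this thread.

```python
import re

# Sample line from the first post (IPs anonymized).
line = (
    "Aug  9 11:47:32 servername kernel: [1045486.294558] [UFW BLOCK] "
    "IN=ens000 OUT= MAC=00:00:00:00:00:00:00:00:00:00:00:00:00:00 "
    "SRC=0.0.0.0 DST=0.0.0.0 LEN=52 TOS=0x00 PREC=0x00 TTL=128 ID=52948 "
    "DF PROTO=TCP SPT=23550 DPT=23 WINDOW=64240 RES=0x00 SYN URGP=0"
)

# One pattern, several named groups; the unnamed ".*?" gaps are skipped text.
UFW_LINE = re.compile(
    r"\[UFW (?P<ufw_action>[A-Z ]+)\]"
    r".*?SRC=(?P<ufw_source>\S+)"
    r"\s+DST=(?P<ufw_destination>\S+)"
    r".*?PROTO=(?P<ufw_protocol>\w+)"
    r".*?DPT=(?P<ufw_port>\d+)"
)

def parse_ufw(message: str) -> dict:
    """Return all named fields from a single scan, or {} if nothing matches."""
    m = UFW_LINE.search(message)
    return m.groupdict() if m else {}

fields = parse_ufw(line)
print(fields)
```

Each key of `fields` becomes one message field, so the message is scanned once instead of once per field.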

Hi @tmacgbay,

Sorry, my explanation was unclear.

Your solution works very well, but in my case I only want to extract the first group captured by my regex, as in the picture below.

By default, the GROK pattern seems to extract the full match, but I don’t want that: I can’t use the parsed output correctly unless it is the captured group that gets stored.

I tried to write my regex directly in the pipeline, but it reported lots of errors related to symbols used in the regex, such as ( ) { } " and ; as in the picture below.

(screenshot of the pipeline editor errors)

When written as a new GROK pattern, the regex works fully, except that I can’t find how to get the groups.

It’s possible to run GROK across an entire message and capture only the things you want. When you use “Named Captures Only”, set_fields() will only create the fields you have named and ignore things like UNWANTED. Here is an example taken from another post:

%{TIMESTAMP_ISO8601:logdate} - %{DATA:state}\n%{SPACE}Type:%{GREEDYDATA:UNWANTED}%{SPACE}Prepared:%{GREEDYDATA:UNWANTED}\n%{SPACE}QL:%{SPACE}%{GREEDYDATA:QL}\n%{SPACE}Result-Count:%{GREEDYDATA:UNWANTED}\n%{SPACE}Runtime:%{SPACE}%{GREEDYDATA:Runtime}

In a pipeline regex, any character you escape has to be escaped twice: once for the pipeline and once for the regex. So your pattern would be:

 let ufw_action = regex("\\[(UFW([\\w|\\s]+))\\]", msg);

I think that’s true for GROK as well…
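
The double escaping comes from the pattern being written inside a string literal. As a rough illustration in Python (not Graylog; only the principle carries over), an ordinary non-raw string literal consumes one level of backslashes before the regex engine sees the pattern. The log line is a shortened copy of the sample from the first post.

```python
import re

line = "kernel: [1045486.294558] [UFW BLOCK] IN=ens000 OUT="

# Written as an ordinary (non-raw) string literal, as in the pipeline rule:
# the string parser turns each "\\" into a single "\" before the regex
# engine receives the pattern.
pattern = "\\[(UFW([\\w|\\s]+))\\]"

print(pattern)  # \[(UFW([\w|\s]+))\]

match = re.search(pattern, line)
action = match.group(1)  # first capture group: the text between the brackets
print(action)  # UFW BLOCK
```

With single backslashes, the string parser would try to interpret sequences such as `\[` itself, which is one likely source of the editor errors described above.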

Hi @tmacgbay,

Happy birthday and thanks for your reply. I rewrote a few GROK patterns following your example and used a different structure in my pipeline, and it works great now.

I’m posting what I did here in case it helps anyone :wink:

My main grok pattern, used in the pipeline:

%{UFW_TIMESTAMP:ufw_timestamp}\s%{UFW_HOSTNAME:ufw_hostname}%{GREEDYDATA:UNWANTED}(\[[\w|\s]{4})%{UFW_ACTION:ufw_action}%{GREEDYDATA:UNWANTED}IN=%{UFW_INTERFACE:ufw_interface}%{GREEDYDATA:UNWANTED}SRC=%{IP:ufw_source}\sDST=%{IP:ufw_destination}%{GREEDYDATA:UNWANTED}PROTO=%{UFW_PROTO:ufw_protocol}%{GREEDYDATA:UNWANTED}DPT=%{UFW_DESTINATION_PORT:ufw_destination_port}

And my pipeline:

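rule "UFW IN/OUT/PORT"
when
    has_field("log_application") && to_string($message.log_application) == "ufw"
then
    let msg = to_string($message.message);
    // UFW_ACTIONS is assumed to be the name under which the main pattern above is saved
    let parsed = grok(pattern: "%{UFW_ACTIONS}", value: msg, only_named_captures: true);
    // create one message field per named capture
    set_fields(parsed);
end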

Thanks!! :smiley:

Glad that worked out, you wrote it exactly as I would have!
