RegEx Search with Whitespace

Good day,

I am attempting to perform a Graylog search with a RegEx, and it appears to fail when the query includes whitespace. (Similar to the topic "Regular expression with whitespace not work in search field".)

We are attempting to create a query to parse the WAF logs from our load balancer to see if it is denying any clients internal to our LAN in addition to external traffic. All the information we require is contained within the full_message or message field, specifically the WAF rule ID and the client IP.

For example, presuming we were searching for WAF rule 920420 we have been attempting something similar to the following:

application_name:wafd AND full_message:id "920420" AND full_message:/(client\s)(192)\.(168)(\.(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[0-9])){2}/

The RegEx seems to work when searching for just "client" or for just the internal IP, but fails when a ".", " ", or "\s" is added. Unfortunately, the received syslog messages also include the server's IP address, so the RegEx for the local subnet will always match unless the search is limited to "client X.X.X.X".

If anyone has attempted something similar and has an example they could share, or can give me some ideas for other leads to chase down, it would be appreciated.

Thank you,

2. Describe your environment:
Graylog 4.2.5
Ubuntu 20.04.3 LTS

Hello && Welcome

I might be able to help.

You used this for a search.

application_name:wafd AND full_message:id "920420" AND full_message:/(client\s)(192)\.(168)(\.(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[0-9])){2}/

In my lab Graylog I successfully ran this search:

devname: 102-zzz-555 AND message:"0001000014" AND message: /10\.111\.56\..+/

Have you tried this?

application_name: wafd AND full_message:"920420" AND full_message: /192\.168\..+\..+/

Just an FYI: I adjusted this post to make your search command easier to read. When you post commands, log files, or configuration, could you use the code markup? It is easier to read, thanks.

Hope that helps

Thanks for the modifications to my first post, and I appreciate the condensed IP format. Unfortunately, part of my issue is that the messages I am working with contain both "client", which is the remote host, and "hostname", which holds the local server's IP address. Just searching for the local network will therefore always generate a hit unless I add a second statement to exclude our server subnet.

I was trying to match "client <192.168.X.X>", which is why part of my query had

(client\s)

so I wouldn’t pick up on the local server and potentially use it as a template with some quick tweaks when searching logs from other systems that have both the remote client and server IP address.

I presume if I can’t get the \s for whitespace or similar working in my search query I may have to try my hand at making an extractor for this device or defining our various server subnets but was hoping for something I could repurpose easily between sources.

You could pretty easily pick out the data you want into fields with a GROK statement, either in an extractor or in the pipeline. I have a single pipeline rule that checks the src_ip against internal subnets. I try to make it generic so I can place it in whatever pipeline I want and only have to maintain one. You could put these in a table as well so you can just maintain one test file. In my case, if it is an internal IP, the rule pulls the machine name from my DNS-connected table and creates a new field, "internal_ip". Hope this helps:

rule "Gnrl-correct-src_ip-internal"
when
    has_field("src_ip")                                         &&
    is_ip(to_ip($message.src_ip))                               &&
    (
        cidr_match("192.168.72.0/23",  to_ip($message.src_ip))  ||    //Internal Range 1
        cidr_match("192.168.50.0/24",  to_ip($message.src_ip))  ||    //Internal Range 2
        cidr_match("10.22.22.0/24",    to_ip($message.src_ip))  ||    //Internal Range 3
        cidr_match("10.105.125.0/24",  to_ip($message.src_ip))  ||    //Internal Range 4
        cidr_match("10.5.5.0/24",      to_ip($message.src_ip))  ||    //Internal Range 5
//      in_private_net(to_string($message.src_ip))              ||    //GRRR didn't work for me
//
//      external addresses that may mean connection was loop routed
//
        cidr_match("77.104.7.7/27",    to_ip($message.src_ip))  ||
        cidr_match("77.5.8.8/28",      to_ip($message.src_ip))  ||
        cidr_match("77.65.1.9/28",     to_ip($message.src_ip))  ||
        cidr_match("77.120.220.9/28",  to_ip($message.src_ip))
    )

then
    let IP2Name = lookup_value("DNS_table", $message.src_ip );
    set_field("internal_ip_machinename", IP2Name);  // local machine Name
    set_field("internal_ip", $message.src_ip);

end
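For anyone who wants to test the subnet logic outside Graylog first, here is a rough Python sketch of what the rule above does. It is not Graylog code: the `enrich` function and the `DNS_TABLE` dictionary (standing in for the lookup table) are made up for illustration, and its single entry is a placeholder. Note that Python's `ipaddress` module needs `strict=False` to accept networks written with host bits set, such as 77.104.7.7/27.

```python
import ipaddress

# Same CIDR ranges as in the pipeline rule. strict=False lets Python accept
# entries like 77.104.7.7/27, where the address is not the network base.
INTERNAL_RANGES = [
    ipaddress.ip_network(cidr, strict=False)
    for cidr in (
        "192.168.72.0/23",
        "192.168.50.0/24",
        "10.22.22.0/24",
        "10.105.125.0/24",
        "10.5.5.0/24",
        "77.104.7.7/27",
        "77.5.8.8/28",
        "77.65.1.9/28",
        "77.120.220.9/28",
    )
]

# Hypothetical stand-in for the DNS lookup table used by lookup_value().
DNS_TABLE = {"192.168.73.10": "workstation-01.example.lan"}


def enrich(message):
    """Return a copy of message with internal_ip fields added when src_ip is internal."""
    result = dict(message)
    raw = result.get("src_ip")
    if raw is None:
        return result
    try:
        ip = ipaddress.ip_address(raw)
    except ValueError:
        return result  # src_ip is not a parseable IP address
    if any(ip in net for net in INTERNAL_RANGES):
        result["internal_ip"] = raw
        result["internal_ip_machinename"] = DNS_TABLE.get(raw)
    return result
```

Calling `enrich({"src_ip": "192.168.73.10"})` adds both fields, while a message with an address outside all the ranges comes back unchanged.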

Oh you have brackets around your IP Address.
You could either create extractors as you stated or use @tmacgbay's suggestion with the pipeline.
Those would be your best bet that I know of.
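As far as I understand it, the reason \s never matches is that a regular expression in the search bar is run against the individual indexed terms of an analyzed field such as message or full_message, not against the whole string, and each term has to match the pattern completely. The Python sketch below only imitates that behaviour so you can see the effect. The tokenizer is a crude assumption (the real analyzer is more involved), and the sample log line and function names are invented.

```python
import re


def analyzed_terms(text):
    # Crude stand-in for an analyzer: lowercase, then split on anything
    # that is not a word character or a dot, so IP addresses stay whole.
    return [t for t in re.split(r"[^\w.]+", text.lower()) if t]


def term_regex_hits(pattern, text):
    # A term-level regex has to match a whole term, hence fullmatch().
    rx = re.compile(pattern)
    return [t for t in analyzed_terms(text) if rx.fullmatch(t)]


# Invented sample line, loosely shaped like the WAF messages in this thread.
sample = '[client 192.168.10.20] [hostname 192.168.1.5] [id "920420"]'

word_hits = term_regex_hits(r"client", sample)
ip_hits = term_regex_hits(r"192\.168\.\d+\.\d+", sample)
combined_hits = term_regex_hits(r"client\s192\.168\.\d+\.\d+", sample)
```

Here `word_hits` and `ip_hits` both find terms, but `combined_hits` is empty because no single term contains a space. That is why extracting the client IP into its own field is the more dependable route.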

@CC-LDU not sure if this is resolved for you or not, but I'm curious: are you currently extracting any of the fields in the message you posted above? If not, consider extracting at least some of the data, either via pipeline or extractor.

Simple GROK pattern to extract the client would look like this:

\[client %{IPV4:client}

and for the hostname:

\[hostname %{IPV4:hostname}

Then once you have those extracted you can do a query like this…

application_name:wafd AND client:192.168*
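If it helps to see what those two patterns buy you, here is a small Python sketch of the same idea. It is not GROK: the loose `IPV4` string is only a simplified stand-in for the real IPV4 pattern, and the sample messages and function names are invented.

```python
import re

# Simplified stand-in for the IPV4 grok pattern (does not validate octets).
IPV4 = r"\d{1,3}(?:\.\d{1,3}){3}"

FIELD_PATTERNS = {
    "client": re.compile(r"\[client (?P<client>" + IPV4 + r")"),
    "hostname": re.compile(r"\[hostname (?P<hostname>" + IPV4 + r")"),
}


def extract_fields(message):
    """Pull the client and hostname addresses into separate named fields."""
    fields = {}
    for name, rx in FIELD_PATTERNS.items():
        match = rx.search(message)
        if match:
            fields[name] = match.group(name)
    return fields


def is_internal_client(message):
    # Rough equivalent of the query client:192.168* on the extracted field.
    return extract_fields(message).get("client", "").startswith("192.168")


# Invented samples: same internal hostname, different clients.
external = "[client 203.0.113.9] [hostname 192.168.1.5]"
internal = "[client 192.168.10.20] [hostname 192.168.1.5]"
```

With the fields separated, `is_internal_client(external)` is false even though the server address in the same message is on the 192.168 network, which is the false hit the raw-text search kept producing.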

Thanks for everyone's assistance. We have set up an extractor on the Input, as mentioned earlier.

In case someone has logs with similar formatting, we ended up using the following to pull the client IP for the field:

\[client\s(\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b)\]

Luckily the logs for these events all start with "(DeviceName) wafd", so we were able to use "Only attempt extraction if field matches regular expression" with

DEVICE.\swafd
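For anyone adapting this, a quick way to sanity-check both expressions before saving the extractor is to run them in Python. This is only a sketch: the sample lines are invented, DEVICE is the same placeholder as above, and `extract_client` is a made-up helper that mimics "check the condition, then extract capture group 1".

```python
import re

# Condition regex and extraction regex, copied from the extractor settings above.
CONDITION = re.compile(r"DEVICE.\swafd")
CLIENT = re.compile(r"\[client\s(\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b)\]")


def extract_client(message):
    """Return the client IP, or None if the condition or the pattern fails."""
    if not CONDITION.search(message):
        return None  # not a wafd line from this device, skip extraction
    match = CLIENT.search(message)
    return match.group(1) if match else None


# Invented sample lines for illustration only.
waf_line = "DEVICE1 wafd: [client 192.168.10.20] [hostname 192.168.1.5]"
other_line = "otherhost sshd: [client 192.168.10.20]"

waf_client = extract_client(waf_line)
other_client = extract_client(other_line)
```

Only the first line passes the condition, so only `waf_client` gets a value. The hostname address is never picked up because the pattern is anchored on the literal "[client ".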

Nice :)

If that solved your issue, could you mark it as Resolved?
This will help others who search for this in the future. Thanks

Glad you got it working. Just wanted to point something out that might not ever show up in your use case, but something I think you should be aware of.

Your regular expression allows invalid IP addresses to be parsed. It accepts anything from 0.0.0.0 through 999.999.999.999. Again, in your case that may never actually happen, but it would be better to use a regex built for valid IP addresses. The one listed in Graylog's GROK patterns section would work.
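To illustrate the difference, here is a short Python sketch comparing the loose pattern with a stricter one. The strict octet expression is my own and reuses the octet alternation from the first post; it is not copied from Graylog's pattern list. The `ipaddress` check is just an independent way to confirm a candidate string.

```python
import ipaddress
import re

# Loose pattern from the extractor: any 1 to 3 digits per octet.
LOOSE = re.compile(r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}")

# Stricter pattern: each octet is limited to 0 through 255.
OCTET = r"(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])"
STRICT = re.compile(OCTET + r"(?:\." + OCTET + r"){3}")


def is_valid_ipv4(candidate):
    """Independent check using the standard library instead of a regex."""
    try:
        ipaddress.IPv4Address(candidate)
    except ValueError:
        return False
    return True


loose_accepts_bogus = LOOSE.fullmatch("999.999.999.999") is not None
strict_accepts_bogus = STRICT.fullmatch("999.999.999.999") is not None
strict_accepts_real = STRICT.fullmatch("192.168.10.20") is not None
```

The loose pattern accepts 999.999.999.999, while the strict pattern and `is_valid_ipv4` both reject it and still accept 192.168.10.20.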
