Regex Search Issues

So I have logs coming in from pfSense and I have extractors working as expected to break out the firewall logs.
The issue I am running into is trying to run a regex search on a particular field. So, I can search the following just fine:

SourceIP:10.27.200.253 OR SourceIP:10.27.204.253 OR SourceIP:10.27.200.252 OR SourceIP:10.27.204.252

So, basically I have two gateways on each of my many subnets, so instead of listing every possible combination of subnet/gateway, I thought why not use regex.
So, using RegEx101.com to validate, I came up with the following regex to match either of my two gateways on every subnet:

^(?:([0-9]{1,3}\.){3}(25[2-3]{1}))

So, in scouring on how to use regex in a search string in Graylog, I basically came up with having to “escape” the regex inside a pair of forward slashes, resulting in the following search string:

SourceIP:/^(?:([0-9]{1,3}\.){3}(25[2-3]{1}))/

Unfortunately I get nothing back from that. Reading a little more, it looks like certain characters need to be escaped. I didn’t see anything saying that extra escaping is needed between the forward slashes, but I tried it anyway:

SourceIP:/\^\(\?\:\(\[0-9\]\{1,3\}\.\)\{3\}\(25\[2-3\]\{1\}\)\)/

Again, nothing. So, knowing the regex itself is correct, how do I get this to function in Graylog?

Graylog version: Graylog v4.2.6+0210617
OS: Ubuntu 20.04

Thanks in advance.

You may be over-escaping. Anything you are escaping, double escape that. So for a period that would normally escape with \. escape it with two… \\.

I tried this

SourceIP:/(?:([0-9]{1,3}\\.){3}(25[2-3]{1}))/

Still nothing.

Hmmm - what we should be focusing on is getting the right regex to track down IPs. :slight_smile: Plenty of people have played around with that. I found a good page that goes into some length about it here… who doesn’t like O’Reilly… One of the “accurate” regex expressions they had for tracking down IPv4 is (slightly modified to not be JUST an IP):

(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)

(regex101 will translate that FAR better than I can… I lose track halfway through… haha)

No time to connect over to work/test environment, but that is something to get started on… escaping characters in Graylog is a little funky but at least iteration/experimentation is quick.

That’s what’s frustrating: the regex I’m using validates perfectly in Regex101.com and matches the gateway host IPs regardless of the subnet. Now one thing I’m wondering is, do the results only return the first match group? If so, that’s possibly the problem, because I’m using a non-capture group, as does your example. Once I get home I will try to rewrite it to perform a look back and return a single group.

Sincerely,
Jody L Whitlock

One thing I forgot to mention is that I did try this last night:

SourceIP:/10.27.200.253/

This works, so pretty sure I don’t have to double escape the backslash, so that’s what leads me to the capture group piece.

Sincerely,
Jody L Whitlock

The regex that Graylog uses is actually Elasticsearch regex… which is Lucene. Worth reading through, as it’s not “Perl-compatible”.
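For anyone landing here later, the flavor difference can be sketched as a quick pre-check. As far as I can tell from the Elasticsearch regexp documentation, Lucene patterns always have to match the whole value, so the syntax has no `^` or `$` anchors, and `(?:...)` non-capturing groups are not part of it either. The helper below is a minimal Python sketch that flags those constructs before a pattern is pasted into a search; `PCRE_ONLY` and `lucene_warnings` are hypothetical names, and the list of checks is illustrative, not exhaustive.

```python
import re

# PCRE features that regex101 accepts but that Lucene's regex syntax
# (as described in the Elasticsearch regexp docs) does not provide.
# Illustrative only, not an exhaustive list.
PCRE_ONLY = {
    # a ^ that is neither escaped nor the negation inside [^...]
    "anchor ^": re.compile(r"(?<![\[\\])\^"),
    # an unescaped $
    "anchor $": re.compile(r"(?<!\\)\$"),
    "non-capturing group (?:": re.compile(r"\(\?:"),
}

def lucene_warnings(pattern):
    """Return the names of PCRE-only constructs found in `pattern`."""
    return [name for name, rx in PCRE_ONLY.items() if rx.search(pattern)]

# The pattern from the first post trips two of the checks.
print(lucene_warnings(r"^(?:([0-9]{1,3}\.){3}(25[2-3]{1}))"))
# A version with plain groups and no anchor trips none.
print(lucene_warnings(r"(([0-9]{1,3}\.){3}25[2-3])"))
```

An empty list does not guarantee the pattern works in Graylog; it only means none of these particular constructs were spotted.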

Not sure why there have to be so many different “flavors” of regex. Why can’t we just standardize on one…

So something’s definitely strange in the way Graylog/Elasticsearch does things, because I can get many different combinations to validate on different regex validators, including https://regex101.com, but the same string does not work in Graylog.

For instance, here’s the dataset I am testing with:

10.27.200.252
10.27.202.252
10.27.200.11
10.27.204.58
10.27.204.252
10.27.200.253
10.27.202.253
10.27.204.253

I do not want the third or fourth to match, but the rest should. On https://regex101.com, this works perfectly and returns a single match and a single group for each line that fits the criteria:

((?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?).){3}(?:25[2-3]?))

But if I put this into Graylog search as:

SourceIP:/((?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?).){3}(?:25[2-3]?))/

I get nothing. The docs I can find on Graylog and regex don’t offer too much assistance, and the docs on Elasticsearch and regex have not been that helpful so far.

Ok, so this works:

SourceIP:/((10)\.(27)\.20[0-9]{1}\.25[2-3]{1})/

Also, this seems to work, for a more broad stroke:

SourceIP:/(((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?).){3}(25[2-3]{1}))/

What I think is going on is this: the Lucene documentation mentions the following reserved characters, just once:

. ? + * | { } [ ] ( ) " \

So that got me thinking, maybe the non-capturing portion of the regex “(?:…)” was what was breaking things. I’m going to have to play some more…
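To iterate on a pattern without going through the Graylog search box, the sample dataset above can be checked offline. This is a rough Python sketch, assuming that `re.fullmatch` is a fair stand-in for Lucene's match-the-whole-value behaviour (Python's `re` is not Lucene, so it only helps for patterns that stick to syntax both engines share). The pattern is a variant of the working "broad stroke" search with the dot escaped and no `(?:...)` groups; `SAMPLE_IPS`, `GATEWAY` and `matching_ips` are made-up names.

```python
import re

# The test dataset from the post above.
SAMPLE_IPS = [
    "10.27.200.252",
    "10.27.202.252",
    "10.27.200.11",
    "10.27.204.58",
    "10.27.204.252",
    "10.27.200.253",
    "10.27.202.253",
    "10.27.204.253",
]

# Three "octet plus literal dot" groups, then a last octet of 252 or 253.
# Only plain groups are used, no anchors and no (?:...).
GATEWAY = re.compile(r"((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}25[2-3]")

def matching_ips(ips):
    """Return the IPs that the pattern matches as a whole value."""
    return [ip for ip in ips if GATEWAY.fullmatch(ip)]

for ip in matching_ips(SAMPLE_IPS):
    print(ip)
```

With this dataset the two non-gateway addresses (10.27.200.11 and 10.27.204.58) are left out and the other six are kept.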

This is the final search I came up with. It’s limited to the specific stream, and first pulls all records with the SourceIP field, then makes sure it’s only my internal subnets (all are 10.27.x.x), then excludes the gateway IPs that I use (10.27.x.250-254).

(_exists_:SourceIP AND SourceIP:/10\.27\.([1-2]?[0-9][0-9]?){1}.([1-2]?[0-9][0-9]?){1}/) AND !(SourceIP:/((([1-2]?[0-9][0-9]?).){3}(25[0-4]))/)

So now I have a query of all my internal clients, whether they were passed or blocked, and what rule was applied. This is step 1 in my journey with Graylog, many more to come!
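The logic of that search (field exists, address is in 10.27.x.x, last octet is not 250-254) can be mirrored in a few lines of Python for sanity-checking against sample addresses. This is only a sketch: the patterns are simplified variants of the ones in the query, with every dot escaped (an unescaped `.` matches any character, so the original is slightly looser), and `INTERNAL`, `GATEWAY` and `is_internal_client` are hypothetical names.

```python
import re

# Simplified variants of the two patterns in the search, with dots escaped.
INTERNAL = re.compile(r"10\.27\.[0-9]{1,3}\.[0-9]{1,3}")
GATEWAY = re.compile(r"([0-9]{1,3}\.){3}25[0-4]")

def is_internal_client(source_ip):
    """Mirror of: _exists_:SourceIP AND internal subnet AND NOT gateway."""
    if not source_ip:
        # Stands in for _exists_:SourceIP
        return False
    is_internal = INTERNAL.fullmatch(source_ip) is not None
    is_gateway = GATEWAY.fullmatch(source_ip) is not None
    return is_internal and not is_gateway

for ip in ["10.27.200.11", "10.27.200.253", "192.168.1.5"]:
    print(ip, is_internal_client(ip))
```

Only the first of the three sample addresses should come back as an internal client; the second is a gateway and the third is outside 10.27.x.x.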

Thanks again, believe it or not, you did get me in the right direction, many thanks for that!

That’s a pretty cool search! Glad I could help - I am still learning regex as well. Mark your answer as the solution for future people who are searching for regex solutions so they can find it easier. :smiley:

Just chiming in

I was just using regex101 and was thinking the same thing :laughing:

And there should be one flavor of Linux too… :crazy_face: :smiley:
