Regex Search Issues

So I have logs coming in from pfSense and I have extractors working as expected to break out the firewall logs.
The issue I am running into is trying to run a regex search on a particular field. So, I can search the following just fine:

SourceIP:10.27.200.253 OR SourceIP:10.27.204.253 OR SourceIP:10.27.200.252 OR SourceIP:10.27.204.252
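
For anyone who would rather stay with the plain OR form, a query like the one above can be generated instead of typed by hand. This is only a sketch: the `build_or_query` helper and the two lists are illustrative names, filled in with the values from the example query.

```python
from itertools import product

# Assumed values, copied from the example query above; extend as needed.
subnets = ["10.27.200", "10.27.204"]
gateway_hosts = ["253", "252"]


def build_or_query(field, subnets, hosts):
    """Join one field:value term per host/subnet pair with OR."""
    terms = [f"{field}:{subnet}.{host}" for host, subnet in product(hosts, subnets)]
    return " OR ".join(terms)


query = build_or_query("SourceIP", subnets, gateway_hosts)
print(query)
```

The output can be pasted straight into the search bar, but it grows with every subnet, which is the reason for looking at regex next.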

So, basically I have two gateways on each of my many subnets, so instead of listing every possible combination of subnet/gateway, I thought why not use regex.
So, using RegEx101.com to validate, I came up with the following regex to match on any of my two gateways on all subnets:

^(?:([0-9]{1,3}\.){3}(25[2-3]{1}))

So, after scouring for how to use regex in a search string in Graylog, I found that I have to "escape" the regex by wrapping it in a pair of forward slashes, resulting in the following search string:

SourceIP:/^(?:([0-9]{1,3}\.){3}(25[2-3]{1}))/

Unfortunately I get nothing back from that. Reading a little more, it looks like certain characters need to be escaped. Even though I didn't see anything stating that extra escaping is needed between the forward slashes, I tried it anyway:

SourceIP:/\^\(\?\:\(\[0-9\]\{1,3\}\.\)\{3\}\(25\[2-3\]\{1\}\)\)/

Again, nothing. So, knowing the regex itself is correct, how do I get this to function in Graylog?

Graylog version: Graylog v4.2.6+0210617
OS: Ubuntu 20.04

Thanks in advance.

You may be over-escaping. Anything you are escaping, double escape that. So a period that you would normally escape with \. gets escaped with two... \\.

I tried this

SourceIP:/(?:([0-9]{1,3}\\.){3}(25[2-3]{1}))/

Still nothing.

Hmmm - what we should be focusing on is getting the right regex to track down IPs. :slight_smile: There are plenty of people who have played around with that, and I found a good page that goes into some length about it here... who doesn't like O'Reilly... One of the "accurate" regex expressions they had for tracking down IPv4 is (slightly modified to not be JUST an IP):

(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)

(regex101 will translate that FAR better than I can... I lose track halfway through... haha)

No time to connect over to the work/test environment, but that is something to get started on... escaping characters in Graylog is a little funky, but at least iteration/experimentation is quick.

That's what's frustrating: the regex I'm using validates perfectly in Regex101.com to match the gateway host IPs regardless of the subnet. Now one thing I'm wondering is, do the results only return the first match group? If so, that's possibly the problem, because I'm using a non-capturing group, as is your example. Once I get home I will try to rewrite it to perform a look back and return a single group.

Sincerely,
Jody L Whitlock

One thing I forgot to mention is that I did try this last night:

SourceIP:/10.27.200.253/

This works, so I'm pretty sure I don't have to double escape the backslash, which is what leads me to the capture group piece.
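
One caveat on that test: an unescaped `.` matches any character, a literal dot included, so the search succeeding does not say much about escaping either way. A small sketch in Python, whose `re` module is not Lucene and is used here only because `.` behaves the same in both; the second sample value is made up for illustration.

```python
import re

pattern = "10.27.200.253"       # dots left unescaped, as in the search above
escaped = r"10\.27\.200\.253"   # dots escaped with a single backslash

# The second value is a made-up string that is clearly not an IP address.
for value in ["10.27.200.253", "10x27y200z253"]:
    loose = bool(re.fullmatch(pattern, value))
    strict = bool(re.fullmatch(escaped, value))
    print(f"{value}: unescaped={loose} escaped={strict}")
```

The unescaped pattern accepts both values, while the escaped one only accepts the real address.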

Sincerely,
Jody L Whitlock

The regex that Graylog uses actually comes from Elasticsearch... which is Lucene. Worth reading through, as it's not "Perl-compatible".
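
One difference that is easy to miss: per the Elasticsearch regexp documentation, Lucene patterns are always anchored to the whole term, and the `^` and `$` anchor operators are not supported. A rough way to picture this in Python is `re.fullmatch` versus `re.search`. Python's `re` is only standing in for Lucene here, and the variable names are illustrative.

```python
import re

value = "10.27.200.252"

tail_only = r"25[2-3]"  # describes only the last octet
whole_value = r"[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.25[2-3]"  # describes the full field value

# re.search behaves like an unanchored PCRE-style tester.
found_anywhere = bool(re.search(tail_only, value))

# re.fullmatch approximates Lucene: the pattern must cover the entire value.
tail_matches_whole = bool(re.fullmatch(tail_only, value))
full_matches_whole = bool(re.fullmatch(whole_value, value))

print(found_anywhere, tail_matches_whole, full_matches_whole)
```

A pattern that passes in an online tester by matching part of a line can therefore still return nothing in Graylog unless it describes the entire field value.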

Not sure why there have to be so many different "flavors" of regex, why
can't we just standardize on one...

So something's definitely strange in the way Graylog/Elasticsearch does
things, because I get many different combinations to validate on
different regex validators, including https://regex101.com, but the
same string does not work in Graylog.

For instance, here's the dataset I am testing with:

10.27.200.252
10.27.202.252
10.27.200.11
10.27.204.58
10.27.204.252
10.27.200.253
10.27.202.253
10.27.204.253

I do not want the third or fourth to match, but the rest should match.
On https://regex101.com, this works perfectly and returns a single match
and single group for each line that fits the criteria:

((?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?).){3}(?:25[2-3]?))

But if I put this into Graylog search as:

SourceIP:/((?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?).){3}(?:25[2-3]?))/

I get nothing. The docs I can find on Graylog and regex don't offer
too much assistance, and the docs on Elasticsearch and regex have not
really been that helpful so far.

Ok, so this works:

SourceIP:/((10)\.(27)\.20[0-9]{1}\.25[2-3]{1})/

Also, this seems to work, for a more broad stroke:

SourceIP:/(((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?).){3}(25[2-3]{1}))/
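
Both of these patterns only use syntax that Python's `re` shares with Lucene, so they can be sanity-checked offline against the test dataset from earlier. This is a sketch, not a Graylog test: `re.fullmatch` is assumed as a stand-in for Lucene's whole-term matching, and the `matching` helper is an illustrative name.

```python
import re

# The test dataset from earlier in the thread.
sample_ips = [
    "10.27.200.252", "10.27.202.252", "10.27.200.11", "10.27.204.58",
    "10.27.204.252", "10.27.200.253", "10.27.202.253", "10.27.204.253",
]

narrow = r"((10)\.(27)\.20[0-9]{1}\.25[2-3]{1})"
broad = r"(((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?).){3}(25[2-3]{1}))"


def matching(pattern, values):
    """Return the values that the pattern matches in full."""
    return [v for v in values if re.fullmatch(pattern, v)]


narrow_hits = matching(narrow, sample_ips)
broad_hits = matching(broad, sample_ips)
print(narrow_hits)
print(broad_hits)
```

On this dataset both patterns keep the six gateway addresses and drop 10.27.200.11 and 10.27.204.58.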

What I think is going on: the Lucene documentation lists the following reserved characters, once:

. ? + * | { } [ ] ( ) " \

So that got me thinking, maybe the non-capturing portion of the regex "(?:...)" was what was breaking things. I'm going to have to play some more...
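
If that hunch is right, a pattern written for a PCRE-style tester can be adapted mechanically before pasting it into Graylog. The helper below is hypothetical and deliberately naive: it drops a leading `^` and a trailing `$` (Lucene patterns are anchored already) and turns `(?:` into a plain `(`. It does not handle escaped parentheses or other PCRE-only features, and the result has not been tested against Graylog here.

```python
def to_lucene_friendly(pattern: str) -> str:
    """Naive rewrite: drop ^/$ anchors and turn (?:...) into plain (...)."""
    if pattern.startswith("^"):
        pattern = pattern[1:]
    # Only strip a trailing $ if it is not an escaped literal dollar sign.
    if pattern.endswith("$") and not pattern.endswith("\\$"):
        pattern = pattern[:-1]
    return pattern.replace("(?:", "(")


# The original regex from the first post.
original = r"^(?:([0-9]{1,3}\.){3}(25[2-3]{1}))"
rewritten = to_lucene_friendly(original)
print(f"SourceIP:/{rewritten}/")
```

The rewritten pattern has the same shape as the searches that did work above: plain groups only, and no anchor characters.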

This is the final search I came up with. It's limited to the specific stream, and first pulls all records with the SourceIP field, then makes sure it's only my internal subnets (all are 10.27.x.x), then excludes the gateway IPs that I use (10.27.x.250-254).

(_exists_:SourceIP AND SourceIP:/10\.27\.([1-2]?[0-9][0-9]?){1}.([1-2]?[0-9][0-9]?){1}/) AND !(SourceIP:/((([1-2]?[0-9][0-9]?).){3}(25[0-4]))/)
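
The include/exclude logic of that search can be mirrored offline to check which addresses survive. This is a sketch under the same assumption as before: `re.fullmatch` stands in for Lucene's whole-term matching. The `is_internal_client` name is illustrative, and the `_exists_:SourceIP` clause is not modelled because every sample value here is present.

```python
import re

# The two regex parts of the final search, copied as written.
include = r"10\.27\.([1-2]?[0-9][0-9]?){1}.([1-2]?[0-9][0-9]?){1}"
exclude = r"((([1-2]?[0-9][0-9]?).){3}(25[0-4]))"


def is_internal_client(ip):
    """True for 10.27.x.x addresses that are not gateways (x.250-254)."""
    return bool(re.fullmatch(include, ip)) and not re.fullmatch(exclude, ip)


# The test dataset from earlier in the thread.
sample_ips = [
    "10.27.200.252", "10.27.202.252", "10.27.200.11", "10.27.204.58",
    "10.27.204.252", "10.27.200.253", "10.27.202.253", "10.27.204.253",
]

clients = [ip for ip in sample_ips if is_internal_client(ip)]
print(clients)
```

On the test dataset only the two client addresses remain, which is the inverse of the gateway searches above.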

So now I have a query of all my internal clients, whether they were passed or blocked, and what rule was applied. This is step 1 in my journey with Graylog, many more to come!

Thanks again, believe it or not, you did get me in the right direction, many thanks for that!

That's a pretty cool search! Glad I could help - I am still learning regex as well. Mark your answer as the solution so that future people searching for regex solutions can find it more easily. :smiley:

Just chiming in

I was just using regex101 and was thinking the same thing :laughing:

And there should be one flavor of Linux too... :crazy_face: :smiley:
