Regex pipeline : Get IP from string field


#1

Hello everybody,

I’m trying to match IP addresses in URL fields:

rule "[WIN]Detect IP in URL"
when
    regex"^([0-9]{1,3}\.){3}[0-9]{1,3}$", to_ip($message.url_hostname)).matches == true
then
    set_field("ip_in_url","true");
end

The regex is valid; I checked it on regex101.com. However, Graylog doesn’t validate it.

Any idea where the issue is?

Thanks for your help!


(Jochen) #2

You’re missing an opening parenthesis after the regex function name.

The regular expression also only matches if the url_hostname field contains an IPv4 address without leading or trailing characters. Is that always the case with your messages?


#3

Parenthesis fixed, thanks:

rule "[WIN]Detect IP in URL"
when
    regex("^([0-9]{1,3}\.){3}[0-9]{1,3}$", to_ip($message.url_hostname)).matches == true
then
    set_field("ip_in_url","true");
end 

However, url_hostname should normally NOT contain an IPv4 address.
That’s why I’m trying to detect, with this rule, when there is an IP address instead of a normal hostname in the url_hostname field.
Using an IP address rather than DNS resolution may be considered suspicious.


(Jochen) #4

Please post an example message which should be matched by this rule.


#5

It’s pretty simple: instead of a DNS name, the proxy client requests an external IP address.

Normal behavior:
url_hostname:google.com

Suspicious behavior that I’m trying to detect with the pipeline:
url_hostname:1.2.3.4
(or url_hostname: 1.2.3.4:6666)


(Jochen) #6

Are there any trailing whitespace characters in the url_hostname field?

Also, your regular expression will only match the first example (1.2.3.4) but not the second (1.2.3.4:6666).

Additionally, I just noticed the to_ip function in your condition, which is wrong there. Regular expressions only work on strings, not on values of the IP address data type (as opposed to strings that happen to contain an IP address).


#7

No trailing whitespace characters in the url_hostname field.

Yes, this is just a test; I will improve it to detect 1.2.3.4:6666.

I tried:

rule "[WIN]Detect IP in URL"
when
    regex("^([0-9]{1,3}\.){3}[0-9]{1,3}$", to_string($message.url_hostname)).matches == true
then
    set_field("ip_in_url","true");
end

But Graylog does not validate the regex…

Do you think what I want to do is possible?


(Jochen) #8

You have to escape the \ character (as \\), because the pattern is written as a string literal in the rule.

Also see http://www.regexplanet.com/advanced/java/index.html and http://www.vogella.com/tutorials/JavaRegularExpressions/article.html#backslashes-in-java
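For reference, the rule from #7 with that escaping applied would look roughly like the sketch below. The only change is the doubled backslash: the rule parser consumes one level of backslashes in the string literal, so the regex engine ends up seeing \. (a literal dot). This is a sketch based on the rule as posted, not a re-tested configuration.

```
rule "[WIN]Detect IP in URL"
when
    // "\\." in the rule source reaches the regex engine as "\." (a literal dot)
    regex("^([0-9]{1,3}\\.){3}[0-9]{1,3}$", to_string($message.url_hostname)).matches == true
then
    set_field("ip_in_url", "true");
end
```

With this pattern, url_hostname:1.2.3.4 should match, while url_hostname:google.com and url_hostname:1.2.3.4:6666 should not, because of the ^ and $ anchors.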


#9

  1. Why do you use ^ and $ in your regex?
  2. Why do you not include the last set of numbers in the IP address in the capturing group?

#10

Indeed, no more Graylog errors once I escape the \ character. I thought escaping was not needed for regex… Thanks for this, @jochen! :smile:


#11

1. I always use them.
2. Otherwise, another dot (.) would be expected at the end of the last set of numbers: 1.2.3.4.


(Jochen) #12

As a rule of thumb, making a regular expression as specific and rigid as possible has advantages regarding performance and resilience.
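As an illustration of that rule of thumb, here is one possible stricter variant of the rule from this thread. It is only a sketch under assumptions that were not posted above: each octet is limited to 0-255, an optional :port suffix is allowed (to cover the 1.2.3.4:6666 case from #5), and a has_field guard skips messages without the field. The rule name is made up for the example.

```
// Hypothetical rule name; the octet alternation and the optional port are
// assumptions for illustration, not something confirmed in this thread.
rule "[WIN]Detect IP (optional port) in URL"
when
    // Skip messages that do not carry the field at all
    has_field("url_hostname") &&
    // Octet: 250-255 | 200-249 | 100-199 | 0-99, then an optional ":port" of 1 to 5 digits.
    // Every backslash is doubled because the pattern is a string literal.
    regex("^((25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])\\.){3}(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])(:[0-9]{1,5})?$", to_string($message.url_hostname)).matches == true
then
    set_field("ip_in_url", "true");
end
```

Keeping the ^ and $ anchors means a hostname that merely contains something IP-like, such as 1.2.3.4.example.com, is not flagged. Whether that is the desired behavior depends on what the proxy actually writes into url_hostname.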


(system) #13

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.