Regex pipeline: Get IP from string field

Hello everybody,

I’m trying to match IP addresses in URL fields:

rule "[WIN]Detect IP in URL"
when
    regex"^([0-9]{1,3}\.){3}[0-9]{1,3}$", to_ip($message.url_hostname)).matches == true
then
    set_field("ip_in_url","true");
end

The regex is valid; I checked it on regex101.com. However, Graylog doesn’t validate it.

Any idea where the issue is?

Thanks for your help!

You’re missing an opening parenthesis after the regex function name.

The regular expression also only matches if the url_hostname field contains an IPv4 address without leading or trailing characters. Is that always the case with your messages?

Parenthesis fixed, thanks:

rule "[WIN]Detect IP in URL"
when
    regex("^([0-9]{1,3}\.){3}[0-9]{1,3}$", to_ip($message.url_hostname)).matches == true
then
    set_field("ip_in_url","true");
end 

However, url_hostname should normally NOT contain an IPv4 address.
That’s why I’m trying to detect, with this rule, cases where there is an IP address instead of a normal hostname in the url_hostname field.
Using an IP address instead of a DNS name may be considered suspicious.

Please post an example message which should be matched by this rule.

That’s pretty simple: instead of a DNS name, the proxy client requests an external IP address.

Normal behavior:
url_hostname:google.com

Suspicious behavior that I’m trying to detect with the pipeline:
url_hostname:1.2.3.4
(or url_hostname:1.2.3.4:6666)

Are there any trailing whitespace characters in the url_hostname field?

Also, your regular expression will only match the first example (1.2.3.4) but not the second (1.2.3.4:6666).

Additionally, I just noticed the to_ip function in your condition, which is wrong there. Regular expressions only work on strings, not on values of the IP address data type (as opposed to strings that happen to contain an IP address).
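
To illustrate the point about the port, here is a minimal Python sketch. Python's re module stands in for the Java regex engine that Graylog uses, and for this pattern the two should behave the same. The optional-port variant (WITH_PORT) and the helper is_ip_host are assumptions made for illustration; they are not part of the rule above.

```python
import re

# Pattern from the rule: a bare dotted quad and nothing else.
BARE = re.compile(r"^([0-9]{1,3}\.){3}[0-9]{1,3}$")

# Hypothetical extension: the same pattern with an optional ":port" suffix.
WITH_PORT = re.compile(r"^([0-9]{1,3}\.){3}[0-9]{1,3}(:[0-9]{1,5})?$")


def is_ip_host(pattern, hostname):
    """Return True if the pattern matches the hostname string."""
    return pattern.search(hostname) is not None


# Print, for each sample, whether BARE and WITH_PORT accept it.
for host in ["google.com", "1.2.3.4", "1.2.3.4:6666"]:
    print(host, is_ip_host(BARE, host), is_ip_host(WITH_PORT, host))
```

The anchored original accepts only the bare address, while the variant with the optional group also accepts an address followed by a port.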

No trailing whitespace characters in the url_hostname field.

Yes, this is just a test; I will improve it to detect 1.2.3.4:6666.

I tried:

rule "[WIN]Detect IP in URL"
when
    regex("^([0-9]{1,3}\.){3}[0-9]{1,3}$", to_string($message.url_hostname)).matches == true
then
    set_field("ip_in_url","true");
end

But Graylog still does not validate the regex…

Do you think what I want to do is possible?

You have to escape the \ character in the pattern string (write \\ instead of \).

Also see http://www.regexplanet.com/advanced/java/index.html and http://www.vogella.com/tutorials/JavaRegularExpressions/article.html#backslashes-in-java
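
A small Python sketch of why the doubled backslash is needed, assuming the rule language handles backslashes in string literals the way Java does. In an ordinary string literal, \\ denotes one backslash, so the regex engine receives \. (a literal dot). Python is more lenient than Java about a lone "\." inside a literal, so only the doubled form and the raw-string form are shown. The names escaped, raw, any_char and matches are made up for this example.

```python
import re

# Ordinary string literal: "\\" is an escape sequence for ONE backslash,
# so the regex engine sees  ^([0-9]{1,3}\.){3}[0-9]{1,3}$
escaped = "^([0-9]{1,3}\\.){3}[0-9]{1,3}$"

# Raw string literal: backslashes are taken as written.
raw = r"^([0-9]{1,3}\.){3}[0-9]{1,3}$"

# Without any backslash the dot is a wildcard that matches any character.
any_char = "^([0-9]{1,3}.){3}[0-9]{1,3}$"


def matches(pattern, text):
    """Return True if the pattern is found in the text."""
    return re.search(pattern, text) is not None


print(escaped == raw)                  # both literals hold the same pattern
print(matches(escaped, "1.2.3.4"))     # literal dots are required
print(matches(escaped, "1a2b3c4"))     # letters are not accepted as separators
print(matches(any_char, "1a2b3c4"))    # the wildcard dot accepts them
```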

  1. Why do you use ^ and $ in your regex?
  2. Why do you not include the last set of numbers in the IP address in the capturing group?

Indeed, no more Graylog errors when I escape the \ character. I thought escaping was not needed for regex… Thanks for this @jochen!

1. I always use these.
2. Otherwise, another dot (.) would be expected after the last set of numbers (1.2.3.4. with a dot at the end).
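
The second answer can be checked with a short Python sketch (Python's re module is used here in place of Graylog's Java regex engine). FOUR_GROUPS is a hypothetical alternative pattern that puts all four sets of numbers inside the repeated group; accepts is an illustrative helper.

```python
import re

# Pattern from the rule: three "digits + dot" groups, then a final run of digits.
THREE_PLUS_ONE = re.compile(r"^([0-9]{1,3}\.){3}[0-9]{1,3}$")

# Hypothetical alternative: all four sets of numbers inside the repeated group.
# Every repetition ends with a dot, so the address must end with one too.
FOUR_GROUPS = re.compile(r"^([0-9]{1,3}\.){4}$")


def accepts(pattern, text):
    """Return True if the pattern matches the text."""
    return pattern.search(text) is not None


print(accepts(THREE_PLUS_ONE, "1.2.3.4"))   # normal address
print(accepts(FOUR_GROUPS, "1.2.3.4"))      # no trailing dot
print(accepts(FOUR_GROUPS, "1.2.3.4."))     # trailing dot
```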

As a rule of thumb, making a regular expression as specific and rigid as possible has advantages regarding performance and resilience.
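
As a rough illustration of the resilience part, this Python sketch compares the anchored pattern from the rule with an unanchored copy. The sample strings and the helper found are invented for the example, and Python's re module again stands in for the Java engine.

```python
import re

ANCHORED = re.compile(r"^([0-9]{1,3}\.){3}[0-9]{1,3}$")
UNANCHORED = re.compile(r"([0-9]{1,3}\.){3}[0-9]{1,3}")


def found(pattern, text):
    """Return True if the pattern matches anywhere it is allowed to."""
    return pattern.search(text) is not None


# Hypothetical field values: a real address, a hostname that starts like
# an address, and a version-like string.
samples = ["1.2.3.4", "1.2.3.4.example.com", "v1.2.3.4567-build"]

for text in samples:
    print(text, found(ANCHORED, text), found(UNANCHORED, text))
```

Without ^ and $ the pattern also fires on substrings of longer values. Note that even the anchored pattern does not check that each set of numbers is at most 255, so a value like 999.999.999.999 is still accepted.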
