Grok performance problem (max length ?)

Hi all

I have an issue, but I can’t find any whys…

So I have a message from a loadbalancer:

lb 2020-02-24 13:38:46.009 +0100  lb TR 192.168.3.4 443 1.2.3.4 51476 "-" "-" POST TLSv1.2 do.main.com HTTP/1.1 200 219 973 0 2 2.3.4.5 443 2 "-" INTERNAL DEFAULT PASSIVE INVALID /url.html "Browser's data" 1.2.3.4 51476 "-" "-" "-" "-"

We have created a GROK for it:

%{DATE_Y_M_D:date} %{TIME:time} %{NUMBER:tzone}  %{WORD:unit-name} %{WORD:log-type} %{IPV4:service-ip} %{NUMBER:service-port} %{IPV4:client-ip} %{NUMBER:client-port} %{QUOTEDSTRING:login-id} %{QUOTEDSTRING:cert-user} %{WORD:method} %{DATA:proto} %{DATA:host} %{DATA:version} %{NUMBER:http-status} %{NUMBER:byte-sent} %{NUMBER:byte-rec} %{NUMBER:cache-hit} %{NUMBER:time-taken} %{IPV4:service-ip} %{NUMBER:service-port} %{NUMBER:server-time} %{QUOTEDSTRING:session-id} %{WORD:response-type} %{WORD:profile-match} %{WORD:protected} %{WORD:wf-match} %{DATA:url} %{QUOTEDSTRING:query-str} %{QUOTEDSTRING:referer} %{QUOTEDSTRING:cookie} %{QUOTEDSTRING:user-agent} %{IPV4:proxy-ip} %{NUMBER:proxy-port} %{QUOTEDSTRING:auth-user} %{QUOTEDSTRING:c-header1} %{QUOTEDSTRING:c-header2} %{QUOTEDSTRING:c-header3}

The problem:
The Graylog process it very slow. About 100 msg/s (HW config not releated). (with out any problem, so it extracts all fields well.)

If I remove the end of the GROK (everything after “user-agent”), it works fast. About 5-10k msg/s.

%{DATE_Y_M_D:date} %{TIME:time} %{NUMBER:tzone}  %{WORD:unit-name} %{WORD:log-type} %{IPV4:service-ip} %{NUMBER:service-port} %{IPV4:client-ip} %{NUMBER:client-port} %{QUOTEDSTRING:login-id} %{QUOTEDSTRING:cert-user} %{WORD:method} %{DATA:proto} %{DATA:host} %{DATA:version} %{NUMBER:http-status} %{NUMBER:byte-sent} %{NUMBER:byte-rec} %{NUMBER:cache-hit} %{NUMBER:time-taken} %{IPV4:service-ip} %{NUMBER:service-port} %{NUMBER:server-time} %{QUOTEDSTRING:session-id} %{WORD:response-type} %{WORD:profile-match} %{WORD:protected} %{WORD:wf-match} %{DATA:url} %{QUOTEDSTRING:query-str} %{QUOTEDSTRING:referer} %{QUOTEDSTRING:cookie} %{QUOTEDSTRING:user-agent}

Any idea what can causes this behavior?
If I add only the IPV4 address, it will slow again.
It can use IPV4 address GROK, because we use it in the patterns without problem. But in this position It can’t solve it fast.

I also tried to check it in an online tester, but it shows it can process until this part too.
The “smaller” GROK is enough for us (the proxy ip and port are the same with client ip…), but it could be good to know the reason.

what is the pattern for DATE_Y_M_D ?

When I try to resolve the pattern to the regex that will be used - it looks ‘complicated’

grokoff "%{DATE_Y_M_D:date} %{TIME:time} %{NUMBER:tzone}  %{WORD:unit-name} %{WORD:log-type} %{IPV4:service-ip} %{NUMBER:service-port} %{IPV4:client-ip} %{NUMBER:client-port} %{QUOTEDSTRING:login-id} %{QUOTEDSTRING:cert-user} %{WORD:method} %{DATA:proto} %{DATA:host} %{DATA:version} %{NUMBER:http-status} %{NUMBER:byte-sent} %{NUMBER:byte-rec} %{NUMBER:cache-hit} %{NUMBER:time-taken} %{IPV4:service-ip} %{NUMBER:service-port} %{NUMBER:server-time} %{QUOTEDSTRING:session-id} %{WORD:response-type} %{WORD:profile-match} %{WORD:protected} %{WORD:wf-match} %{DATA:url} %{QUOTEDSTRING:query-str} %{QUOTEDSTRING:referer} %{QUOTEDSTRING:cookie} %{QUOTEDSTRING:user-agent} %{IPV4:proxy-ip} %{NUMBER:proxy-port} %{QUOTEDSTRING:auth-user} %{QUOTEDSTRING:c-header1} %{QUOTEDSTRING:c-header2} %{QUOTEDSTRING:c-header3}"


%{DATE_Y_M_D:date} (?<time>(?!<[0-9])(?:2[0123]|[01]?[0-9]):(?:[0-5][0-9])(?::(?:(?:[0-5]?[0-9]|60)(?:[:.,][0-9]+)?))(?![0-9])) (?<tzone>(?:(?<![0-9.+-])(?>[+-]?(?:(?:[0-9]+(?:\.[0-9]+)?)|(?:\.[0-9]+)))))  (?<unit-name>\b\w+\b) (?<log-type>\b\w+\b) (?<service-ip>(?<![0-9])(?:(?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5])[.](?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5])[.](?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5])[.](?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5]))(?![0-9])) (?<service-port>(?:(?<![0-9.+-])(?>[+-]?(?:(?:[0-9]+(?:\.[0-9]+)?)|(?:\.[0-9]+))))) (?<client-ip>(?<![0-9])(?:(?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5])[.](?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5])[.](?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5])[.](?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5]))(?![0-9])) (?<client-port>(?:(?<![0-9.+-])(?>[+-]?(?:(?:[0-9]+(?:\.[0-9]+)?)|(?:\.[0-9]+))))) (?<login-id>(?>(?<!\\)(?>"(?>\\.|[^\\"]+)+"|""|(?>'(?>\\.|[^\\']+)+')|''|(?>`(?>\\.|[^\\`]+)+`)|``))) (?<cert-user>(?>(?<!\\)(?>"(?>\\.|[^\\"]+)+"|""|(?>'(?>\\.|[^\\']+)+')|''|(?>`(?>\\.|[^\\`]+)+`)|``))) (?<method>\b\w+\b) (?<proto>.*?) (?<host>.*?) (?<version>.*?) (?<http-status>(?:(?<![0-9.+-])(?>[+-]?(?:(?:[0-9]+(?:\.[0-9]+)?)|(?:\.[0-9]+))))) (?<byte-sent>(?:(?<![0-9.+-])(?>[+-]?(?:(?:[0-9]+(?:\.[0-9]+)?)|(?:\.[0-9]+))))) (?<byte-rec>(?:(?<![0-9.+-])(?>[+-]?(?:(?:[0-9]+(?:\.[0-9]+)?)|(?:\.[0-9]+))))) (?<cache-hit>(?:(?<![0-9.+-])(?>[+-]?(?:(?:[0-9]+(?:\.[0-9]+)?)|(?:\.[0-9]+))))) (?<time-taken>(?:(?<![0-9.+-])(?>[+-]?(?:(?:[0-9]+(?:\.[0-9]+)?)|(?:\.[0-9]+))))) (?<service-ip>(?<![0-9])(?:(?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5])[.](?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5])[.](?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5])[.](?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5]))(?![0-9])) (?<service-port>(?:(?<![0-9.+-])(?>[+-]?(?:(?:[0-9]+(?:\.[0-9]+)?)|(?:\.[0-9]+))))) (?<server-time>(?:(?<![0-9.+-])(?>[+-]?(?:(?:[0-9]+(?:\.[0-9]+)?)|(?:\.[0-9]+))))) (?<session-id>(?>(?<!\\)(?>"(?>\\.|[^\\"]+)+"|""|(?>'(?>\\.|[^\\']+)+')|''|(?>`(?>\\.|[^\\`]+)+`)|``))) (?<response-type>\b\w+\b) (?<profile-match>\b\w+\b) (?<protected>\b\w+\b) (?<wf-match>\b\w+\b) (?<url>.*?) (?<query-str>(?>(?<!\\)(?>"(?>\\.|[^\\"]+)+"|""|(?>'(?>\\.|[^\\']+)+')|''|(?>`(?>\\.|[^\\`]+)+`)|``))) (?<referer>(?>(?<!\\)(?>"(?>\\.|[^\\"]+)+"|""|(?>'(?>\\.|[^\\']+)+')|''|(?>`(?>\\.|[^\\`]+)+`)|``))) (?<cookie>(?>(?<!\\)(?>"(?>\\.|[^\\"]+)+"|""|(?>'(?>\\.|[^\\']+)+')|''|(?>`(?>\\.|[^\\`]+)+`)|``))) (?<user-agent>(?>(?<!\\)(?>"(?>\\.|[^\\"]+)+"|""|(?>'(?>\\.|[^\\']+)+')|''|(?>`(?>\\.|[^\\`]+)+`)|``))) (?<proxy-ip>(?<![0-9])(?:(?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5])[.](?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5])[.](?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5])[.](?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5]))(?![0-9])) (?<proxy-port>(?:(?<![0-9.+-])(?>[+-]?(?:(?:[0-9]+(?:\.[0-9]+)?)|(?:\.[0-9]+))))) (?<auth-user>(?>(?<!\\)(?>"(?>\\.|[^\\"]+)+"|""|(?>'(?>\\.|[^\\']+)+')|''|(?>`(?>\\.|[^\\`]+)+`)|``))) (?<c-header1>(?>(?<!\\)(?>"(?>\\.|[^\\"]+)+"|""|(?>'(?>\\.|[^\\']+)+')|''|(?>`(?>\\.|[^\\`]+)+`)|``))) (?<c-header2>(?>(?<!\\)(?>"(?>\\.|[^\\"]+)+"|""|(?>'(?>\\.|[^\\']+)+')|''|(?>`(?>\\.|[^\\`]+)+`)|``))) (?<c-header3>(?>(?<!\\)(?>"(?>\\.|[^\\"]+)+"|""|(?>'(?>\\.|[^\\']+)+')|''|(?>`(?>\\.|[^\\`]+)+`)|``)))

Warning! Unable to match the following expressions:
  DATE_Y_M_D
This could be a typo or a missing grok pattern file.

That regex result is super complicated and in other terms spoken: multiple nested ifs in multiple nested ifs.

The recursion on every string (messages) with all possible options is what kills your performance. It might be not explained accurate - but it is simple to complex.

DATE_Y_M_D just %{YEAR}[,/-]%{MONTHNUM}[./-]%{MONTHDAY}

Yes, I know it is long, and complicated, but I don’t know why the last IPV4, or anything after “user-agent” cause this performance problem.
Also if I change some part of the code for an easier on (eg. %{QUOTEDSTRING:session-id} to “-”; %{WORD:profile-match} to DEFAULT, etc because it’s same value all time …) It doesn’t help.

So I just hope someone saw something similar, and has some reason for it, instead of “complicated thing” explanation.

//regexp is not a complicated thing if you spend a hour to learn about it, and you have a notepad, and put some line breaks in the lines. :slight_smile:

//regexp is not a complicated thing if you spend a hour to learn about it, and you have a notepad, and put some line breaks in the lines.

I need to be a little more specific … you need real cpu cycle to solve them.

The Regex that runs on every line you ingest is the following:

(?>\d\d){1,2},/-./- (?(?!<[0-9])(?:2[0123]|[01]?[0-9]):(?:[0-5][0-9])(?::(?:(?:[0-5]?[0-9]|60)(?:[:.,][0-9]+)?))(?![0-9])) (?(?:(?<![0-9.±])(?>[±]?(?:(?:[0-9]+(?:.[0-9]+)?)|(?:.[0-9]+))))) (?\b\w+\b) (?\b\w+\b) (?(?<![0-9])(?:(?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5])...)(?![0-9])) (?(?:(?<![0-9.±])(?>[±]?(?:(?:[0-9]+(?:.[0-9]+)?)|(?:.[0-9]+))))) (?(?<![0-9])(?:(?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5])...)(?![0-9])) (?(?:(?<![0-9.±])(?>[±]?(?:(?:[0-9]+(?:.[0-9]+)?)|(?:.[0-9]+))))) (?(?>(?<!\)(?>“(?>\.|[^\”]+)+“|”“|(?>‘(?>\.|[^\’]+)+‘)|’'|(?>(?>\\.|[^\\]+)+)|``))) (?<cert-user>(?>(?<!\\)(?>"(?>\\.|[^\\"]+)+"|""|(?>'(?>\\.|[^\\']+)+')|''|(?>(?>\.|[^\]+)+)|))) (?<method>\b\w+\b) (?<proto>.*?) (?<host>.*?) (?<version>.*?) (?<http-status>(?:(?<![0-9.+-])(?>[+-]?(?:(?:[0-9]+(?:\.[0-9]+)?)|(?:\.[0-9]+))))) (?<byte-sent>(?:(?<![0-9.+-])(?>[+-]?(?:(?:[0-9]+(?:\.[0-9]+)?)|(?:\.[0-9]+))))) (?<byte-rec>(?:(?<![0-9.+-])(?>[+-]?(?:(?:[0-9]+(?:\.[0-9]+)?)|(?:\.[0-9]+))))) (?<cache-hit>(?:(?<![0-9.+-])(?>[+-]?(?:(?:[0-9]+(?:\.[0-9]+)?)|(?:\.[0-9]+))))) (?<time-taken>(?:(?<![0-9.+-])(?>[+-]?(?:(?:[0-9]+(?:\.[0-9]+)?)|(?:\.[0-9]+))))) (?<service-ip>(?<![0-9])(?:(?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5])[.](?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5])[.](?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5])[.](?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5]))(?![0-9])) (?<service-port>(?:(?<![0-9.+-])(?>[+-]?(?:(?:[0-9]+(?:\.[0-9]+)?)|(?:\.[0-9]+))))) (?<server-time>(?:(?<![0-9.+-])(?>[+-]?(?:(?:[0-9]+(?:\.[0-9]+)?)|(?:\.[0-9]+))))) (?<session-id>(?>(?<!\\)(?>"(?>\\.|[^\\"]+)+"|""|(?>'(?>\\.|[^\\']+)+')|''|(?>`(?>\\.|[^\\`]+)+`)|))) (?\b\w+\b) (?\b\w+\b) (?\b\w+\b) (?\b\w+\b) (?.*?) (?(?>(?<!\)(?>”(?>\.|[^\“]+)+”|“”|(?>‘(?>\.|[^\’]+)+‘)|’‘|(?>(?>\\.|[^\\]+)+)|``))) (?<referer>(?>(?<!\\)(?>"(?>\\.|[^\\"]+)+"|""|(?>'(?>\\.|[^\\']+)+')|''|(?>(?>\.|[^\]+)+)|))) (?<cookie>(?>(?<!\\)(?>"(?>\\.|[^\\"]+)+"|""|(?>'(?>\\.|[^\\']+)+')|''|(?>`(?>\\.|[^\\`]+)+`)|))) (?(?>(?<!\)(?>“(?>\.|[^\”]+)+“|”"|(?>’(?>\.|[^\‘]+)+’)|‘’|(?>(?>\\.|[^\\]+)+)|``))) (?<proxy-ip>(?<![0-9])(?:(?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5])[.](?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5])[.](?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5])[.](?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5]))(?![0-9])) (?<proxy-port>(?:(?<![0-9.+-])(?>[+-]?(?:(?:[0-9]+(?:\.[0-9]+)?)|(?:\.[0-9]+))))) (?<auth-user>(?>(?<!\\)(?>"(?>\\.|[^\\"]+)+"|""|(?>'(?>\\.|[^\\']+)+')|''|(?>(?>\.|[^\]+)+)|))) (?<c-header1>(?>(?<!\\)(?>"(?>\\.|[^\\"]+)+"|""|(?>'(?>\\.|[^\\']+)+')|''|(?>`(?>\\.|[^\\`]+)+`)|))) (?(?>(?<!\)(?>“(?>\.|[^\”]+)+“|”"|(?>‘(?>\.|[^\’]+)+‘)|’'|(?>(?>\\.|[^\\]+)+)|``))) (?<c-header3>(?>(?<!\\)(?>"(?>\\.|[^\\"]+)+"|""|(?>'(?>\\.|[^\\']+)+')|''|(?>(?>\.|[^\`]+)+`)|``)))

You have multiple atomic groups, multiple non-capture groups, lookaheads, negative lookaheads.

The question is - how many core did you have and what is the load that you see in the JVM?

Yes, I understand, it’s complicated, and CPU intensive process.

I have 8 cores. Current load 0,5-1,5 (full, not /cores); 5-10k msg/s process speed. When I have the long one the load is between 10 and 15; 50-100 msg/s process speed.

Why anything behind the user-agent cause this slow? An IPV4 address, or just a simple DATA.

I was reading this elastic blog post on GROK efficiency recently:

Maybe it won’t help - but it was an interesting read.

Thanks! It helps a lot.
Also I don’t know what exactly cause the problem, but the anchors are helped.
I modified the GROK to start with "^lb. " and end with “$” and in this case it doesn’t cause performance problem.

I also modify the topic for the future.

2 Likes

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.