Grok extractor pattern help

Hi!

I’m new to Graylog and I’m having trouble extracting data from my log lines with a Grok extractor.

My log lines are semicolon separated lists like this: timestamp;uid;ip;useragent;url …
So I created a grok pattern that looks like this:

%{TIME};%{DATA:request_id};%{DATA:ip};%{DATA:useragent};

My problem is with the user agent. In a normal case, the user agent is just a string retrievable with %{DATA:useragent}. But in some cases, the user agent contains a semicolon, so it is enclosed in quotes in the log line.

How can I get this field in all cases?

I tried to use the %{QUOTEDSTRING} grok pattern, which does not work when there are no quotes. So I tried to create a new pattern (%{QUOTEDSTRING}|.*), but when there are no quotes it also captures the semicolons and fields that follow (Mozilla/5.0;13;14;22).

I also tried with (%{QUOTEDSTRING}|.*?), without success.

Thanks for your help!
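
A likely reason the `.*` fallback over-matches is that it is allowed to consume semicolons. Below is a minimal sketch in Python (not Grok) of the alternative idea: try a double-quoted string first, and otherwise fall back to a run of characters that excludes the separator. The `FIELD` and `LINE` names and the shortened four-field layout are assumptions made for illustration; they mirror the pattern in the question, not the real 21-field format.

```python
import re

# Hypothetical field pattern: a double-quoted string, or else a run of
# characters that contains no semicolon (so it stops at the next separator).
FIELD = r'(?:"[^"]*"|[^;]*)'

# Hypothetical layout mirroring the question: time;request_id;ip;useragent;rest
LINE = re.compile(
    r'^(?P<time>[^;]*);(?P<request_id>[^;]*);(?P<ip>[^;]*);'
    r'(?P<useragent>' + FIELD + r');(?P<rest>.*)$'
)

def useragent(line):
    """Return the user agent without surrounding quotes, or None if no match."""
    m = LINE.match(line)
    if m is None:
        return None
    return m.group("useragent").strip('"')

plain = '21:50:32,522;abc;8.8.8.8;Mozilla/5.0;13;14;22'
quoted = '21:50:32,522;abc;8.8.8.8;"Mozilla/5.0 (Linux; Android 12)";13;14;22'

print(useragent(plain))   # Mozilla/5.0
print(useragent(quoted))  # Mozilla/5.0 (Linux; Android 12)
```

If Graylog's Grok implementation accepts inline named groups (an assumption worth checking in a Grok debugger first), the same idea might be written as (?<useragent>"[^"]*"|[^;]*) in place of %{DATA:useragent}.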

Hello && Welcome @mobarzik

I need to ask a couple of questions. By chance, can you give an example of the whole message? Have you tried pipelines and/or Regex instead of Grok?

Hello @gsmith

Here are two examples of lines (anonymized, of course).

With quotes, because of the semicolon in the user agent:
21:50:32,522;434FBF4-9B38-DEADBEEF-01BB-6418E98D-13E52BB8-1057E;1679353197584;;8.8.8.8;;;802;;;/myuripath/search?date=1292523600000;;;"Mozilla/5.0 (Linux; Android 12; SM-G970U1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Mobile Safari/537.36";0;0;0;0;;0;0;;;

Without quotes

21:50:32,522;434FBF4-9B38-DEADBEEF-01BB-6418E98D-13E52BB8-1057E;1679353197584;;8.8.8.8;;;802;;;/myuripath/search?date=1292523600000;;;Go-http-client/2.0;0;0;0;0;;0;0;;;

As I explained, there are 21 fields separated by semicolons, some fields being empty.

I didn’t try with a pipeline but I imagine that the split function will give me the same problems in the first case.
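
To illustrate the concern about splitting, here is a small Python sketch (not Graylog pipeline code) comparing a plain split on semicolons with a quote-aware split. The `SAMPLE` line is a shortened, made-up variant of the quoted example above, and both function names are hypothetical.

```python
import csv

# Shortened, hypothetical variant of the quoted log line above.
SAMPLE = '21:50:32,522;8.8.8.8;"Mozilla/5.0 (Linux; Android 12; SM-G970U1)";0;;'

def naive_split(line):
    """Split on every semicolon, including those inside quotes."""
    return line.split(";")

def quote_aware_split(line):
    """Split on semicolons, treating a double-quoted field as one unit."""
    return next(csv.reader([line], delimiter=";", quotechar='"'))

print(len(naive_split(SAMPLE)))        # 8: the user agent is broken into three pieces
print(len(quote_aware_split(SAMPLE)))  # 6: the user agent stays in one field
print(quote_aware_split(SAMPLE)[2])    # Mozilla/5.0 (Linux; Android 12; SM-G970U1)
```

A plain split does shift every field after the user agent when quotes are present, so a quote-aware approach (or a regex that handles the quoted case explicitly) is needed for those lines.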

Hey @mobarzik

I was just going to suggest the pipeline split function.
What I have in my personal docs is this, but I have not tested it on your logs yet.

Example:

rule "your_rule_name"
when
    true
then
    // Parse the source field with Grok and store each named capture as a message field.
    set_fields(
        fields: grok(
            pattern: "%{TIMESTAMP_ISO8601:datetime} %{NOTSPACE:s-ip} %{WORD:cs-method} %{URIPATH:cs-uri-stem} %{GREEDYDATA:cs-uri-query} %{INT:s-port} %{NOTSPACE:cs-username} %{NOTSPACE:c-ip} %{NOTSPACE:cs-useragent} %{NOTSPACE:referer} %{NUMBER:sc-status} %{NUMBER:sc-substatus} %{NUMBER:sc-win32-status} %{NUMBER:time-taken}",
            value: to_string($message.sepm_fields),
            only_named_captures: true
        )
    );
end

Looking at “Without quotes”, it looks like you pulled some data from the message field and that is what’s left. Are you trying to separate the message into different fields or modify the message itself?
By chance, what type of input are you using, and what types of inputs have you used?

I usually work with rules and not with extractors. I do not know how long Graylog will support extractors.

I can start with a GROK debugger like this one:

Sometimes, Open AI can help to build the basic GROK pattern. If it is something well known, the AI will often tell me what fields are optional.

The next step would be looking into the Graylog Schema. You can use your own field names instead of the schema fields, but if you have the choice, using the schema names means you may be able to import dashboards that work out of the box.

Not all GROK patterns will run out of the box because the escaping is different. For a double quote, say, the Grok debugger may need a single escape, whereas the same GROK pattern in Graylog might need four.

