Pipeline Grok Patterns


(Egor) #1

Hello

Can you please help me? I want to parse the Linux secure log file the way the Filebeat system module does.
For example, Filebeat has these rules for the secure file:

"%{SYSLOGTIMESTAMP:system.auth.timestamp} %{SYSLOGHOST:system.auth.hostname} sshd(?:\\[%{POSINT:system.auth.pid}\\])?: %{DATA:system.auth.ssh.event} %{DATA:system.auth.ssh.method} for (invalid user )?%{DATA:system.auth.user} from %{IPORHOST:system.auth.ssh.ip} port %{NUMBER:system.auth.ssh.port} ssh2(: %{GREEDYDATA:system.auth.ssh.signature})?",
"%{SYSLOGTIMESTAMP:system.auth.timestamp} %{SYSLOGHOST:system.auth.hostname} sshd(?:\\[%{POSINT:system.auth.pid}\\])?: %{DATA:system.auth.ssh.event} user %{DATA:system.auth.user} from %{IPORHOST:system.auth.ssh.ip}",
"%{SYSLOGTIMESTAMP:system.auth.timestamp} %{SYSLOGHOST:system.auth.hostname} sshd(?:\\[%{POSINT:system.auth.pid}\\])?: Did not receive identification string from %{IPORHOST:system.auth.ssh.dropped_ip}",
"%{SYSLOGTIMESTAMP:system.auth.timestamp} %{SYSLOGHOST:system.auth.hostname} sudo(?:\\[%{POSINT:system.auth.pid}\\])?: \\s*%{DATA:system.auth.user} :( %{DATA:system.auth.sudo.error} ;)? TTY=%{DATA:system.auth.sudo.tty} ; PWD=%{DATA:system.auth.sudo.pwd} ; USER=%{DATA:system.auth.sudo.user} ; COMMAND=%{GREEDYDATA:system.auth.sudo.command}",
"%{SYSLOGTIMESTAMP:system.auth.timestamp} %{SYSLOGHOST:system.auth.hostname} groupadd(?:\\[%{POSINT:system.auth.pid}\\])?: new group: name=%{DATA:system.auth.groupadd.name}, GID=%{NUMBER:system.auth.groupadd.gid}",
"%{SYSLOGTIMESTAMP:system.auth.timestamp} %{SYSLOGHOST:system.auth.hostname} useradd(?:\\[%{POSINT:system.auth.pid}\\])?: new user: name=%{DATA:system.auth.useradd.name}, UID=%{NUMBER:system.auth.useradd.uid}, GID=%{NUMBER:system.auth.useradd.gid}, home=%{DATA:system.auth.useradd.home}, shell=%{DATA:system.auth.useradd.shell}$",

"%{SYSLOGTIMESTAMP:system.auth.timestamp} %{SYSLOGHOST:system.auth.hostname}? %{DATA:system.auth.program}(?:\[%{POSINT:system.auth.pid}\])?: %{GREEDYMULTILINE:system.auth.message}"

And I wanted to specify these rules in the pipeline grok function, but as I understand it, it does not know how to use several rules at once.
I tried to specify several patterns, but then only the last one was applied:

rule "Secure_Message"
when
true
then
let mess = to_string($message.message);
let parsed = grok(pattern: "%{SYSLOGTIMESTAMP:system.auth.timestamp} %{SYSLOGHOST:system.auth.hostname} sshd(?:\[%{POSINT:system.auth.pid}\])?: %{DATA:system.auth.ssh.event} %{DATA:system.auth.ssh.method} for (invalid user )?%{DATA:system.auth.user} from %{IPORHOST:system.auth.ssh.ip} port %{NUMBER:system.auth.ssh.port} ssh2(: %{GREEDYDATA:system.auth.ssh.signature})?",pattern: "%{SYSLOGTIMESTAMP:system.auth.timestamp} %{SYSLOGHOST:system.auth.hostname}? %{DATA:system.auth.program}(?:\[%{POSINT:system.auth.pid}\])?: %{GREEDYDATA:system.auth.message}",value: mess,only_named_captures: true);
set_fields(parsed);
end
set_fields(parsed);
end


(Ben van Staveren) #2

You want to replace all the dots in the grok field names with underscores, to start with (e.g. system_auth_hostname instead of system.auth.hostname); then it should actually work. I'm parsing our auth.log entries with the same set of rules (with dots replaced by underscores) without any issues.
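For example, the catch-all pattern quoted above would look like this after the renaming Ben describes (nothing else changed, just dots replaced by underscores):

%{SYSLOGTIMESTAMP:system_auth_timestamp} %{SYSLOGHOST:system_auth_hostname}? %{DATA:system_auth_program}(?:\[%{POSINT:system_auth_pid}\])?: %{GREEDYMULTILINE:system_auth_message}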


(Egor) #3

Perhaps you misunderstood me.
I want to use several grok rules, as can be done in Logstash or Filebeat, but as far as I can tell this does not work with the pipeline grok function.
As an example, I quoted the Filebeat rules for Linux secure logs.
But I do not understand how to specify the same list of rules in the grok pipeline.


(Jan Doberstein) #4

You can have as many grok extractions as you want in one processing rule.

Maybe you can rephrase what you want to do exactly, because it would not be very efficient to run 6 very complex grok patterns on all incoming messages.

But in addition to what @benvanstaveren wrote: you extract information and place it in field names with dots, which is currently not supported in Graylog. So you need to replace the dots in the field names with underscores.


(Egor) #5

Hello.
I will try again.
I created an input for Filebeat through which I receive different kinds of logs, for example nginx and syslog.
I think it would be inconvenient to use regular extractors, since nginx and syslog logs are different.
I distributed the logs across streams, but ran into the problem of using multiple grok filters.
For example, the secure log has lines in different formats, and to parse them Filebeat contains several rules:

"%{SYSLOGTIMESTAMP:system.auth.timestamp} %{SYSLOGHOST:system.auth.hostname} sshd(?:\[%{POSINT:system.auth.pid}\])?: %{DATA:system.auth.ssh.event} %{DATA:system.auth.ssh.method} for (invalid user )?%{DATA:system.auth.user} from %{IPORHOST:system.auth.ssh.ip} port %{NUMBER:system.auth.ssh.port} ssh2(: %{GREEDYDATA:system.auth.ssh.signature})?",
"%{SYSLOGTIMESTAMP:system.auth.timestamp} %{SYSLOGHOST:system.auth.hostname} sshd(?:\[%{POSINT:system.auth.pid}\])?: %{DATA:system.auth.ssh.event} user %{DATA:system.auth.user} from %{IPORHOST:system.auth.ssh.ip}",
"%{SYSLOGTIMESTAMP:system.auth.timestamp} %{SYSLOGHOST:system.auth.hostname} sshd(?:\[%{POSINT:system.auth.pid}\])?: Did not receive identification string from %{IPORHOST:system.auth.ssh.dropped_ip}",
"%{SYSLOGTIMESTAMP:system.auth.timestamp} %{SYSLOGHOST:system.auth.hostname} sudo(?:\[%{POSINT:system.auth.pid}\])?: \s*%{DATA:system.auth.user} :( %{DATA:system.auth.sudo.error} ;)? TTY=%{DATA:system.auth.sudo.tty} ; PWD=%{DATA:system.auth.sudo.pwd} ; USER=%{DATA:system.auth.sudo.user} ; COMMAND=%{GREEDYDATA:system.auth.sudo.command}",
"%{SYSLOGTIMESTAMP:system.auth.timestamp} %{SYSLOGHOST:system.auth.hostname} groupadd(?:\[%{POSINT:system.auth.pid}\])?: new group: name=%{DATA:system.auth.groupadd.name}, GID=%{NUMBER:system.auth.groupadd.gid}",
"%{SYSLOGTIMESTAMP:system.auth.timestamp} %{SYSLOGHOST:system.auth.hostname} useradd(?:\[%{POSINT:system.auth.pid}\])?: new user: name=%{DATA:system.auth.useradd.name}, UID=%{NUMBER:system.auth.useradd.uid}, GID=%{NUMBER:system.auth.useradd.gid}, home=%{DATA:system.auth.useradd.home}, shell=%{DATA:system.auth.useradd.shell}$",
"%{SYSLOGTIMESTAMP:system.auth.timestamp} %{SYSLOGHOST:system.auth.hostname}? %{DATA:system.auth.program}(?:\[%{POSINT:system.auth.pid}\])?: %{GREEDYMULTILINE:system.auth.message}"

And I would like to use these rules in the grok pipeline.
For example, to be able to write something like this:
let parsed = grok(pattern: ["first pattern","second pattern"], value: mess, only_named_captures: true);

And I understand the problem with the dots.
But I still have the question about using multiple grok patterns.

Thank you for understanding.


(Egor) #6

I noticed that when I use this form:
let parsed = grok(pattern: "first pattern", pattern: "second pattern", value: mess, only_named_captures: true);
only the last pattern is applied.


(Jan Doberstein) #7

You have multiple options to solve this; right now you are simply "holding it wrong".

The first possible solution:

Create your single-line grok patterns under System > Grok Patterns. Once they all exist, create one additional pattern that combines them; its content will be %{FIRST_PATTERN}|%{SECOND_PATTERN}|%{THIRD_PATTERN}. The pipe connects them with OR.

But be aware that this might create a monster regex that every message needs to pass; in other words, it will significantly cut down your ingest rate.
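Assuming the combined pattern is stored under the name SECURE_LOG (a placeholder name, not one from Jan's post), the processing rule would then be a one-liner sketch like:

rule "Secure_Message"
when
  true
then
  let mess = to_string($message.message);
  let parsed = grok(pattern: "%{SECURE_LOG}", value: mess, only_named_captures: true);
  set_fields(parsed);
end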

The second possible solution:

Create one rule with multiple GROK extractions:

rule "Secure_Message"
when
true
then
let mess = to_string($message.message);
let parsed = grok(pattern: "%{FIRST_PATTERN}",value: mess,only_named_captures: true);
let parsed = grok(pattern: "%{SECOND_PATTERN}",value: mess,only_named_captures: true);
let parsed = grok(pattern: "%{THIRD_PATTERN}",value: mess,only_named_captures: true);
set_fields(parsed);
end

or similar …

The third possible solution (preferred):

create one rule for each GROK pattern and run each rule only when needed, so you do not waste resources.
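A sketch of what one such per-pattern rule could look like, assuming sshd messages can be recognized by a cheap contains() check; the rule name and the check are illustrative, and the field names use underscores as suggested above:

rule "Secure_Message_sshd"
when
  contains(to_string($message.message), "sshd", true)
then
  let mess = to_string($message.message);
  let parsed = grok(pattern: "%{SYSLOGTIMESTAMP:system_auth_timestamp} %{SYSLOGHOST:system_auth_hostname} sshd(?:\[%{POSINT:system_auth_pid}\])?: %{GREEDYDATA:system_auth_message}", value: mess, only_named_captures: true);
  set_fields(parsed);
end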

Jan


(Egor) #8

Hello.

Thank you very much for your help.
I have question:
In the second solution, won't the "parsed" variable overwrite itself?
It is assigned three times, and each call would give it a new value.

And in the third solution, you suggest using a "when" condition and a single grok pattern per rule, right?


(Jan Doberstein) #9

The parsed variable holds all the named captures; as long as the field names are not the same, it will simply be filled with all the extracted values.

The third and best option is to write a good when condition so that each grok pattern only runs on the messages it is meant for. HINT: that does not mean using the grok in the when condition and then a second time in the then condition, because that would run the grok pattern twice. Find something cheaper that lets you identify when to run which pattern.


(Ben van Staveren) #10

To build on what @jan described: in our setup we start with a pipeline rule that groks the entire message and splits things out into temporary fields. The next stage has rules that test for the existence of a particular field (generated, or not, by the first grok action), grok that individual field, and then remove the temporary field from the message.

It works for us but your situation may be different enough that it won’t be feasible to do it like that.
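A minimal sketch of that two-stage layout; every rule, pattern, and field name here is invented for illustration, not taken from Ben's actual setup:

Stage 1 (splits the raw message, leaving the unparsed remainder in a temporary field):

rule "stage1_split_secure"
when
  true
then
  let parsed = grok(pattern: "%{SYSLOGTIMESTAMP:auth_timestamp} %{SYSLOGHOST:auth_hostname} %{DATA:auth_program}(?:\[%{POSINT:auth_pid}\])?: %{GREEDYDATA:tmp_rest}", value: to_string($message.message), only_named_captures: true);
  set_fields(parsed);
end

Stage 2 (runs only when the temporary field exists and the program matches, then cleans up):

rule "stage2_sshd_detail"
when
  has_field("tmp_rest") && to_string($message.auth_program) == "sshd"
then
  let parsed = grok(pattern: "%{DATA:ssh_event} %{DATA:ssh_method} for (invalid user )?%{DATA:ssh_user} from %{IPORHOST:ssh_ip} port %{NUMBER:ssh_port} ssh2", value: to_string($message.tmp_rest), only_named_captures: true);
  set_fields(parsed);
  remove_field("tmp_rest");
end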


(Egor) #11

Hello.

Thank you very much for your help.


(system) #12

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.