Splitting a Syslog Output into Fields


(Matt Dobson) #1

Hi All,

Forgive me if this has already been asked. I would greatly appreciate someone either taking the time to explain the resolution, or pointing me to steps (explained as if to a 5-year-old).

I am receiving syslogs from my firewall which look something like:

"
facility
local0

level
5

message
id=SonicWallApollo sn=000000000000 time="2019-01-11 16:59:49" fw=1.1.1.1 pri=5 c=0 m=760 msg="TCP handshake violation detected; TCP connection dropped" n=5095186 src=8.8.8.8:52350:X7 dst=8.8.4.4:9100:X1 dstMac=aa:aa:aa:aa:aa:aa proto=tcp/9100 note="Handshake Timeout" fw_action="drop"

source
id=SonicWallApollo

timestamp
2019-01-11T17:10:08.068Z
"

As you can see, it only creates 5 fields automatically: “facility”, “level”, “message”, “source” and “timestamp”.

I would like to split the “message” field into:
id=
sn=
time=
fw=
pri=
c=
msg=

and so on for all the fields.

I understand I need to use extractors to do this, but I can’t for the life of me work out how to use REGEX / GROK etc. (whatever I should be using) to do it.
If I just split on the whitespace character, fields like “msg” and “time” don’t get split properly, because their values contain a space…
So I thought I’d ask the wonderful community!

Can anyone explain how to go about splitting this into multiple sections?

Thanks in advance,

Matt


#2

Of course we can do it for you, but I think it is better if you learn and understand it yourself.
Find a good regex summary. Read it. Read it again. Then find an online regex tester and try it out.
You will be better off if you take the time to learn.

// I’m not sure you can copy multiple fields with a regex extractor. Grok can do it.


(Matt Dobson) #3

Thanks macko003 -

I agree 100%, if this was something I needed to do commonly, or even if my job required any knowledge of regex/grok beyond this specific task, then absolutely I would learn from scratch - it’s quite interesting.
Unfortunately, I’m spread thin, and simply do not have the time to devote to researching this much more than I already have at the moment.
I have one firewall which spits this out. I can currently search using "message:searchterm " but the preceding wildcard makes it a resource-heavy operation.

I’ve managed to get to: (\w+)=(.+?)(?= \w+=|$)
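
For readers who want to try this pattern outside Graylog, here is a minimal Python sketch that applies it to the sample message. It assumes plain ASCII quotes in the real log line; the names `PAIR` and `fields` are made up for illustration, and stripping the quotes from the values is an extra step that is not part of the pattern.

```python
import re

# Sample message from the first post (ASCII quotes assumed).
message = (
    'id=SonicWallApollo sn=000000000000 time="2019-01-11 16:59:49" '
    'fw=1.1.1.1 pri=5 c=0 m=760 '
    'msg="TCP handshake violation detected; TCP connection dropped" '
    'n=5095186 src=8.8.8.8:52350:X7 dst=8.8.4.4:9100:X1 '
    'dstMac=aa:aa:aa:aa:aa:aa proto=tcp/9100 '
    'note="Handshake Timeout" fw_action="drop"'
)

# key=value, where the value runs lazily up to the next " key=" or the end.
PAIR = re.compile(r'(\w+)=(.+?)(?= \w+=|$)')

# findall returns (key, value) tuples; drop the surrounding quotes, if any.
fields = {key: value.strip('"') for key, value in PAIR.findall(message)}

for key, value in fields.items():
    print(f"{key}: {value}")
```

The lookahead is what keeps values such as `time` and `msg` in one piece. One limitation of this approach: a quoted value that itself contains text like ` word=` would be cut short at that point.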

Which gives:

But then when I attempt to use this in Graylog, in a “replace with regular expression” extractor on the “message” field, I get the following:

replacement of $1 - it gives each of the first groups from the regex ( id sn time fw pri c m msg n src dst srcMac dstMac proto fw_action)

replacement of $2 - it gives each of the second groups (SonicWallApollo 000000000000 “2019-01-11 18:01:59” 1.1.1.1 4 32 866 “Possible SYN Flood on IF X0 - src: 8.8.8.8:6172 dst: 8.8.4.4:6746 - rate: 375/sec continues” etc etc)

replacement of $1=$2 just gives me the full message as normal.
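
The three results above are what a replace-all substitution would produce. Here is a minimal Python sketch of that behaviour, assuming the extractor substitutes every match and leaves the text between matches untouched. That assumption fits the results described above but has not been checked against Graylog itself. Python's `re.sub` stands in for the extractor, and Python writes back-references as `\1` rather than `$1`.

```python
import re

# Shortened sample message (ASCII quotes assumed).
message = 'id=SonicWallApollo sn=000000000000 time="2019-01-11 16:59:49" fw=1.1.1.1 pri=5'

PAIR = re.compile(r'(\w+)=(.+?)(?= \w+=|$)')

# Every key=value match is replaced; the spaces between matches are kept.
keys_only = PAIR.sub(r'\1', message)
values_only = PAIR.sub(r'\2', message)
rebuilt = PAIR.sub(r'\1=\2', message)

print(keys_only)
print(values_only)
print(rebuilt == message)
```

Under that assumption, a replacement only rewrites the one string it is given. Replacing each `key=value` with `key=value` reproduces the original message, so no separate fields come out of it.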

I feel like I’m getting close but I’m obviously missing something about how to split each of these out into separate fields - any tips?

Thanks


#4

Good!
The next step, as I mentioned, is to use a Grok extractor.
Here is the documentation. It fits your problem perfectly:
http://docs.graylog.org/en/2.5/pages/extractors.html#using-grok-patterns-to-extract-data
Or, with a moment of searching, you can also find this link:
https://community.graylog.org/t/multiple-fields-extractor-regex/3697


(Matt Dobson) #5

Hi,

Thanks for your help macko003.

I actually ended up using separate extractors and simply going:

sn=(.+?)(?= \w+=|$)

msg=(.+?)(?= \w+=|$)
etc etc.
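
The one-pattern-per-field approach can be sketched in Python as follows. The helper `extract` and its `anchored` flag are hypothetical names for illustration, not part of Graylog. The sketch also shows one thing to watch for: a short key such as `n=` also occurs inside `sn=`, so an unanchored pattern can pick up the wrong value. Adding a leading `\b` (word boundary) is one possible guard.

```python
import re

# Shortened sample message (ASCII quotes assumed).
message = (
    'id=SonicWallApollo sn=000000000000 m=760 '
    'msg="TCP handshake violation detected; TCP connection dropped" '
    'n=5095186 src=8.8.8.8:52350:X7'
)

def extract(key, text, anchored=True):
    """Hypothetical helper: return the value for one key, or None if absent."""
    # \b requires the key to start at a word boundary, so "n=" cannot
    # match inside "sn=".
    prefix = r'\b' if anchored else ''
    match = re.search(prefix + re.escape(key) + r'=(.+?)(?= \w+=|$)', text)
    return match.group(1) if match else None

print(extract("sn", message))
print(extract("msg", message))
print(extract("n", message, anchored=False))  # picks up the sn value by mistake
print(extract("n", message))                  # the intended n value
```

For keys that are not the tail of another key (such as `sn` or `msg`), the patterns in the post above behave the same with or without the boundary.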

Not sure if this is a terrible way to do it, but it gives me the information I need.

Thanks!