GROK Extractor question

(Tom Powers) #1


Loving the new forum setup

Anyway…I have a simple syslog stream coming in from a Maltrail sensor. I am able to GROK the pattern for the src and dst IP, but how do I tell it to extract the field that always comes after the date and the |?

See below: the field after " 2017-02-19| " is the one that would be great to break out into its own extracted field.

All insight is appreciated.


Feb 20 06:09:36 MalTrail CEF:0|Maltrail|sensor|0.10.182|2017-02-19|tor exit node (suspicious)|0|src= spt=123 dst= dpt=123 trail= (,

(Jochen) #2

Hi Thomas,

you could give the Graylog CEF plugin a try:


(Tom Powers) #3

OK…I put it in, and I see messages hitting it…if I send a message, I see the Total change:

Throughput / Metrics
1 minute average rate: 0 msg/s
Network IO: 0B 0B (total: 3.2KB 0B )
Empty messages discarded: 0

Yet when I hit Show Messages, it is blank.

I’m guessing it’s a Timezone issue, but anything I put in there doesn’t seem to matter.

(Jochen) #4

Assuming there are no error messages in the logs of your Graylog node, you could use an absolute time range and set the end point to some time in the future.

This should show messages with an invalid timestamp or invalid timezone information.
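One reason the timezone matters here: the classic syslog timestamp at the start of these lines (Feb 20 06:09:36) carries neither a year nor a timezone, so whatever parses it has to assume both. A small Python sketch (not Graylog code; the function name is made up for illustration) showing how the same text maps to different absolute instants depending on the assumed zone:

```python
from datetime import datetime, timedelta, timezone


def parse_syslog_timestamp(ts: str, year: int, tz: timezone) -> datetime:
    """Interpret an RFC 3164-style timestamp such as 'Feb 20 06:09:36'.

    The string itself has no year and no zone, so the caller must supply both.
    """
    naive = datetime.strptime(f"{year} {ts}", "%Y %b %d %H:%M:%S")
    return naive.replace(tzinfo=tz)


ts = "Feb 20 06:09:36"
as_utc = parse_syslog_timestamp(ts, 2017, timezone.utc)
as_utc_minus_5 = parse_syslog_timestamp(ts, 2017, timezone(timedelta(hours=-5)))

# Same text, two different moments in time, five hours apart.
print(as_utc.isoformat())       # 2017-02-20T06:09:36+00:00
print(as_utc_minus_5 - as_utc)  # 5:00:00
```

If the zone the sender means and the zone the receiver assumes disagree, messages land hours away from where a relative search window expects them, which is why widening the absolute range can help find them.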

(Tom Powers) #5

Hmmm…tried that too…I even went 3 days into the future…no dice.

While I try to figure this out, I would be interested in how to parse out the same field on the regular syslog UDP input.

I assume there’s a way to tell it to grab that 6th field (see below again).

I will continue to explore the CEF input, but any insight into how to grab that field would be great, and something I could use elsewhere.



Feb 20 10:07:36 MalTrail CEF:0|Maltrail|sensor|0.10.182|2017-02-19|sinkhole conficker (malware)|0|src= spt=- dst= dpt=- trail= ref=(static)

(Philipp Ruland) #6

Hey @ThomasPowers,

you could use the following GROK extractor first:

After this, add a Copy-Input extractor on field value3 (or whatever you rename it to), make it copy/cut to itself, and add a Key=Value converter to the extractor.
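Roughly what the Key=Value step does to that last field, as a Python sketch. This is not Graylog's converter, just the same idea in plain regex; the IP addresses are made-up documentation addresses, since the ones in the original message didn't survive:

```python
import re

# Matches key=value pairs where the value runs until the next whitespace.
# Simplification: real CEF extension values may contain spaces, which this
# pattern would cut off at the first space.
PAIR = re.compile(r"(\w+)=(\S*)")


def parse_key_values(extension: str) -> dict:
    """Split a 'k1=v1 k2=v2' string into a dict; empty values stay as ''."""
    return dict(PAIR.findall(extension))


value3 = "src=192.0.2.10 spt=123 dst=198.51.100.7 dpt=123"
fields = parse_key_values(value3)
print(fields["dst"])  # 198.51.100.7
```

Each key becomes its own field, which is also what makes the addresses usable for things like GeoIP lookups later on.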

Et voilà, you got your fields :slight_smile:

Greets - Phil

I have no idea if the names I gave the fields make sense, so change them if needed :smiley:

GROK-Pattern for Header (Is it actually a header? Idk :smiley: )
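For what it's worth, it is a header: CEF defines the pipe-separated part as CEF:Version|Device Vendor|Device Product|Device Version|Signature ID|Name|Severity|Extension, and in these Maltrail messages the Signature ID slot holds the date. Phil's exact pattern isn't reproduced here, so below is a rough plain-regex equivalent in Python with hypothetical group names; a GROK version would be built the same way, with every literal pipe written as \|.

```python
import re

# CEF header: Version|Vendor|Product|DeviceVersion|SignatureID|Name|Severity|Extension
# Group names are illustrative. Simplification: header values are assumed not to
# contain escaped pipes.
CEF_HEADER = re.compile(
    r"CEF:(?P<version>\d+)"
    r"\|(?P<vendor>[^|]*)"
    r"\|(?P<product>[^|]*)"
    r"\|(?P<product_version>[^|]*)"
    r"\|(?P<date>\d{4}-\d{2}-\d{2})"  # Maltrail puts a date in the Signature ID slot
    r"\|(?P<name>[^|]*)"               # the field right after the date
    r"\|(?P<severity>\d+)"
    r"\|(?P<extension>.*)"             # key=value part, handled separately
)

line = ("Feb 20 10:07:36 MalTrail CEF:0|Maltrail|sensor|0.10.182|2017-02-19"
        "|sinkhole conficker (malware)|0|src= spt=- dst= dpt=- trail= ref=(static)")

match = CEF_HEADER.search(line)
print(match.group("name"))  # sinkhole conficker (malware)
```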

(Lennart Koopmann) #7

The reason the CEF input does not work in your case is that we expect a CEF message to be sent via syslog, but your example message is not in syslog format. For example, the PRI field is missing.

I just reproduced this:

org.graylog.plugins.cef.parser.CEFParser$ParserException: This message was not recognized as CEF and could not be parsed.

	at org.graylog.plugins.cef.parser.CEFParser.parse(
	at org.graylog.plugins.cef.parser.CEFParserTest.testCustomParse1(
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(
	at java.lang.reflect.Method.invoke(
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(
	at org.junit.runners.ParentRunner.runLeaf(
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(
	at org.junit.runners.ParentRunner$
	at org.junit.runners.ParentRunner$1.schedule(
	at org.junit.runners.ParentRunner.runChildren(
	at org.junit.runners.ParentRunner.access$000(
	at org.junit.runners.ParentRunner$2.evaluate(
	at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(
	at com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(
	at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(
	at com.intellij.rt.execution.junit.JUnitStarter.main(
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(
	at java.lang.reflect.Method.invoke(
	at com.intellij.rt.execution.application.AppMain.main(
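The PRI field mentioned above is the <number> prefix of a syslog line; the syslog RFCs (3164 and 5424) define it as facility * 8 + severity. A minimal Python sketch of what a syslog-wrapped version of the Maltrail line could look like. The facility and severity values are arbitrary picks for illustration, and whether adding PRI alone is enough for the CEF input to accept the message is an assumption that hasn't been tested here:

```python
def syslog_pri(facility: int, severity: int) -> int:
    """PRI value as defined by RFC 3164/5424: facility * 8 + severity."""
    if not (0 <= facility <= 23 and 0 <= severity <= 7):
        raise ValueError("facility must be 0-23 and severity 0-7")
    return facility * 8 + severity


def add_pri(message: str, facility: int, severity: int) -> str:
    """Prefix a BSD-style syslog line with its <PRI> field."""
    return f"<{syslog_pri(facility, severity)}>{message}"


raw = ("Feb 20 10:07:36 MalTrail CEF:0|Maltrail|sensor|0.10.182|2017-02-19"
       "|sinkhole conficker (malware)|0|src= spt=- dst= dpt=- trail= ref=(static)")

# local0 (facility 16) at informational (severity 6) gives the prefix <134>.
wrapped = add_pri(raw, 16, 6)
print(wrapped)
```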

I’m thinking about shipping a parse_cef() pipeline rule to allow more flexible parsing. Would you be willing to give it a try? It should be much easier than using GROK.

(Tom Powers) #8

Hey…this worked great! The splitting of the fields by the | was throwing me off…

Now…on to trying to get the dst= address to work with the geolocation stuff.



(Tom Powers) #9

I would try it…sure…but I have no idea what this rule would look like and where it would go.


(Philipp Ruland) #10

You need to use \| because the | (pipe) is the OR operator in GROK, so you need to escape it with a backslash :slight_smile:
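GROK patterns end up as regular expressions, so the same rule applies in any regex engine. A quick Python illustration using the Maltrail header:

```python
import re

line = "CEF:0|Maltrail|sensor|0.10.182|2017-02-19|tor exit node (suspicious)|0|"

# Unescaped: "|" means OR, so this matches either "Maltrail" or "sensor".
print(re.search(r"Maltrail|sensor", line).group())   # Maltrail

# Escaped: "\|" is a literal pipe, so the match spans both fields.
print(re.search(r"Maltrail\|sensor", line).group())  # Maltrail|sensor

# Capturing the field after the date, with every literal pipe escaped.
after_date = re.search(r"\d{4}-\d{2}-\d{2}\|(?P<name>[^|]*)\|", line)
print(after_date.group("name"))                      # tor exit node (suspicious)
```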

I can give a little hint there as well :smiley:

Make sure that the GeoIP Resolver is last in the processing chain :smiley:
When you have set it up (added the IP database and enabled it on the configuration page), it will automatically start to geotag every field containing an IP address :slight_smile:

(Philipp Ruland) #11

Look here. :slight_smile:

(Tom Powers) #12

The GeoResolver wasn’t working until I moved it to the bottom of the chain…thanks for the heads up on that.


(Philipp Ruland) #13

No problem :slight_smile:
The GeoIP Resolver needs to be last because it can only check fields that already exist for an IP address. If it runs first in the processing chain, there almost certainly won’t be a field containing just an IP address yet :smiley:
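A toy model of why the order matters, in Python. Both steps are hypothetical stand-ins: one for a key=value extractor, and one for the GeoIP Resolver, which here only records which fields hold a bare IP address instead of doing a real location lookup. The addresses are documentation examples:

```python
import ipaddress
import re


def is_bare_ip(value: str) -> bool:
    """True only if the whole string is an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(value)
        return True
    except ValueError:
        return False


def kv_extractor(msg: dict) -> dict:
    """Hypothetical extractor: copies key=value pairs out of the raw message."""
    for key, value in re.findall(r"(\w+)=(\S*)", msg["message"]):
        msg[key] = value
    return msg


def geoip_stand_in(msg: dict) -> dict:
    """Hypothetical GeoIP step: records which fields currently hold a bare IP."""
    msg["geotagged"] = sorted(
        k for k, v in msg.items() if isinstance(v, str) and is_bare_ip(v)
    )
    return msg


def run_chain(raw: str, chain) -> dict:
    """Run the processing steps in the given order on a fresh message."""
    msg = {"message": raw}
    for step in chain:
        msg = step(msg)
    return msg


raw = "src=192.0.2.10 spt=123 dst=198.51.100.7 dpt=123"
print(run_chain(raw, [geoip_stand_in, kv_extractor])["geotagged"])  # []
print(run_chain(raw, [kv_extractor, geoip_stand_in])["geotagged"])  # ['dst', 'src']
```

Run first, the lookup step only sees the raw message, which is not an IP by itself; run last, it finds the src and dst fields the extractor created.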

Greets - Phil

(Tom Powers) #14

OK Phil…since we're on a roll…what about when the beginning and end of a field aren't as obvious as the | we had before?

Let's say we want to grab info from here:

I'm primarily looking for the Alert Received field, the ET rule in {}, and the source and destination IPs.

I'm still trying to wrap my head around how this thing defines starts and stops for the fields.



onion sguil_alert: 21:13:07 pid(24703) Alert Received: 0 3 misc-activity onion-eth1 {2017-02-20 21:13:07} 3 150963 {ET SCAN Behavioral Unusual Port 139 traffic, Potential Scan or Infection} 6 49478 139 1 2001579 14 320 320

(Tom Powers) #15

Actually…I think I got it:

Received: %{DATA:test}{%{DATA:date}} %{DATA:value1}{%{DATA:ETRule}} %{IP:DetSource} %{IP:DetDST}

Then I just mark what I don’t need as UNWANTED.

I’m slowly getting there



(Philipp Ruland) #16

The nice thing about GROK is that as long as you have some characters or values that stay constant, you will be able to capture the structure.

I guess you can look at the GROK pattern and compare it to the string you gave me. Basically, you just need to put placeholders in the locations where the data you want is. But keep in mind that you should be as precise as possible, so don’t use patterns like DATA or GREEDYDATA if you can use more specific ones like IP, DATE, BASE10NUM or whatever :smiley:
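To see why DATA is risky for a field with a known shape: in the standard GROK pattern set, DATA is the lazy regex .*?, which accepts anything at all. A Python sketch comparing a loose and a stricter capture for the date part of the sguil line (the strict pattern is a simplified stand-in, not GROK's actual TIMESTAMP_ISO8601 definition, and the "bad" line is made up):

```python
import re

# Roughly %{DATA:date}: anything at all between the braces.
LOOSE = re.compile(r"\{(?P<date>.*?)\}")
# Simplified stand-in for a timestamp pattern: YYYY-MM-DD HH:MM:SS.
STRICT = re.compile(r"\{(?P<date>\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2})\}")

good = "Alert Received: 0 3 misc-activity onion-eth1 {2017-02-20 21:13:07} 3 150963"
bad = "Alert Received: 0 3 misc-activity onion-eth1 {not a date} 3 150963"

print(LOOSE.search(bad).group("date"))    # not a date  (silently accepted)
print(STRICT.search(bad))                 # None  (the mismatch is caught)
print(STRICT.search(good).group("date"))  # 2017-02-20 21:13:07
```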

If you don’t need the {2017-02-20 21:13:07} 3 150963 part, just change the GROK pattern of text1 to DATA :slight_smile:

Well, I just saw that you did it yourself :smiley:
But you made one mistake that makes Graylog refuse the GROK pattern: you didn’t escape the { } braces. These are special characters in GROK (and in the regular expressions it compiles to), so they need to be escaped with a backslash when they actually appear in the string. Otherwise Graylog thinks these braces are part of a pattern it should parse. Your pattern would be this:

```
Received: %{DATA:test}\{%{TIMESTAMP_ISO8601:date}\} %{DATA:value1}\{%{DATA:ETRule}\} %{IP:DetSource} %{IP:DetDST}
```

Notice the TIMESTAMP_ISO8601 pattern instead of DATA on your date field; that’s what I meant by being specific :slight_smile:
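The escaped-brace pattern written as a plain Python regex, to show the literal braces doing their job. The IP, timestamp, and DATA pieces are simplified stand-ins for the GROK patterns. Since the IPs were stripped from the sample message earlier in the thread, the two addresses (documentation examples) and their position right after the ET rule are assumptions taken from the pattern itself:

```python
import re

IPV4 = r"\d{1,3}(?:\.\d{1,3}){3}"               # simplified stand-in for %{IP}
TS = r"\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}"  # simplified stand-in for %{TIMESTAMP_ISO8601}

ALERT = re.compile(
    r"Received: (?P<test>.*?)"
    + r"\{(?P<date>" + TS + r")\} "  # literal braces are escaped
    + r"(?P<value1>.*?)"
    + r"\{(?P<ETRule>.*?)\} "
    + r"(?P<DetSource>" + IPV4 + r") "
    + r"(?P<DetDST>" + IPV4 + r")"
)

sample = (
    "onion sguil_alert: 21:13:07 pid(24703) Alert Received: 0 3 misc-activity onion-eth1 "
    "{2017-02-20 21:13:07} 3 150963 "
    "{ET SCAN Behavioral Unusual Port 139 traffic, Potential Scan or Infection} "
    "192.0.2.10 198.51.100.7 6 49478 139 1 2001579 14 320 320"
)

m = ALERT.search(sample)
print(m.group("ETRule"))
print(m.group("DetSource"), m.group("DetDST"))
```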

And you don’t need to mark patterns as UNWANTED just because you don’t want them (unless they are needed to parse the text structure correctly). Just make sure you check Named captures only, and Graylog will automatically ignore the rest, see:

Greets - Phil