Extractor missing fields. How can I troubleshoot?


(JJ Martinez) #1

Hi all, I’m trying to parse some logs. I first wrote a Bash script to format the logs as JSON and send them to Graylog. Graylog is receiving the full messages. I added a JSON extractor, and the preview shows all the fields, but when I search for the message, a few fields are missing.

OK, maybe my JSON format is wrong. So I decided to rewrite the script to space-separate the data. I changed the input type to raw data and used split-and-index extractors. The preview again shows the messages and the data extracting correctly, but the stored messages are again missing half the fields.

What gives? What am I doing wrong? How do I find out what is happening?


(Jochen) #2

Please provide more information, such as some example messages, your extractor configuration, and the scripts you’ve been using.


(JJ Martinez) #3

Thanks for the reply.

Here is the script I am using. It’s rough but I’m not a programmer. The first line is an example message.

#logline="2017-12-12 09:08:22 fail2ban.filter [1311] INFO [sshd] Found 221.194.47.233"
tail -F -n0 /var/log/fail2ban.log | \
while read -r logline; do
        var1=$(echo "$logline" | cut -f1 -d ' ')                  # date
        echo "$var1"

        var2=$(echo "$logline" | cut -f2 -d ' ' | cut -f1 -d ',') # time, milliseconds stripped
        echo "$var2"

        var3=$(echo "$logline" | cut -f3 -d ' ')                  # component, e.g. fail2ban.filter
        echo "$var3"

        var4=$(echo "$logline" | cut -f4 -d ' ' | cut -f1 -d ':') # [pid]
        echo "$var4"

        var5=$(echo "$logline" | cut -f5 -d ' ')                  # log level
        echo "$var5"

        var6=$(echo "$logline" | cut -f6 -d ' ')                  # [jail]
        echo "$var6"

        var7=$(echo "$logline" | cut -f7 -d ' ')
        echo "$var7"

        var8=$(echo "$logline" | cut -f8 -d ' ')
        echo "$var8"

        echo "Message to syslog: $var1 $var2 $var3 $var4 $var5 $var6 $var7 $var8"
        echo "$var1 $var2 $var3 $var4 $var5 $var6 $var7 $var8" | nc 10.20.60.101 13000
done

While I’m running the script, the output to the screen is correct. For instance:
2017-12-12
09:08:22
fail2ban.filter
[1311]
INFO
[sshd]
Found
221.194.47.233
Message to syslog: 2017-12-12 09:08:22 fail2ban.filter [1311] INFO [sshd] Found 221.194.47.233

Here is a copy of the extractors:


(JJ Martinez) #4

Actually, I just noticed the messages in Graylog were missing most of the data. I deleted the extractors and now I can see the data in the message.


(Jochen) #5

Why aren’t you using the Fail2ban Grok patterns from the Graylog Marketplace instead of using your homegrown script?

In general, I’d recommend splitting these kind of logs in Graylog using a Grok extractor or the Processing pipelines.
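
As a starting point, a single Grok pattern along these lines should match the example line you posted (untested sketch; the field names like `facility` and `jail` are just suggestions, adjust them to taste):

```
%{TIMESTAMP_ISO8601:log_timestamp} %{NOTSPACE:facility}%{SPACE}\[%{INT:pid}\]%{SPACE}%{LOGLEVEL:log_level}%{SPACE}\[%{WORD:jail}\] %{GREEDYDATA:fail2ban_message}
```

You can paste this into a Grok extractor and check it against a sample message with “Try” before applying it.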


(JJ Martinez) #6

I tried it but couldn’t get it to extract properly. That’s why I was working on my own solution. The problem is the variance in messages.


(JJ Martinez) #7

I tried adding the Grok extractor again. The preview works, but I don’t see any messages in my search since applying it. If I remove the extractor, I start seeing messages again. It seems like extraction is failing, but I don’t know how to figure out why.


(Jochen) #8

Are you sure it’s not just a timezone problem?

Try searching in a time range some hours in the future with an absolute time range.


(JJ Martinez) #9

I don’t believe so. I’m still relatively new to Graylog and this is a new installation. I say I don’t believe so because when I go to Inputs and click “Show received messages” I don’t see them there. I also searched and saw 190 messages. Then I generated some events, but I still see only 190 messages.


(Jochen) #10

Please try out what I’ve written in my last post.


(JJ Martinez) #11

I used “2017-12-12 00:00:00 to 2017-12-17 00:00:00”

I don’t see any messages after I applied the Grok pattern. If I go to the extractor and test it, it shows success, though.


(JJ Martinez) #12


(JJ Martinez) #13

I found a log and see this in it. It shows up right when I expect that message to arrive. Any ideas? It looks like it’s having problems with DateTime:

2017-12-12T10:46:34.246-06:00 ERROR [BlockingBatchedESOutput] Unable to flush message buffer
java.lang.ClassCastException: Cannot cast java.lang.String to org.joda.time.DateTime
        at java.lang.Class.cast(Class.java:3369) ~[?:1.8.0_151]
        at org.graylog2.plugin.Message.getFieldAs(Message.java:384) ~[graylog.jar:?]
        at org.graylog2.plugin.Message.getTimestamp(Message.java:189) ~[graylog.jar:?]
        at org.graylog2.indexer.messages.Messages.propagateFailure(Messages.java:181) ~[graylog.jar:?]
        at org.graylog2.indexer.messages.Messages.bulkIndex(Messages.java:145) ~[graylog.jar:?]
        at org.graylog2.outputs.ElasticSearchOutput.writeMessageEntries(ElasticSearchOutput.java:111) ~[graylog.jar:?]
        at org.graylog2.outputs.BlockingBatchedESOutput.flush(BlockingBatchedESOutput.java:129) [graylog.jar:?]
        at org.graylog2.outputs.BlockingBatchedESOutput.forceFlushIfTimedout(BlockingBatchedESOutput.java:154) [graylog.jar:?]
        at org.graylog2.periodical.BatchedElasticSearchOutputFlushThread.doRun(BatchedElasticSearchOutputFlushThread.java:82) [graylog.jar:?]
        at org.graylog2.plugin.periodical.Periodical.run(Periodical.java:77) [graylog.jar:?]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_151]
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [?:1.8.0_151]
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) [?:1.8.0_151]
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) [?:1.8.0_151]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_151]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_151]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_151]

(Jochen) #14

The “timestamp” field is expected to be a proper date/time object and not a string.

You can parse a string and convert it into a date/time object by using a Date converter in your extractors or by using the parse_date() function in a processing pipeline rule.
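
For example, a pipeline rule roughly like this would do the conversion (untested sketch; adjust the source field name and the date pattern to whatever your extractor actually produces):

```
rule "parse fail2ban timestamp"
when
    has_field("log_timestamp")
then
    // parse the extracted string into a real DateTime and store it
    // in the "timestamp" field that Graylog expects
    set_field("timestamp",
              parse_date(to_string($message.log_timestamp),
                         "yyyy-MM-dd HH:mm:ss"));
end
```

If you stick with extractors instead, attach a Date converter to the extractor that produces the timestamp field and give it the same date pattern.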


(system) #15

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.