Extractor missing fields. How can I troubleshoot?

Hi all, I’m trying to parse some logs. I first wrote a Bash script to format the logs as JSON and send them to Graylog. Graylog is receiving the full messages. I set up a JSON extractor and the preview shows all the fields, but when I search for the messages, a few fields are missing.

OK, maybe my JSON format is wrong. So I decided to rewrite the script to space-separate the data. I changed the input type to raw data and used a split-and-index extractor. The preview again shows the messages and the data extracting correctly, but the stored messages are again missing half the fields.

What gives? What am I doing wrong? How do I find out what is happening?

Please provide more information, such as some example messages, your extractor configuration, and the scripts you’ve been using.

Thanks for the reply.

Here is the script I am using. It’s rough but I’m not a programmer. The first line is an example message.

#logline="2017-12-12 09:08:22 fail2ban.filter [1311] INFO [sshd] Found"
tail -F -n0 /var/log/fail2ban.log | \
while read -r logline; do
        var1=$(echo "$logline" | cut -f1 -d ' ')
        echo "$var1"

        var2=$(echo "$logline" | cut -f2 -d ' ')
        var2=$(echo "$var2" | cut -f1 -d ',')
        echo "$var2"

        var3=$(echo "$logline" | cut -f3 -d ' ')
        echo "$var3"

        var4=$(echo "$logline" | cut -f4 -d ' ')
        var4=$(echo "$var4" | cut -f1 -d ':')
        echo "$var4"

        var5=$(echo "$logline" | cut -f5 -d ' ')
        echo "$var5"

        var6=$(echo "$logline" | cut -f6 -d ' ')
        echo "$var6"

        var7=$(echo "$logline" | cut -f7 -d ' ')
        echo "$var7"

        var8=$(echo "$logline" | cut -f8 -d ' ')
        echo "$var8"

        echo "Message to syslog: $var1 $var2 $var3 $var4 $var5 $var6 $var7 $var8"
        # nc needs a destination host; substitute your Graylog server here
        echo "$var1 $var2 $var3 $var4 $var5 $var6 $var7 $var8" | nc <graylog-host> 13000
done
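As a side note, the fields could also be emitted as a single JSON object, closer to the first JSON-based approach mentioned above. This is only a hypothetical sketch; the field names (`date`, `pid`, `jail`, and so on) are illustrative, not taken from the thread:

```shell
# Hypothetical sketch: split the sample fail2ban line and emit one JSON object.
logline="2017-12-12 09:08:22 fail2ban.filter [1311] INFO [sshd] Found"

# read -r splits on whitespace; the last variable soaks up the rest of the line.
read -r date time source pid level jail action <<< "$logline"

# Strip the surrounding brackets from the pid and jail fields.
pid=${pid//[][]/}
jail=${jail//[][]/}

json=$(printf '{"date":"%s","time":"%s","source":"%s","pid":"%s","level":"%s","jail":"%s","action":"%s"}' \
    "$date" "$time" "$source" "$pid" "$level" "$jail" "$action")
echo "$json"
```

Piping `$json` to `nc` instead of the space-separated string would then pair naturally with a JSON extractor on the Graylog side.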

While I’m running the script, the output to the screen is correct. For instance:
Message to syslog: 2017-12-12 09:08:22 fail2ban.filter [1311] INFO [sshd] Found

Here is a copy of the extractors: [attachment missing]

Actually, I was just noticing the messages in Graylog were missing most of the data. I deleted the extractors and now I can see the data in the message.

Why aren’t you using the Fail2ban Grok patterns from the Graylog Marketplace instead of using your homegrown script?

In general, I’d recommend splitting these kinds of logs in Graylog using a Grok extractor or the processing pipelines.
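For the sample line in the script above, a Grok pattern along these lines might work. This is only a sketch; the field names are illustrative, not taken from the Marketplace pack:

```
%{TIMESTAMP_ISO8601:log_timestamp} %{NOTSPACE:source} \[%{NUMBER:pid}\] %{LOGLEVEL:level} \[%{NOTSPACE:jail}\] %{GREEDYDATA:fail2ban_msg}
```

Note that if you map a captured value onto Graylog’s `timestamp` field, it has to end up as a real date/time value rather than a string.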

I tried it but couldn’t get it to extract properly. That’s why I was working on my own solution. The problem is the variance in messages.

I tried adding the Grok extractor again. The preview works, but I don’t see any messages in my search since applying it. If I remove the extractor, I start seeing messages again. It seems like it’s failing to extract, but I don’t know how to figure out why.

Are you sure it’s not just a timezone problem?

Try searching in a time range some hours in the future with an absolute time range.

I don’t believe so. I’m still relatively new to Graylog and this is a new installation. I say I don’t believe so because when I go to Inputs and click “Show received messages”, I don’t see them there. I also searched and saw 190 messages; then I generated some more events, but I still see only 190 messages.

Please try out what I’ve written in my last post.

I used “2017-12-12 00:00:00 to 2017-12-17 00:00:00”

I don’t see any messages after I applied the Grok pattern. If I go to the extractor and test it, it shows success, though.

I found a log entry. I see the following in the server log, and it shows up right when I expect the missing message. Any ideas? It looks like it’s having problems with DateTime:

2017-12-12T10:46:34.246-06:00 ERROR [BlockingBatchedESOutput] Unable to flush message buffer
java.lang.ClassCastException: Cannot cast java.lang.String to org.joda.time.DateTime
        at java.lang.Class.cast(Class.java:3369) ~[?:1.8.0_151]
        at org.graylog2.plugin.Message.getFieldAs(Message.java:384) ~[graylog.jar:?]
        at org.graylog2.plugin.Message.getTimestamp(Message.java:189) ~[graylog.jar:?]
        at org.graylog2.indexer.messages.Messages.propagateFailure(Messages.java:181) ~[graylog.jar:?]
        at org.graylog2.indexer.messages.Messages.bulkIndex(Messages.java:145) ~[graylog.jar:?]
        at org.graylog2.outputs.ElasticSearchOutput.writeMessageEntries(ElasticSearchOutput.java:111) ~[graylog.jar:?]
        at org.graylog2.outputs.BlockingBatchedESOutput.flush(BlockingBatchedESOutput.java:129) [graylog.jar:?]
        at org.graylog2.outputs.BlockingBatchedESOutput.forceFlushIfTimedout(BlockingBatchedESOutput.java:154) [graylog.jar:?]
        at org.graylog2.periodical.BatchedElasticSearchOutputFlushThread.doRun(BatchedElasticSearchOutputFlushThread.java:82) [graylog.jar:?]
        at org.graylog2.plugin.periodical.Periodical.run(Periodical.java:77) [graylog.jar:?]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_151]
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [?:1.8.0_151]
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) [?:1.8.0_151]
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) [?:1.8.0_151]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_151]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_151]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_151]

The “timestamp” field is expected to be a proper date/time object and not a string.

You can parse a string and convert it into a date/time object by using a Date converter in your extractors or by using the parse_date() function in a processing pipeline rule.
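For example, a minimal pipeline rule using parse_date() might look like this. The source field name `timestamp_string` and the date pattern are assumptions based on the sample line earlier in the thread:

```
rule "parse fail2ban timestamp"
when
  has_field("timestamp_string")
then
  // parse_date() returns a DateTime, which is what the timestamp field expects
  set_field("timestamp", parse_date(to_string($message.timestamp_string), "yyyy-MM-dd HH:mm:ss"));
end
```

After the rule runs, the ClassCastException should go away because the indexer no longer receives a plain string in the timestamp field.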

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.