Extractor using CSV converter for space delimited messages?


(RIP Bob Dole) #1

Hi,
I’m trying to create an extractor for unbound DNS request logs.

Sample message:
unbound: [31543:1] info: 10.2.2.1 ahostname-com A IN

Extractor Type: regex

Regex (confirmed working): (redacted: the forum software thinks I have links in my post)

Condition (confirmed working): (redacted: the forum software thinks I have links in my post)

Store as field: dnslookupdata

Strategy: copy

Converter: CSV to fields

Field names: dns_srcip dns_req dns_rectype dns_class

Separator character: (I pressed space bar one time)

Quote character: (I tried leaving it blank, but that doesn’t work, so I left the default ")

Escape character: \

The regexes work, but the converter fails. Is the “CSV to fields” converter a suitable way to parse messages that are delimited with spaces? My goal is to grab all fields in a single extractor. I’m new to Graylog. Thanks.


(Ben van Staveren) #2

You could, but the easier way would probably be to use a Grok extractor instead of the regex one, since it can split all the data into fields in one go. Here is an example pattern that assumes ‘unbound’ is the application name and 31543:1 is the pid and thread:

"^%{WORD:application_name}: \\[%{NUMBER:pid}:%{NUMBER:thread}\\] %{WORD:level} %{IP:dns_srcip} %{NOTSPACE:dns_req} %{WORD:dns_rectype} %{WORD:dns_class}"

Then you only have the one extractor running, and theoretically it should grab you all the fields you need. Keep in mind the above pattern is an example, so adjust that to your needs, don’t just whap it in and expect it to work :wink:
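
To make the field split concrete outside of Graylog, here is a minimal Python sketch of the same idea using named groups. It is not the Grok engine: the sub-patterns are simplified stand-ins I am assuming for IP, NOTSPACE, and WORD, and the function name `parse_unbound` is made up for illustration. Note that the sample line has a colon after the log level ("info:"), so the sketch matches that colon explicitly.

```python
import re

# Rough stand-in for the Grok pattern: one named group per field.
# The sub-patterns are simplified assumptions, not the real Grok definitions.
UNBOUND_LINE = re.compile(
    r"^(?P<application_name>\w+): "
    r"\[(?P<pid>\d+):(?P<thread>\d+)\] "
    r"(?P<level>\w+): "
    r"(?P<dns_srcip>\d{1,3}(?:\.\d{1,3}){3}) "
    r"(?P<dns_req>\S+) "
    r"(?P<dns_rectype>\w+) "
    r"(?P<dns_class>\w+)$"
)


def parse_unbound(line):
    """Return a dict of field name -> string, or None if the line does not match."""
    match = UNBOUND_LINE.match(line)
    return match.groupdict() if match else None


sample = "unbound: [31543:1] info: 10.2.2.1 ahostname-com A IN"
fields = parse_unbound(sample)
print(fields)
```

Testing a pattern like this against a few real log lines before putting it into the extractor is a cheap way to catch small mismatches such as a stray colon.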


(RIP Bob Dole) #3

Regex: ^unbound:\s+\[\d+:\d+\]\sinfo:(.+)$

Field matches regex: ^unbound:\s+\[\d+:\d+\]\sinfo:\s((?<![0-9])(?:(?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})...)(?![0-9]))


(RIP Bob Dole) #4

Thanks for your help. The Grok pattern you provided was pretty much spot on but I made a couple of changes.

^%{WORD:application_name}: \[%{NUMBER:pid}:%{NUMBER:thread}\] %{LOGLEVEL:level}: %{IP:dns_srcip} %{NOTSPACE:dns_req} %{WORD:dns_rectype} %{WORD:dns_class}

The fields show exactly like I expect when I test against a sample message in the web UI.

However, I’m now getting an error for messages coming into this input.

2018-12-09 19:42:30,468 WARN : org.graylog2.indexer.messages.Messages - Failed to index message: index=<graylog_0> id=<7984c552-fc14-11e8-98cb-525400e32cc4> error=<{"type":"mapper_parsing_exception","reason":"failed to parse [level]","caused_by":{"type":"number_format_exception","reason":"For input string: \"info\""}}>

It seems like the index expects level to be a numeric value. I don’t know much about how the indexer works, but is this effectively a schema change, meaning I need to rebuild the index?


(Jan Doberstein) #5

2018-12-09 19:42:30,468 WARN : org.graylog2.indexer.messages.Messages - Failed to index message: index=<graylog_0> id=<7984c552-fc14-11e8-98cb-525400e32cc4> error=<{"type":"mapper_parsing_exception","reason":"failed to parse [level]","caused_by":{"type":"number_format_exception","reason":"For input string: \"info\""}}>

The field level was created as a numeric field in this index, but you are now trying to store a string in it.
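
As a rough illustration of why this fails, here is a toy Python model of dynamic mapping. It is not Elasticsearch code; the class `ToyIndex` and the exception `MappingConflict` are made up for this sketch, and the numeric level 6 is just an assumed first value. The point is only that the first value seen for a field fixes its type, and a later value that cannot be parsed as that type is rejected.

```python
class MappingConflict(Exception):
    """Raised when a value does not fit the type already fixed for a field."""


class ToyIndex:
    """Toy model of dynamic mapping: a field's type is set by its first value."""

    def __init__(self):
        self.mapping = {}

    def index(self, doc):
        for field, value in doc.items():
            guessed = "long" if isinstance(value, int) else "keyword"
            # The first document to carry a field decides its type.
            field_type = self.mapping.setdefault(field, guessed)
            if field_type == "long":
                try:
                    int(value)  # numeric strings such as "6" would still fit
                except (TypeError, ValueError):
                    raise MappingConflict(
                        f'failed to parse [{field}]: For input string: "{value}"'
                    )


index = ToyIndex()
index.index({"level": 6})  # assumed earlier message with a numeric level

try:
    index.index({"level": "info"})  # the Grok extractor now stores a string
    conflict = None
except MappingConflict as exc:
    conflict = str(exc)

print(conflict)
```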

The solution can be one or multiple of the following.

  1. create a custom index mapping that forces the field to be a string - even if the first value that is ingested is a number ( http://docs.graylog.org/en/2.5/pages/configuration/elasticsearch.html#custom-index-mappings )
  2. write a processing pipeline that changes the field name depending on whether the value is a string or a number - so you have two fields, level_string and/or level_num, for example
// This rule will rename the level field
// if it contains a string and not a number
rule "check_level_for_number"
when
    has_field("level") AND 
    // until 3.0 is in da house we need this 
    // dirty little trick 
    is_null(to_long($message.level))
   // 3.0 contains `is_number`
then
    rename_field("level", "level_string");
end

  3. check that all senders only send the defined types in the fields.
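
For the custom index mapping option, the following Python sketch builds the kind of JSON body an index template could carry. Treat every specific as an assumption to check against the linked documentation: the `graylog_*` pattern is guessed from `index=<graylog_0>` in the log line, the `template` key and the `message` mapping type follow the layout used by the Elasticsearch versions of the Graylog 2.5 era, `keyword` assumes Elasticsearch 5 or later, and the helper name `build_level_template` is made up.

```python
import json


def build_level_template(index_pattern="graylog_*", field="level"):
    """Build an index template body that maps one field as a string type.

    The layout is an assumption; verify it against the Graylog docs for
    your Elasticsearch version before using it.
    """
    return {
        "template": index_pattern,
        "mappings": {
            "message": {
                "properties": {
                    field: {"type": "keyword"},
                }
            }
        },
    }


template_body = json.dumps(build_level_template(), indent=2)
print(template_body)
```

A body like this would be uploaded to Elasticsearch's `_template` endpoint under a name of your choosing. Index templates only apply to indices created afterwards, so the active write index has to be rotated before the new mapping takes effect; indices that already exist keep their old mapping.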

(Ben van Staveren) #6

What @jan said, basically. There’s also a chance that the index you are saving to already has a ‘level’ field that’s defined as a number (possible, if it’s ingesting syslog). Ideally you’d store these messages in their own index set, and ensure that only these messages go in so the auto-mapping "does the right thing"™.

