Hi,
I’m trying to create an extractor for unbound DNS request logs.
Sample message:
unbound: [31543:1] info: 10.2.2.1 ahostname-com A IN
Extractor Type: regex
Regex (confirmed working): [redacted: the forum software thinks I have links in my post]
Condition (confirmed working): [redacted: the forum software thinks I have links in my post]
Store as field: dnslookupdata
Strategy: copy
Converter: CSV to fields
Field names: dns_srcip dns_req dns_rectype dns_class
Separator character: (I pressed space bar one time)
Quote character: (I tried leaving blank but it doesn’t work so left default ")
Escape character: \
The regexes work but the converter fails. Is a “CSV to fields” converter a suitable way to parse messages that are delimited with spaces? My goal is to grab all fields in a single extractor. I’m new to Graylog. Thanks.
You could, but the easier way would probably be to use a Grok extractor instead of the regex one, since it can split all the data into fields in the same step (an example pattern that assumes that ‘unbound’ is the application name, and that 31543:1 is the PID and thread):
Then you only have the one extractor running, and theoretically it should grab all the fields you need. Keep in mind the above pattern is an example, so adjust it to your needs; don’t just whap it in and expect it to work.
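To illustrate the idea of pulling every field out in one pass, here is a minimal Python sketch that uses a regex with named groups on the sample message from the first post. It is not the Grok pattern from this reply and Graylog does not run it. The dns_* names come from the original post; the names `application`, `pid`, `thread`, `level` and the helper `parse_unbound` are assumptions made for this example.

```python
import re

# Hypothetical pattern for illustration only. It mirrors the layout of the
# sample line: "<app>: [<pid>:<thread>] <level>: <src ip> <name> <type> <class>"
UNBOUND_LINE = re.compile(
    r"^(?P<application>\S+): "
    r"\[(?P<pid>\d+):(?P<thread>\d+)\] "
    r"(?P<level>\w+): "
    r"(?P<dns_srcip>\S+) "
    r"(?P<dns_req>\S+) "
    r"(?P<dns_rectype>\S+) "
    r"(?P<dns_class>\S+)$"
)


def parse_unbound(line):
    """Return a dict of named fields, or None if the line does not match."""
    match = UNBOUND_LINE.match(line)
    return match.groupdict() if match else None


sample = "unbound: [31543:1] info: 10.2.2.1 ahostname-com A IN"
fields = parse_unbound(sample)
print(fields)
```

A Grok pattern works the same way in spirit: each named capture becomes a message field, so one extractor can replace the regex-plus-converter combination.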
The fields show exactly like I expect when I test against a sample message in the web UI.
However I’m getting an error for messages coming into this input now.
2018-12-09 19:42:30,468 WARN : org.graylog2.indexer.messages.Messages - Failed to index message: index=<graylog_0> id=<7984c552-fc14-11e8-98cb-525400e32cc4> error=<{"type":"mapper_parsing_exception","reason":"failed to parse [level]","caused_by":{"type":"number_format_exception","reason":"For input string: "info""}}>
It seems like the index expects level to be a numeric value. I don’t know much about how the indexer works, but is this effectively a schema change so I need to rebuild the index?
Write a processing pipeline that changes the field name depending on whether the value is a string or a number, so you end up with two fields, level_string and level_num for example.
// This rule will rename the level field
// if it contains a string and not a number
rule "check_level_for_number"
when
  has_field("level") AND
  // until 3.0 is in da house we need this
  // dirty little trick
  is_null(to_long($message.level))
  // 3.0 contains `is_number`
then
  rename_field("level", "level_string");
end
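The intent of the rule above can also be sketched outside of Graylog's rule language. The Python below only illustrates the logic (move a non-numeric `level` into `level_string` so the numeric field never receives a string); the helper name `split_level_field` and the use of a plain dict as the message are assumptions made for this example.

```python
def split_level_field(message):
    """Return a copy of message with a non-numeric 'level' moved to 'level_string'."""
    result = dict(message)
    if "level" not in result:
        return result
    try:
        # Values that parse as an integer stay in the numeric 'level' field.
        int(str(result["level"]))
    except ValueError:
        # Anything else (for example "info") is renamed, so it cannot
        # collide with an index mapping that expects a number.
        result["level_string"] = result.pop("level")
    return result


print(split_level_field({"message": "dns query", "level": "info"}))
print(split_level_field({"message": "syslog line", "level": 6}))
```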
Also check that all senders only send the defined types in the fields.
What @jan said, basically. There’s also a chance that the index you are saving to already has a ‘level’ field that’s defined as a number (possible, if it’s ingesting syslog). Ideally you’d store these messages in their own index set, and ensure that only these messages go in so the auto-mapping "does the right thing"™.