Indexer failures perhaps due to extractor weirdness

Hi

On Graylog 3.1, I am processing logs from a Palo Alto firewall cluster. For some of these messages I get quite a number of indexer failures, all of which read something like:
{"type":"mapper_parsing_exception","reason":"failed to parse field [pa_dev_group_hierarchy_level_1] of type [integer] in document with id 'fed4f4e0-d5ef-11e9-a955-005056842309'","caused_by":{"type":"number_format_exception","reason":"For input string: \"0x0\""}}

The field in question (pa_dev_group_hierarchy_level_1) appears in only one of my custom-built extractors, so I checked that extractor. It reads as follows:
grok_pattern: %{SYSLOGTIMESTAMP:pa_datetime} %{DATA:pa_future_use1},%{DATA:pa_receive_time},%{DATA:pa_serial},%{DATA:pa_type},%{DATA:pa_content_threat_type},%{DATA:pa_future_use2},%{DATA:pa_gen_time},(?:%{DATA:pa_virtual_system})?,%{DATA:pa_event_id},(?:%{DATA:pa_object})?,%{DATA:pa_future_use3},%{DATA:pa_future_use4},%{DATA:pa_module},%{DATA:pa_severity},"%{DATA:pa_description}",%{DATA:pa_seqnum},%{DATA:pa_action_flags},%{DATA:pa_dev_group_hierarchy_level_1},%{DATA:pa_dev_group_hierarchy_level_2},%{DATA:pa_dev_group_hierarchy_level_3},%{DATA:pa_dev_group_hierarchy_level_4},%{DATA:pa_virtual_system_name},%{DATA:pa_device_name}

The problem could be in the pa_description field. To illustrate, here are two example messages (with pseudonymised hostname and IP addresses):

Sep 13 08:31:54 hostname 1,2019/09/13 08:31:54,001801001238,SYSTEM,userid,0,2019/09/13 08:31:54,user-group-count,0,0,general,high,"User Group count of 1951 exceeds threshold of 1000",4058094,0x0,0,0,0,0,hostname

Sep 13 08:31:53 hostname 1,2019/09/13 08:31:53,001801001238,SYSTEM,userid,0,2019/09/13 08:31:53,connect-ldap-sever,192.168.74.83,0,0,general,informational,"ldap cfg ldap-blablabla connected to server 192.168.74.45:389, initiated by: 192.168.74.98",4058093,0x0,0,0,0,0,hostname

Notice the comma (",") in the description of the second message ("…server 192.168.74.45:389, initiated …"). When I test the extractor with this exact message, it works fine, and I get (among others):

pa_description: ldap cfg ldap-blablabla connected to server 192.168.74.45:389, initiated by: 192.168.74.98
pa_seqnum: 4058093
pa_action_flags: 0x0
pa_dev_group_hierarchy_level_1: 0

However, in the search window, for this exact same message, the readings are:

pa_description: "ldap cfg ldap-blablabla connected to server 192.168.74.45:389
pa_seqnum: initiated by: 192.168.74.98"
pa_action_flags: 4058093
pa_dev_group_hierarchy_level_1: 0x0
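The shifted values in the search window are exactly what you get if the quotes around the description are ignored and the message is cut at every comma. A minimal Python sketch to check this, assuming the syslog header ("Sep 13 08:31:53 hostname ") has been stripped and that the description is the 14th comma-separated field of the second sample message (index 13). `naive_split`, `quote_aware_split`, `named` and `DESC_INDEX` are hypothetical helpers for illustration, not part of Graylog, and neither function is how the Grok extractor itself matches:

```python
import csv

# Body of the second sample message, syslog header stripped.
BODY = ('1,2019/09/13 08:31:53,001801001238,SYSTEM,userid,0,2019/09/13 08:31:53,'
        'connect-ldap-sever,192.168.74.83,0,0,general,informational,'
        '"ldap cfg ldap-blablabla connected to server 192.168.74.45:389, initiated by: 192.168.74.98",'
        '4058093,0x0,0,0,0,0,hostname')

# Assumption: position of the description in this particular sample.
DESC_INDEX = 13
NAMES = ["pa_description", "pa_seqnum", "pa_action_flags",
         "pa_dev_group_hierarchy_level_1"]


def naive_split(body):
    # Splits on every comma, including the one inside the quoted description.
    return body.split(",")


def quote_aware_split(body):
    # csv.reader treats "..." as a single field, so the embedded comma survives.
    return next(csv.reader([body]))


def named(fields):
    # Label the description and the three fields that follow it.
    return dict(zip(NAMES, fields[DESC_INDEX:DESC_INDEX + len(NAMES)]))


naive = named(naive_split(BODY))
aware = named(quote_aware_split(BODY))
for name in NAMES:
    print(f"{name}: naive={naive[name]!r} quote-aware={aware[name]!r}")
```

The naive column reproduces the search window (the description keeps its opening quote, "initiated by: …" lands in pa_seqnum, and "0x0" lands in the integer field), while the quote-aware column matches the extractor test. This does not show which part of the setup splits live messages that way; it only suggests that the live values were produced by something that did not honor the quotes around pa_description.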

Why does the extractor clearly malfunction on live messages when the test works fine? And how can I fix it?

Cheers,
Tobias


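You did not ensure that a field always gets the same data type extracted (for example, always long or always string), and that can end in lost messages. In this case the extracted value does not fit the field's mapping: pa_dev_group_hierarchy_level_1 is mapped as integer in Elasticsearch, so a document carrying the string "0x0" in that field is rejected.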

Create a custom Elasticsearch mapping and/or force the extractor to extract only the wanted data type.
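As a sketch of the custom-mapping route, the following builds an index template that maps the four device-group fields as keyword, so both "0" and "0x0" would be accepted instead of rejected. It assumes Elasticsearch 6.x (which uses the `index_patterns` key; 5.x uses the older `template` key), Graylog's `message` document type, and the default index prefix `graylog`. The template name `graylog-custom-mapping` and the helpers `build_template` and `put_template` are hypothetical choices for illustration:

```python
import json
import urllib.request

# Hypothetical names: adjust the template name and index prefix to your index set.
TEMPLATE_NAME = "graylog-custom-mapping"
INDEX_PATTERN = "graylog_*"

# Fields that should always be stored as strings, whatever the extractor produces.
STRING_FIELDS = [f"pa_dev_group_hierarchy_level_{i}" for i in range(1, 5)]


def build_template():
    """Return an index template body in the Elasticsearch 6.x layout."""
    return {
        "index_patterns": [INDEX_PATTERN],
        "mappings": {
            "message": {
                "properties": {name: {"type": "keyword"} for name in STRING_FIELDS}
            }
        },
    }


def put_template(es_url):
    """PUT the template to Elasticsearch, e.g. es_url="http://localhost:9200"."""
    req = urllib.request.Request(
        f"{es_url}/_template/{TEMPLATE_NAME}",
        data=json.dumps(build_template()).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


# Only prints the body; call put_template(...) yourself against your cluster.
print(json.dumps(build_template(), indent=2))
```

An index template only applies to indices created after it is stored, so the active write index has to be rotated in Graylog before the new mapping takes effect. Note that a keyword mapping stops the rejections but does not fix the shifted fields; the extractor still needs to produce the right values.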
