Extractor cut-mode is not working


(Dennis Ploeger) #1

Hello!

I have the following problem:

I receive messages with an IPv6 address in a field that I extract to “clientip”. Our ES 2.3 interprets this field as an IP type, which does not support IPv6.

So I tried to create an extractor that copies the field contents to a new field called “clientip_v6” and set the extractor mode to “cut”, which I thought should remove the contents of “clientip”.

However, it does not. “clientip” is still intact. (clientip_v6 is filled, though)

Does anybody have an idea what I’m doing wrong here?

Thanks.

Kind regards
Dennis


(Jochen) #2

Please post the complete extractor configuration and some sample messages.


(Dennis Ploeger) #3

Yes, sure.

This is our complete extractor set for the input:

{
  "extractors": [
    {
      "title": "Puppet: Puppet run time",
      "extractor_type": "regex",
      "converters": [
        {
          "type": "numeric",
          "config": {}
        }
      ],
      "order": 7,
      "cursor_strategy": "copy",
      "source_field": "message",
      "target_field": "puppetRunTime",
      "extractor_config": {
        "regex_value": "^.*puppet-agent\\[\\d[0-9]{0,9}.*\\]: Finished catalog run in ((\\d[0-9]{0,9}\\.[0-9]{0,9})) seconds"
      },
      "condition_type": "regex",
      "condition_value": "^.*(puppet-agent\\[\\d[0-9]{0,9}.*\\]: Finished catalog run in (\\d[0-9]{0,9}\\.[0-9]{0,9}) seconds)"
    },
    {
      "title": "Puppet: Puppet Configuration Version",
      "extractor_type": "regex",
      "converters": [
        {
          "type": "numeric",
          "config": {}
        }
      ],
      "order": 16,
      "cursor_strategy": "copy",
      "source_field": "message",
      "target_field": "puppetConfigVersion",
      "extractor_config": {
        "regex_value": "^.*puppet-agent\\[\\d[0-9]{0,9}.*\\]: Applying configuration version '(\\d[0-9]{0,9})'"
      },
      "condition_type": "regex",
      "condition_value": "^.*(puppet-agent\\[\\d[0-9]{0,9}.*\\]: Applying configuration version '(\\d[0-9]{0,9})')"
    },
    {
      "title": "Apache Combined Log Extractor",
      "extractor_type": "grok",
      "converters": [],
      "order": 8,
      "cursor_strategy": "copy",
      "source_field": "message",
      "target_field": "",
      "extractor_config": {
        "grok_pattern": "%{COMBINEDAPACHELOG}",
        "named_captures_only": true
      },
      "condition_type": "none",
      "condition_value": ""
    },
    {
      "title": "Graylog server log format",
      "extractor_type": "grok",
      "converters": [],
      "order": 12,
      "cursor_strategy": "copy",
      "source_field": "message",
      "target_field": "",
      "extractor_config": {
        "grok_pattern": "%{GRAYLOG_SERVER_LOG}",
        "named_captures_only": true
      },
      "condition_type": "none",
      "condition_value": ""
    },
    {
      "title": "MongoDB Log",
      "extractor_type": "grok",
      "converters": [],
      "order": 10,
      "cursor_strategy": "copy",
      "source_field": "message",
      "target_field": "",
      "extractor_config": {
        "grok_pattern": "%{MONGODB_LOG}",
        "named_captures_only": true
      },
      "condition_type": "none",
      "condition_value": ""
    },
    {
      "title": "Copy IP to text",
      "extractor_type": "copy_input",
      "converters": [],
      "order": 22,
      "cursor_strategy": "copy",
      "source_field": "clientip",
      "target_field": "clientip_text",
      "extractor_config": {},
      "condition_type": "none",
      "condition_value": ""
    },
    {
      "title": "HAProxy TCP Extractor",
      "extractor_type": "grok",
      "converters": [],
      "order": 14,
      "cursor_strategy": "copy",
      "source_field": "message",
      "target_field": "",
      "extractor_config": {
        "grok_pattern": "%{HAPROXYTCP}",
        "named_captures_only": false
      },
      "condition_type": "none",
      "condition_value": ""
    },
    {
      "title": "Postfix SMTP",
      "extractor_type": "grok",
      "converters": [],
      "order": 2,
      "cursor_strategy": "copy",
      "source_field": "message",
      "target_field": "",
      "extractor_config": {
        "grok_pattern": "%{POSTFIXSMTP}"
      },
      "condition_type": "none",
      "condition_value": ""
    },
    {
      "title": "Postfix SMTPD",
      "extractor_type": "grok",
      "converters": [],
      "order": 15,
      "cursor_strategy": "copy",
      "source_field": "message",
      "target_field": "",
      "extractor_config": {
        "grok_pattern": "%{POSTFIX_SMTPD}"
      },
      "condition_type": "none",
      "condition_value": ""
    },
    {
      "title": "Postfix Queue Manager",
      "extractor_type": "grok",
      "converters": [],
      "order": 11,
      "cursor_strategy": "copy",
      "source_field": "message",
      "target_field": "",
      "extractor_config": {
        "grok_pattern": "%{POSTFIX_QMGR}"
      },
      "condition_type": "none",
      "condition_value": ""
    },
    {
      "title": "Postfix Statistics",
      "extractor_type": "grok",
      "converters": [],
      "order": 13,
      "cursor_strategy": "copy",
      "source_field": "message",
      "target_field": "",
      "extractor_config": {
        "grok_pattern": "%{POSTFIX_ANVIL}"
      },
      "condition_type": "none",
      "condition_value": ""
    },
    {
      "title": "IPTables",
      "extractor_type": "grok",
      "converters": [],
      "order": 17,
      "cursor_strategy": "copy",
      "source_field": "message",
      "target_field": "",
      "extractor_config": {
        "grok_pattern": "%{IPTABLES}"
      },
      "condition_type": "none",
      "condition_value": ""
    },
    {
      "title": "JIRA_PROJECT_ID",
      "extractor_type": "grok",
      "converters": [],
      "order": 18,
      "cursor_strategy": "copy",
      "source_field": "request",
      "target_field": "",
      "extractor_config": {
        "grok_pattern": ".*selectedProjectId=(?<project_id>[^&]*)"
      },
      "condition_type": "none",
      "condition_value": ""
    },
    {
      "title": "Artifactory Request Log",
      "extractor_type": "grok",
      "converters": [],
      "order": 3,
      "cursor_strategy": "copy",
      "source_field": "message",
      "target_field": "",
      "extractor_config": {
        "grok_pattern": "%{DATESTAMP_EVENTLOG:timestamp}\\|%{NUMBER:size}\\|REQUEST\\|%{IPV4:client}\\|%{USER:user}\\|%{WORD:method}\\|(?<path>[^|]*)\\|HTTP/%{NUMBER:httpversion}\\|%{NUMBER:response}\\|.+",
        "named_captures_only": true
      },
      "condition_type": "string",
      "condition_value": "|"
    },
    {
      "title": "AEM Request Log response",
      "extractor_type": "grok",
      "converters": [],
      "order": 5,
      "cursor_strategy": "copy",
      "source_field": "message",
      "target_field": "",
      "extractor_config": {
        "grok_pattern": "%{HTTPDATE} \\[%{NUMBER:sequence}\\] %{NOTSPACE:direction} %{NUMBER:response} %{NOTSPACE:content-type} %{NUMBER:duration}ms"
      },
      "condition_type": "none",
      "condition_value": ""
    },
    {
      "title": "AEM Request Log request",
      "extractor_type": "grok",
      "converters": [],
      "order": 4,
      "cursor_strategy": "copy",
      "source_field": "message",
      "target_field": "",
      "extractor_config": {
        "grok_pattern": "%{HTTPDATE} \\[%{NUMBER:sequence}\\] %{NOTSPACE:direction} %{WORD:verb} %{NOTSPACE:path} HTTP/%{NUMBER:httpversion}"
      },
      "condition_type": "none",
      "condition_value": ""
    },
    {
      "title": "AEM Access Log",
      "extractor_type": "grok",
      "converters": [],
      "order": 0,
      "cursor_strategy": "copy",
      "source_field": "message",
      "target_field": "",
      "extractor_config": {
        "grok_pattern": "%{IPORHOST:clientip} %{HTTPDUSER:ident} %{USER:auth} %{HTTPDATE:timestamp} \"(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})\" %{NUMBER:response} (?:%{NUMBER:bytes}|-) %{QS:referrer} %{QS:agent}"
      },
      "condition_type": "none",
      "condition_value": ""
    },
    {
      "title": "AEM Error Log",
      "extractor_type": "grok",
      "converters": [],
      "order": 1,
      "cursor_strategy": "cut",
      "source_field": "message",
      "target_field": "",
      "extractor_config": {
        "grok_pattern": "%{DATE} %{HOUR}:%{MINUTE}:%{SECOND}.(?<second_fraction>[0-9][0-9][0-9]) \\*%{LOGLEVEL}\\* "
      },
      "condition_type": "none",
      "condition_value": ""
    },
    {
      "title": "Fix POSTFIX_KEYVALUE length",
      "extractor_type": "substring",
      "converters": [],
      "order": 20,
      "cursor_strategy": "cut",
      "source_field": "POSTFIX_KEYVALUE",
      "target_field": "POSTFIX_KEYVALUE",
      "extractor_config": {
        "end_index": 32766,
        "begin_index": 0
      },
      "condition_type": "regex",
      "condition_value": "^.{32765,}$"
    },
    {
      "title": "Fix postfix_keyvalue_data length",
      "extractor_type": "substring",
      "converters": [],
      "order": 21,
      "cursor_strategy": "cut",
      "source_field": "postfix_keyvalue_data",
      "target_field": "postfix_keyvalue_data",
      "extractor_config": {
        "end_index": 32766,
        "begin_index": 0
      },
      "condition_type": "regex",
      "condition_value": "^.{32765,}$"
    },
    {
      "title": "Fix POSTFIX_SMTPD length",
      "extractor_type": "substring",
      "converters": [],
      "order": 19,
      "cursor_strategy": "cut",
      "source_field": "POSTFIX_SMTPD",
      "target_field": "POSTFIX_SMTPD",
      "extractor_config": {
        "end_index": 32766,
        "begin_index": 0
      },
      "condition_type": "regex",
      "condition_value": "^.{32765,}$"
    },
    {
      "title": "Christ Apache Logs",
      "extractor_type": "grok",
      "converters": [],
      "order": 9,
      "cursor_strategy": "copy",
      "source_field": "message",
      "target_field": "",
      "extractor_config": {
        "grok_pattern": "%{CHRISTAPACHELOG}"
      },
      "condition_type": "none",
      "condition_value": ""
    },
    {
      "title": "EP Apache Log extractor",
      "extractor_type": "grok",
      "converters": [],
      "order": 6,
      "cursor_strategy": "copy",
      "source_field": "message",
      "target_field": "",
      "extractor_config": {
        "grok_pattern": "%{EPAPACHELOG}"
      },
      "condition_type": "none",
      "condition_value": ""
    },
    {
      "title": "Remove IPv6",
      "extractor_type": "regex_replace",
      "converters": [],
      "order": 23,
      "cursor_strategy": "cut",
      "source_field": "clientip",
      "target_field": "clientip_v6",
      "extractor_config": {
        "regex": ".*",
        "replacement": "0.0.0.0",
        "replace_all": true
      },
      "condition_type": "string",
      "condition_value": ":"
    }
  ],
  "version": "2.2.0-SNAPSHOT"
}

I posted the complete export because it may be a side effect of another extractor. The extractor giving me the headache is “Remove IPv6”.

This is a sample message:

0:0:0:0:0:0:0:1 - admin 07/Mar/2018:08:51:59 +0100 "GET /etc/replication/agents.publish/flush.2.json HTTP/1.1" 200 725 "-" "Ruby"

(Dennis Ploeger) #4

Any ideas? :no_mouth:


(Jochen) #5

Try using the regular expression (.*) in your “Remove IPv6” extractor instead of .*.


(Dennis Ploeger) #6

That, umm, works partly. Now, the content of the clientip-field is “fullyCutByExtractor” :grin:
Can this be changed?


(Jochen) #7

Use “Copy”, not “Cut”. The contents of the field will be replaced by the extractor anyway.


(Dennis Ploeger) #8

Hm. With “copy”, clientip isn’t changed, but clientip_v6 is set to “0.0.0.00.0.0.0”…
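
(Editorial note) The doubled value is most likely the usual replace-all behaviour of the pattern `.*`: it matches the whole field once and then matches the empty string left at the end of the input, so the replacement is written twice. The sketch below reproduces this in Python 3.7+ as an illustration only. Graylog runs on the JVM, and the assumption here is that its regex_replace extractor uses replace-all semantics like `java.util.regex`, which would be consistent with the output reported above.

```python
import re

# clientip value taken from the sample message in post #3
sample = "0:0:0:0:0:0:0:1"

# ".*" matches the full value, then the empty string at the end of the input,
# so a replace-all writes the replacement twice (Python 3.7+ behaviour).
doubled = re.sub(r".*", "0.0.0.0", sample)

# ".+" requires at least one character, so there is no trailing empty match
# and the replacement is written once.
single = re.sub(r".+", "0.0.0.0", sample)

print(doubled)  # 0.0.0.00.0.0.0
print(single)   # 0.0.0.0
```

If that assumption holds, a pattern such as `.+` or `^.*$` in the extractor, or turning off "replace_all", should avoid the duplicate.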


(Jochen) #9

FWIW, I would use a pipeline rule for that instead of an extractor (which might not be easy to follow when lots of extractors run before and after).

Example:

rule "copy-ipv6-address"
when
  has_field("clientip") && contains(to_string($message.clientip), ":")
then
  let clientip = to_string($message.clientip);
  set_field("clientip_v6", clientip);
  set_field("clientip", "0.0.0.0");
end

(Dennis Ploeger) #10

Ah! I managed to get it working by setting the “store as field” value to the same field name. Now it’s just overwriting the field, which is okay. I could extract the IPv6 address with another extractor and copy it to another field, if I wanted.

Thanks for the support!

